Thursday, January 16th 2025

SK hynix Ships HBM4 Samples to NVIDIA in June, Mass Production Slated for Q3 2025

SK hynix has accelerated its HBM4 development schedule, according to a report from ZDNet. The company plans to start shipping HBM4 samples to NVIDIA this June, earlier than the original timeline, and hopes to begin supplying products by the end of Q3 2025. The push likely aims to secure a head start in the next-generation HBM market. To meet the accelerated schedule, SK hynix has set up a dedicated HBM4 development team to supply NVIDIA. Industry sources indicated on January 15th that SK hynix plans to deliver its first customer samples of HBM4 in early June this year. The company hit a major milestone in Q4 2024 when it completed the HBM4 tapeout, the final design step before manufacturing.

HBM4 marks the sixth generation of high-bandwidth memory built on stacked DRAM. It follows HBM3E, the current fifth-generation version, with large-scale production expected to begin in late 2025 at the earliest. HBM4's headline improvement is a doubling of data transfer capability, with 2,048 I/O channels, twice as many as its predecessor. NVIDIA originally planned to use 12-layer stacked HBM4 in its 2026 "Rubin" line of high-performance GPUs; however, it has since moved up the timeline, aiming to launch "Rubin" in late 2025.
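For a sense of scale, the doubling works out as follows per stack; note the per-pin data rate below is an illustrative assumption, since the report does not specify HBM4 pin speeds:

# Per-stack bandwidth = interface width (bits) x per-pin rate (Gb/s) / 8.
# The 8 Gb/s per-pin figure is an assumption for illustration only.
def stack_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8

print(stack_bandwidth_gb_s(1024, 8.0))  # HBM3E-class 1,024-bit stack: 1024.0 GB/s
print(stack_bandwidth_gb_s(2048, 8.0))  # HBM4 2,048-bit stack: 2048.0 GB/s (2x)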
A source familiar with the matter explained, "It seems that NVIDIA's will to launch Rubin early is stronger than expected, to the point that it is pushing forward trial production to the second half of this year." He added, "In line with this, memory companies such as SK hynix are also pushing for early supply of samples. Product supply could be possible as early as the end of the third quarter."
Source: ZDNet

13 Comments on SK hynix Ships HBM4 Samples to NVIDIA in June, Mass Production Slated for Q3 2025

#1
Thunder
Is there a possibility that we will see HBM in Desktop GPUs one day?
Posted on Reply
#2
Chaitanya
For the fully unlocked Blackwell dies then.
Posted on Reply
#3
AnotherReader
Thunder: Is there a possibility that we will see HBM in Desktop GPUs one day?
It's very unlikely; HBM is far too expensive for anything short of a 5090.
Posted on Reply
#4
Nomad76
News Editor
Thunder: Is there a possibility that we will see HBM in Desktop GPUs one day?
One day, yes. However, that day is far in the future. Technically it's not a problem, just a matter of cost, as @AnotherReader pointed out.
Posted on Reply
#5
igormp
Thunder: Is there a possibility that we will see HBM in Desktop GPUs one day?
Apart from costs, is there much of a point?
The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).

To reach such high bandwidth you'd need enough stacks, which would be hella expensive and would also give a consumer GPU way more memory than it needs, capacities usually reserved for enterprise offerings.
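A quick sketch to sanity-check those figures; the 28 Gb/s per-pin rate for the 5090's GDDR7 is an assumption here:

# Peak bandwidth in TB/s from bus width and per-pin rate.
def bw_tb_s(bus_bits: int, pin_gbps: float) -> float:
    return bus_bits * pin_gbps / 8 / 1000

print(f"5090, 512-bit GDDR7 @ 28 Gb/s: {bw_tb_s(512, 28):.3f} TB/s")  # ~1.792
# Compare: A100 40GB PCIe ~1.6 TB/s; A100 80GB SXM / H100 80GB PCIe ~2.0 TB/s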
Posted on Reply
#6
unwind-protect
I'm more interested in having this on a CPU.

I would have applications that need every bit of core speed and don't need much memory.
Posted on Reply
#7
igormp
unwind-protect: I'm more interested in having this on a CPU.

I would have applications that need every bit of core speed and don't need much memory.
There's that Xeon Max with HBM on board, or you could try to get your hands on one of those MI300A from MS.
Posted on Reply
#8
AnotherReader
igormp: Apart from costs, is there much of a point?
The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).

To reach such high bandwidth you'd need enough stacks, which would be hella expensive and would also give a consumer GPU way more memory than it needs, capacities usually reserved for enterprise offerings.
The cost is prohibitive for most GPUs. However, given how much a 5090 costs, increasing prices by $500 to cover the HBM's cost shouldn't impact expected sales. A bigger factor is capacity constraints; TSMC was capacity-constrained on CoWoS, so it makes sense to use that limited capacity for higher-margin datacenter GPUs rather than gaming GPUs. Given the explosion of interest in machine learning, the capacity constraint might be even worse now despite TSMC's investments in ameliorating it.

As for the advantages, HBM is far more power efficient than GDDR of the same generation. One stack of HBM4 would offer 89% of the bandwidth of the 5090's GDDR7 at a fraction of the power. Alternatively, two stacks of HBM3e would exceed that bandwidth and increase capacity. HBM PHYs also require less area than GDDR PHYs, so you could either have a smaller die or add SMs to take advantage of the saved area and power.
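A rough sketch of where that 89% figure could come from; the 6.25 Gb/s HBM4 pin rate and 28 Gb/s GDDR7 pin rate are my assumptions:

gddr7_5090 = 512 * 28 / 8 / 1000     # 1.792 TB/s, 512-bit GDDR7 @ 28 Gb/s assumed
hbm4_stack = 2048 * 6.25 / 8 / 1000  # 1.600 TB/s for one stack @ assumed 6.25 Gb/s
print(f"one HBM4 stack / 5090 GDDR7 = {hbm4_stack / gddr7_5090:.0%}")  # 89%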
Posted on Reply
#9
Philaphlous
It'll be the day when laptops receive HBM... so much space savings.
Posted on Reply
#10
Tomorrow
Thunder: Is there a possibility that we will see HBM in Desktop GPUs one day?
If by one day you mean one day again, then yes. Absolutely.
AnotherReader: It's very unlikely; HBM is far too expensive for anything short of a 5090.
Which version? AMD was able to release a consumer card with 16GB HBM2 six years ago for $700. Even if we assume a doubling of capacity and a move up to HBM3e for cards costing four figures (5080 and up), the cost is not the biggest issue. I suspect supply would be much more of a problem.
igormp: The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).
That's only one side of the equation. There's also power and the size on card. G7 may offer these things, but it requires an equally complex multilayer PCB to support 512-bit, and G7 still requires 16 separate chips on the PCB.
igormp: To reach such high bandwidth you'd need enough stacks, which would be hella expensive and would also give a consumer GPU way more memory than it needs, capacities usually reserved for enterprise offerings.
Consumer cards don't need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.
AnotherReader: The cost is prohibitive for most GPUs. However, given how much a 5090 costs, increasing prices by $500 to cover the HBM's cost shouldn't impact expected sales.
My point exactly. With the 5090 costing $2,000+, the argument of "expensive" HBM seems more and more silly.
AnotherReader: A bigger factor is capacity constraints; TSMC was capacity-constrained on CoWoS, so it makes sense to use that limited capacity for higher-margin datacenter GPUs rather than gaming GPUs. Given the explosion of interest in machine learning, the capacity constraint might be even worse now despite TSMC's investments in ameliorating it.
That's what I'm thinking too. Right now all HBM is sold to data center cards for much higher margins. Until the AI boom pops, this won't change.
I believe this was also the reason why the multi-chiplet high-end versions of RDNA4 were canned.
AnotherReader: As for the advantages, HBM is far more power efficient than GDDR of the same generation. One stack of HBM4 would offer 89% of the bandwidth of the 5090's GDDR7 at a fraction of the power.
The lowest 4-Hi stack is 16GB using 4GB layers. So two 16GB stacks would offer 32GB with 3.2TB/s of speed.
AnotherReader: Alternatively, two stacks of HBM3e would exceed that bandwidth and increase capacity. HBM PHYs also require less area than GDDR PHYs, so you could either have a smaller die or add SMs to take advantage of the saved area and power.
And HBM3e is cheaper as it's not the latest and greatest.
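Spelling that two-stack scenario out; the 1.6 TB/s per-stack HBM4 bandwidth is an assumption, consistent with the 89% figure above:

layers, layer_gb, stacks = 4, 4, 2   # 4-Hi stacks of 4 GB layers, two stacks
per_stack_tb_s = 1.6                 # assumed per-stack HBM4 bandwidth
print(f"capacity:  {stacks * layers * layer_gb} GB")     # 32 GB
print(f"bandwidth: {stacks * per_stack_tb_s:.1f} TB/s")  # 3.2 TB/s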
Posted on Reply
#11
AnotherReader
Tomorrow: If by one day you mean one day again, then yes. Absolutely.

Which version? AMD was able to release a consumer card with 16GB HBM2 six years ago for $700. Even if we assume a doubling of capacity and a move up to HBM3e for cards costing four figures (5080 and up), the cost is not the biggest issue. I suspect supply would be much more of a problem.

That's only one side of the equation. There's also power and the size on card. G7 may offer these things, but it requires an equally complex multilayer PCB to support 512-bit, and G7 still requires 16 separate chips on the PCB.

Consumer cards don't need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.

My point exactly. With the 5090 costing $2,000+, the argument of "expensive" HBM seems more and more silly.

That's what I'm thinking too. Right now all HBM is sold to data center cards for much higher margins. Until the AI boom pops, this won't change.
I believe this was also the reason why the multi-chiplet high-end versions of RDNA4 were canned.

The lowest 4-Hi stack is 16GB using 4GB layers. So two 16GB stacks would offer 32GB with 3.2TB/s of speed.

And HBM3e is cheaper as it's not the latest and greatest.
Even HBM2 has advantages over GDDR7, but I was thinking of HBM3. As for capacity, there are 8-Hi stacks of HBM4, which would give 32 GB at about 89% of the bandwidth of the existing 512-bit GDDR7 memory interface. Given that the gap in graphics performance, and even compute performance, between the 5090 and the 4090 is far smaller than the difference in memory bandwidth, losing 11% of that bandwidth in exchange for far lower DRAM power draw and a simpler PCB is hardly likely to hurt performance.
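In numbers: 4090 and 5090 bandwidths are from public specs, and the HBM figure simply applies the 89% above to a hypothetical HBM4-equipped 5090:

bw_4090, bw_5090 = 1008, 1792    # GB/s, public specs
bw_hbm_variant = 0.89 * bw_5090  # hypothetical HBM4-equipped 5090
print(f"5090 over 4090: +{bw_5090 / bw_4090 - 1:.0%}")  # +78%
print(f"HBM variant: {bw_hbm_variant:.0f} GB/s "
      f"(+{bw_hbm_variant / bw_4090 - 1:.0%} over the 4090)")  # 1595 GB/s, +58%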
Posted on Reply
#12
Wirko
Tomorrow: Consumer cards don't need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.
Here's one of the problems (if you're a consumer, that is). Space savings are costly and don't benefit you if all you're buying is one or two GPUs. But if you're trying to compress 100 kilowatts' worth of processors into one rack, a small memory footprint is crucial.
Posted on Reply
#13
Tomorrow
Wirko: Here's one of the problems (if you're a consumer, that is). Space savings are costly and don't benefit you if all you're buying is one or two GPUs. But if you're trying to compress 100 kilowatts' worth of processors into one rack, a small memory footprint is crucial.
Especially if AIBs continue to make bigger and bigger coolers instead of making smarter ones (like the 5090 FE).
Posted on Reply