
SK hynix Ships HBM4 Samples to NVIDIA in June, Mass Production Slated for Q3 2025

Nomad76

News Editor
Staff member
Joined
May 21, 2024
Messages
819 (3.38/day)
SK hynix has sped up its HBM4 development plans, according to a report from ZDNet. The company wants to begin shipping HBM4 samples to NVIDIA this June, earlier than the original timeline, and hopes to start supplying products by the end of Q3 2025; the push likely aims to secure a head start in the next-gen HBM market. To meet this accelerated schedule, SK hynix has set up a dedicated HBM4 development team to supply NVIDIA. Industry sources indicated on January 15th that SK hynix plans to deliver its first customer samples of HBM4 in early June this year. The company hit a big milestone in Q4 2024 when it completed the HBM4 tape-out, the final design step.

HBM4 marks the sixth generation of high-bandwidth memory built on stacked DRAM architecture. It follows HBM3E, the current fifth-generation version, with large-scale production likely to kick off in late 2025 at the earliest. HBM4 boasts a big leap forward, doubling data-transfer capability with 2,048 I/O channels, up from 1,024 in its predecessor. NVIDIA had planned to use 12-layer stacked HBM4 in its 2026 "Rubin" line of high-performance GPUs. However, NVIDIA has since moved up its timeline, aiming to launch "Rubin" in late 2025.



A source familiar with the matter explained, "It seems that NVIDIA's will to launch Rubin early is stronger than expected, to the point that it is pushing forward trial production to the second half of this year." He added, "In line with this, memory companies such as SK hynix are also pushing for early supply of samples. Product supply could be possible as early as the end of the third quarter."

View at TechPowerUp Main Site | Source
 
Joined
Nov 26, 2021
Messages
1,730 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Is there a possibility that we will see HBM in Desktop GPUs one day?
It's very unlikely; HBM is far too expensive for anything short of a 5090.
 
Joined
May 10, 2023
Messages
515 (0.83/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Is there a possibility that we will see HBM in Desktop GPUs one day?
Apart from costs, is there much of a point?
The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).

To reach such high bandwidth you'd need enough stacks, which would both be hella expensive, and also give a consumer GPU way too much memory that's only usually meant for enterprise offerings.
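Those bandwidth figures all fall out of the same formula: bus width times per-pin data rate, divided by 8 to convert bits to bytes. A quick sketch (the per-pin rates here are assumptions picked to match the publicly listed spec-sheet numbers, not official figures):

```python
# Peak memory bandwidth = bus width (bits) x per-pin rate (Gbps) / 8.
def bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits * gbps_per_pin / 8

# RTX 5090: 512-bit GDDR7 at ~28 Gbps per pin
print(bandwidth_gbs(512, 28))          # -> 1792.0 GB/s (~1.8 TB/s)

# A100 40GB PCIe: 5 active 1024-bit HBM2e stacks at ~2.43 Gbps per pin
print(bandwidth_gbs(5 * 1024, 2.43))   # -> ~1555 GB/s (~1.6 TB/s)
```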
 
Joined
Mar 18, 2023
Messages
959 (1.43/day)
System Name Never trust a socket with less than 2000 pins
I'm more interested in having this on a CPU.

I would have applications that need every bit of core speed and don't need much memory.
 
Joined
May 10, 2023
Messages
515 (0.83/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
I'm more interested in having this on a CPU.

I would have applications that need every bit of core speed and don't need much memory.
There's that Xeon Max with HBM on board, or you could try to get your hands on one of those MI300A from MS.
 
Joined
Nov 26, 2021
Messages
1,730 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Apart from costs, is there much of a point?
The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).

To reach such high bandwidth you'd need enough stacks, which would both be hella expensive, and also give a consumer GPU way too much memory that's only usually meant for enterprise offerings.
The cost is prohibitive for most GPUs. However, given how much a 5090 costs, increasing prices by $500 to cover the HBM's cost shouldn't impact expected sales. A bigger factor is capacity constraints: TSMC has been capacity constrained for CoWoS, so it makes sense to use that limited capacity for higher-margin datacenter GPUs rather than gaming GPUs. Given the explosion of interest in machine learning, the capacity constraint might be even worse now despite TSMC's investments in ameliorating it.

As for the advantages, HBM is far more power efficient than GDDR of the same generation. One stack of HBM4 would offer 89% of the bandwidth of the 5090's GDDR7 at a fraction of the power. Alternatively, two stacks of HBM3e would exceed that bandwidth and increase capacity. HBM PHYs also require less area than GDDR PHYs so you could either have a smaller die or increase the number of SMXs to take advantage of the saved area and power.
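The one-stack and two-stack comparisons above can be checked with the same bandwidth arithmetic. The per-pin rates below are assumptions (the HBM4 rate of ~6.25 Gbps is chosen to reproduce the 89% figure; HBM3E parts are commonly quoted around 9.6 Gbps):

```python
# Peak bandwidth = bus width (bits) x per-pin rate (Gbps) / 8, in GB/s.
gddr7_5090 = 512 * 28 / 8        # 1792 GB/s: the 5090's 512-bit GDDR7
hbm4_stack = 2048 * 6.25 / 8     # 1600 GB/s: one 2048-bit HBM4 stack
hbm3e_two  = 2 * 1024 * 9.6 / 8  # 2457.6 GB/s: two 1024-bit HBM3E stacks

print(hbm4_stack / gddr7_5090)   # ~0.89: one HBM4 stack vs. the 5090
print(hbm3e_two > gddr7_5090)    # True: two HBM3E stacks exceed it
```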
 
Joined
Aug 21, 2013
Messages
1,980 (0.48/day)
Is there a possibility that we will see HBM in Desktop GPUs one day?
If by one day you mean one day again, then yes. Absolutely.
It's very unlikely; HBM is far too expensive for anything short of a 5090.
Which version? AMD was able to release a consumer card with 16 GB of HBM2 six years ago for $700. Even if we assume doubled capacity and a jump to HBM3E, for cards costing four figures (5080 and up) the cost is not the biggest issue. I suspect supply would be much more of a problem.
The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).
That's only one side of the equation. There's also power and the size on the card. G7 may offer these things, but it requires an equally complex multilayer PCB to support 512-bit, and G7 still requires 16 separate chips on the PCB.
To reach such high bandwidth you'd need enough stacks, which would both be hella expensive, and also give a consumer GPU way too much memory that's only usually meant for enterprise offerings.
Consumer cards don't need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.
The cost is prohibitive for most GPUs. However, given how much a 5090 costs, increasing prices by $500 to cover the HBM's cost shouldn't impact expected sales.
My point exactly. With the 5090 costing $2,000+, the argument of "expensive" HBM seems more and more silly.
A bigger factor is capacity constraints; TSMC was capacity constrained for COWOS so it makes sense to use that limited capacity for higher margin datacenter GPUs rather than gaming GPUs. Given the explosion in interest in machine learning, the capacity constraint might be even worse now despite TSMC's investments in ameliorating it.
That's what I'm thinking too. Right now all HBM is sold into data center cards at much higher margins. Until the AI boom pops, this won't change.
I believe this was also the reason why the multi-chiplet high-end versions of RDNA4 were canned.
As for the advantages, HBM is far more power efficient than GDDR of the same generation. One stack of HBM4 would offer 89% of the bandwidth of the 5090's GDDR7 at a fraction of the power.
The lowest 4-Hi stack is 16 GB using 4 GB layers. So two 16 GB stacks would offer 32 GB with 3.2 TB/s of speed.
Alternatively, two stacks of HBM3e would exceed that bandwidth and increase capacity. HBM PHYs also require less area than GDDR PHYs so you could either have a smaller die or increase the number of SMXs to take advantage of the saved area and power.
And HBM3e is cheaper as it's not the latest and greatest.
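The two-stack figures quoted above check out arithmetically; a quick sketch (the 6.4 Gbps per-pin rate is an assumption chosen to reproduce the quoted 3.2 TB/s, not a confirmed HBM4 spec):

```python
# Two 4-Hi HBM4 stacks with 4 GB layers: capacity and peak bandwidth.
stacks, layers, gb_per_layer = 2, 4, 4
capacity_gb = stacks * layers * gb_per_layer     # 2 x 4 x 4 = 32 GB

# Each stack has a 2048-bit interface; assume ~6.4 Gbps per pin.
bandwidth_tbs = stacks * 2048 * 6.4 / 8 / 1000   # ~3.28 TB/s

print(capacity_gb, round(bandwidth_tbs, 2))      # -> 32 3.28
```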
 
Joined
Nov 26, 2021
Messages
1,730 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
If by one day you mean one day again, then yes. Absolutely.

Which version? AMD was able to release a consumer card with 16 GB of HBM2 six years ago for $700. Even if we assume doubled capacity and a jump to HBM3E, for cards costing four figures (5080 and up) the cost is not the biggest issue. I suspect supply would be much more of a problem.

That's only one side of the equation. There's also power and the size on the card. G7 may offer these things, but it requires an equally complex multilayer PCB to support 512-bit, and G7 still requires 16 separate chips on the PCB.

Consumer cards don't need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.

My point exactly. With the 5090 costing $2,000+, the argument of "expensive" HBM seems more and more silly.

That's what I'm thinking too. Right now all HBM is sold into data center cards at much higher margins. Until the AI boom pops, this won't change.
I believe this was also the reason why the multi-chiplet high-end versions of RDNA4 were canned.

The lowest 4-Hi stack is 16 GB using 4 GB layers. So two 16 GB stacks would offer 32 GB with 3.2 TB/s of speed.

And HBM3e is cheaper as it's not the latest and greatest.
Even HBM2 has advantages over GDDR7, but I was thinking of HBM3. As for capacity, there are 8-high stacks of HBM4, which would give 32 GB at about 89% of the bandwidth of the existing 512-bit GDDR7 memory interface. Given that the gap in graphics performance, and even compute performance, between the 5090 and the 4090 is far smaller than the difference in memory bandwidth, losing 11% of that bandwidth in exchange for far lower DRAM power draw and a simpler PCB is hardly likely to be detrimental to performance.
 
Joined
Jan 3, 2021
Messages
3,708 (2.51/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Consumer cards dont need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.
Here's one of the problems (if you're a consumer, that is). Space savings are costly and don't benefit you if all you're buying is one or two GPUs. But if you're trying to compress 100 kilowatts' worth of processors in one rack, small memory footprint is crucial.
 
Joined
Aug 21, 2013
Messages
1,980 (0.48/day)
Here's one of the problems (if you're a consumer, that is). Space savings are costly and don't benefit you if all you're buying is one or two GPUs. But if you're trying to compress 100 kilowatts' worth of processors in one rack, small memory footprint is crucial.
Especially if AIBs continue to make bigger and bigger coolers instead of smarter ones (like the 5090 FE).
 