
SK hynix Ships HBM4 Samples to NVIDIA in June, Mass Production Slated for Q3 2025

Nomad76

News Editor
Staff member
Joined
May 21, 2024
Messages
819 (3.38/day)
SK hynix has sped up its HBM4 development plans, according to a report from ZDNet. The company wants to begin shipping HBM4 samples to NVIDIA this June, earlier than the original timeline, and hopes to start supplying products by the end of Q3 2025; the push likely aims to secure a head start in the next-gen HBM market. To meet this accelerated schedule, SK hynix has set up a dedicated HBM4 development team to supply NVIDIA. Industry sources indicated on January 15th that SK hynix plans to deliver its first customer samples of HBM4 in early June this year. The company hit a big milestone in Q4 2024 when it completed the HBM4 tape-out, the final design step.

HBM4 marks the sixth generation of high-bandwidth memory built on stacked DRAM architecture. It follows HBM3E, the current fifth-generation version, with large-scale production likely to kick off in late 2025 at the earliest. HBM4 boasts a big leap forward, doubling data-transfer capability with 2,048 I/O channels, up from 1,024 in its predecessor. NVIDIA had planned to use 12-layer stacked HBM4 in its 2026 "Rubin" line of high-performance GPUs. However, NVIDIA has since moved up its timeline, aiming to launch "Rubin" in late 2025.



A source familiar with the matter explained, "It seems that NVIDIA's will to launch Rubin early is stronger than expected, to the point that it is pushing forward trial production to the second half of this year." He added, "In line with this, memory companies such as SK hynix are also pushing for early supply of samples. Product supply could be possible as early as the end of the third quarter."

View at TechPowerUp Main Site | Source
 
Joined
Nov 26, 2021
Messages
1,730 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Is there a possibility that we will see HBM in Desktop GPUs one day?
It's very unlikely; HBM is far too expensive for anything short of a 5090.
 
Joined
May 10, 2023
Messages
515 (0.83/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Is there a possibility that we will see HBM in Desktop GPUs one day?
Apart from costs, is there much of a point?
The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).

To reach such high bandwidth you'd need enough stacks, which would both be hella expensive, and also give a consumer GPU way too much memory that's only usually meant for enterprise offerings.
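Those bandwidth figures all fall out of the same formula: bus width times per-pin data rate, divided by 8 to convert bits to bytes. A quick sketch (the per-pin rates here are assumptions picked to match the publicly listed spec-sheet numbers, not official figures):

```python
# Peak memory bandwidth = bus width (bits) x per-pin rate (Gbps) / 8.
def bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits * gbps_per_pin / 8

# RTX 5090: 512-bit GDDR7 at ~28 Gbps per pin
print(bandwidth_gbs(512, 28))          # -> 1792.0 GB/s (~1.8 TB/s)

# A100 40GB PCIe: 5 active 1024-bit HBM2e stacks at ~2.43 Gbps per pin
print(bandwidth_gbs(5 * 1024, 2.43))   # -> ~1555 GB/s (~1.6 TB/s)
```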
 
Joined
Mar 18, 2023
Messages
959 (1.43/day)
System Name Never trust a socket with less than 2000 pins
I'm more interested in having this on a CPU.

I would have applications that need every bit of core speed and don't need much memory.
 
Joined
May 10, 2023
Messages
515 (0.83/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
I'm more interested in having this on a CPU.

I would have applications that need every bit of core speed and don't need much memory.
There's that Xeon Max with HBM on board, or you could try to get your hands on one of those MI300A from MS.
 
Joined
Nov 26, 2021
Messages
1,730 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Apart from costs, is there much of a point?
The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).

To reach such high bandwidth you'd need enough stacks, which would both be hella expensive, and also give a consumer GPU way too much memory that's only usually meant for enterprise offerings.
The cost is prohibitive for most GPUs. However, given how much a 5090 costs, increasing prices by $500 to cover the HBM's cost shouldn't impact expected sales. A bigger factor is capacity constraints: TSMC has been capacity constrained for CoWoS, so it makes sense to use that limited capacity for higher-margin datacenter GPUs rather than gaming GPUs. Given the explosion of interest in machine learning, the capacity constraint might be even worse now despite TSMC's investments in ameliorating it.

As for the advantages, HBM is far more power efficient than GDDR of the same generation. One stack of HBM4 would offer 89% of the bandwidth of the 5090's GDDR7 at a fraction of the power. Alternatively, two stacks of HBM3e would exceed that bandwidth and increase capacity. HBM PHYs also require less area than GDDR PHYs so you could either have a smaller die or increase the number of SMXs to take advantage of the saved area and power.
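The one-stack and two-stack comparisons above can be checked with the same bandwidth arithmetic. The per-pin rates below are assumptions (the HBM4 rate of ~6.25 Gbps is chosen to reproduce the 89% figure; HBM3E parts are commonly quoted around 9.6 Gbps):

```python
# Peak bandwidth = bus width (bits) x per-pin rate (Gbps) / 8, in GB/s.
gddr7_5090 = 512 * 28 / 8        # 1792 GB/s: the 5090's 512-bit GDDR7
hbm4_stack = 2048 * 6.25 / 8     # 1600 GB/s: one 2048-bit HBM4 stack
hbm3e_two  = 2 * 1024 * 9.6 / 8  # 2457.6 GB/s: two 1024-bit HBM3E stacks

print(hbm4_stack / gddr7_5090)   # ~0.89: one HBM4 stack vs. the 5090
print(hbm3e_two > gddr7_5090)    # True: two HBM3E stacks exceed it
```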
 
Joined
Aug 21, 2013
Messages
1,980 (0.48/day)
Is there a possibility that we will see HBM in Desktop GPUs one day?
If by one day you mean one day again, then yes. Absolutely.
It's very unlikely; HBM is far too expensive for anything short of a 5090.
Which version? AMD was able to release a consumer card with 16 GB of HBM2 six years ago for $700. Even if we assume doubled capacity and a jump to HBM3E, for cards costing four figures (5080 and up) the cost is not the biggest issue. I suspect supply would be much more of a problem.
The 5090 with GDDR7 at 512-bit manages 1.8TB/s, which is higher than the A100 40GB PCIe (1.6TB/s) and pretty near the A100 80GB SXM/H100 80GB PCIe (2TB/s), all of which use HBM2e, and even the H100 SXM 64GB (2TB/s, HBM3).
That's only one side of the equation. There's also power and the size on the card. G7 may offer these things, but it requires an equally complex multilayer PCB to support 512-bit, and G7 still requires 16 separate chips on the PCB.
To reach such high bandwidth you'd need enough stacks, which would both be hella expensive, and also give a consumer GPU way too much memory that's only usually meant for enterprise offerings.
Consumer cards don't need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.
The cost is prohibitive for most GPUs. However, given how much a 5090 costs, increasing prices by $500 to cover the HBM's cost shouldn't impact expected sales.
My point exactly. With the 5090 costing $2,000+, the argument of "expensive" HBM seems more and more silly.
A bigger factor is capacity constraints; TSMC was capacity constrained for COWOS so it makes sense to use that limited capacity for higher margin datacenter GPUs rather than gaming GPUs. Given the explosion in interest in machine learning, the capacity constraint might be even worse now despite TSMC's investments in ameliorating it.
That's what I'm thinking too. Right now all HBM is sold into data center cards at much higher margins. Until the AI boom pops, this won't change.
I believe this was also the reason why the multi-chiplet high-end versions of RDNA4 were canned.
As for the advantages, HBM is far more power efficient than GDDR of the same generation. One stack of HBM4 would offer 89% of the bandwidth of the 5090's GDDR7 at a fraction of the power.
The lowest 4-Hi stack is 16 GB using 4 GB layers. So two 16 GB stacks would offer 32 GB with 3.2 TB/s of speed.
Alternatively, two stacks of HBM3e would exceed that bandwidth and increase capacity. HBM PHYs also require less area than GDDR PHYs so you could either have a smaller die or increase the number of SMXs to take advantage of the saved area and power.
And HBM3e is cheaper as it's not the latest and greatest.
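The two-stack figures quoted above check out arithmetically; a quick sketch (the 6.4 Gbps per-pin rate is an assumption chosen to reproduce the quoted 3.2 TB/s, not a confirmed HBM4 spec):

```python
# Two 4-Hi HBM4 stacks with 4 GB layers: capacity and peak bandwidth.
stacks, layers, gb_per_layer = 2, 4, 4
capacity_gb = stacks * layers * gb_per_layer     # 2 x 4 x 4 = 32 GB

# Each stack has a 2048-bit interface; assume ~6.4 Gbps per pin.
bandwidth_tbs = stacks * 2048 * 6.4 / 8 / 1000   # ~3.28 TB/s

print(capacity_gb, round(bandwidth_tbs, 2))      # -> 32 3.28
```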
 
Joined
Nov 26, 2021
Messages
1,730 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
If by one day you mean one day again, then yes. Absolutely.

Which version? AMD was able to release a consumer card with 16 GB of HBM2 six years ago for $700. Even if we assume doubled capacity and a jump to HBM3E, for cards costing four figures (5080 and up) the cost is not the biggest issue. I suspect supply would be much more of a problem.

That's only one side of the equation. There's also power and the size on the card. G7 may offer these things, but it requires an equally complex multilayer PCB to support 512-bit, and G7 still requires 16 separate chips on the PCB.

Consumer cards don't need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.

My point exactly. With the 5090 costing $2,000+, the argument of "expensive" HBM seems more and more silly.

That's what I'm thinking too. Right now all HBM is sold into data center cards at much higher margins. Until the AI boom pops, this won't change.
I believe this was also the reason why the multi-chiplet high-end versions of RDNA4 were canned.

The lowest 4-Hi stack is 16 GB using 4 GB layers. So two 16 GB stacks would offer 32 GB with 3.2 TB/s of speed.

And HBM3e is cheaper as it's not the latest and greatest.
Even HBM2 has advantages over GDDR7, but I was thinking of HBM3. As for capacity, there are 8-high stacks of HBM4, which would give 32 GB at about 89% of the bandwidth of the existing 512-bit GDDR7 memory interface. Given that the gap in graphics performance, and even compute performance, between the 5090 and the 4090 is far smaller than the difference in memory bandwidth, losing 11% of that bandwidth in exchange for far lower DRAM power draw and a simpler PCB is hardly likely to be detrimental to performance.
 
Joined
Jan 3, 2021
Messages
3,708 (2.51/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Consumer cards dont need more than two stacks of HBM4 to easily surpass G7 in capacity, speed, power efficiency and space savings.
Here's one of the problems (if you're a consumer, that is). Space savings are costly and don't benefit you if all you're buying is one or two GPUs. But if you're trying to compress 100 kilowatts' worth of processors in one rack, small memory footprint is crucial.
 
Joined
Aug 21, 2013
Messages
1,980 (0.48/day)
Here's one of the problems (if you're a consumer, that is). Space savings are costly and don't benefit you if all you're buying is one or two GPUs. But if you're trying to compress 100 kilowatts' worth of processors in one rack, small memory footprint is crucial.
Especially if AIBs continue to make bigger and bigger coolers instead of smarter ones (like the 5090 FE).
 