
Samsung Develops Industry-First 36GB HBM3E 12H DRAM

GFreeman

News Editor
Staff member
Joined
Mar 6, 2023
Messages
1,178 (2.65/day)
Samsung Electronics, a world leader in advanced memory technology, today announced that it has developed HBM3E 12H, the industry's first 12-stack HBM3E DRAM and the highest-capacity HBM product to date. Samsung's HBM3E 12H provides an all-time high bandwidth of up to 1,280 gigabytes per second (GB/s) and an industry-leading capacity of 36 gigabytes (GB). In comparison to the 8-stack HBM3 8H, both aspects have improved by more than 50%.
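As a rough sanity check of those figures (assuming a 1024-bit stack interface, 24 Gb DRAM dies, and the per-pin rates the quoted numbers imply; none of these parameters are stated in the press release):

```python
# Back-of-the-envelope check of the headline HBM figures.
# Assumptions: 1024-bit interface per stack, 24 Gb per DRAM die,
# ~10 Gb/s per pin for HBM3E and 6.4 Gb/s per pin for HBM3.

BUS_WIDTH_BITS = 1024    # pins per HBM stack
DIE_DENSITY_GBIT = 24    # assumed per-die density

def stack_capacity_gb(dies: int) -> float:
    """Capacity of one stack in GB."""
    return dies * DIE_DENSITY_GBIT / 8

def stack_bandwidth_gb_per_s(pin_rate_gbit_s: float) -> float:
    """Peak bandwidth of one stack in GB/s."""
    return BUS_WIDTH_BITS * pin_rate_gbit_s / 8

print(stack_capacity_gb(12), stack_bandwidth_gb_per_s(10.0))  # 36.0 GB, 1280.0 GB/s (HBM3E 12H)
print(stack_capacity_gb(8),  stack_bandwidth_gb_per_s(6.4))   # 24.0 GB,  819.2 GB/s (HBM3 8H)
# 36/24 = 1.5x capacity, 1280/819.2 ≈ 1.56x bandwidth vs. HBM3 8H
```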

"The industry's AI service providers are increasingly requiring HBM with higher capacity, and our new HBM3E 12H product has been designed to answer that need," said Yongcheol Bae, Executive Vice President of Memory Product Planning at Samsung Electronics. "This new memory solution forms part of our drive toward developing core technologies for high-stack HBM and providing technological leadership for the high-capacity HBM market in the AI era."



The HBM3E 12H applies advanced thermal compression non-conductive film (TC NCF), allowing the 12-layer product to have the same height specification as 8-layer ones and meet current HBM package requirements. The technology is anticipated to bring added benefits at higher stack counts, as the industry seeks to mitigate the chip die warping that comes with thinner dies. Samsung has continued to lower the thickness of its NCF material and achieved the industry's smallest gap between chips at seven micrometers (µm), while also eliminating voids between layers. These efforts enhance vertical density by over 20% compared to its HBM3 8H product.

Samsung's advanced TC NCF also improves thermal properties of the HBM by enabling the use of bumps in various sizes between the chips. During the chip bonding process, smaller bumps are used in areas for signaling and larger ones are placed in spots that require heat dissipation. This method also helps with higher product yield.

As AI applications grow exponentially, the HBM3E 12H is expected to be an optimal solution for future systems that require more memory. Its higher performance and capacity will especially allow customers to manage their resources more flexibly and reduce total cost of ownership (TCO) for datacenters. When used in AI applications, it is estimated that, in comparison to adopting HBM3 8H, the average speed for AI training can be increased by 34% while the number of simultaneous users of inference services can be expanded more than 11.5 times.

Samsung has begun sampling its HBM3E 12H to customers and mass production is slated for the first half of this year.

View at TechPowerUp Main Site | Source
 
Joined
Aug 13, 2020
Messages
89 (0.06/day)
Needs to be in laptops... can't say it enough... smaller/lighter/less power draw than traditional memory... why not?
 
Joined
Jul 13, 2016
Messages
2,909 (1.01/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Because you wouldn't want to pay $5k for it

AMD had HBM in its consumer-class Vega 64 for $500, and even the cheaper Vega 56, so it's definitely possible to have reasonably priced HBM products. Right now the packaging capacity required for HBM is completely booked by higher-margin enterprise products. We could at some point see reasonably priced HBM products like Vega 64 again, but when is completely unknown.
 
Joined
Oct 6, 2021
Messages
1,499 (1.56/day)
AMD had HBM in its consumer-class Vega 64 for $500, and even the cheaper Vega 56, so it's definitely possible to have reasonably priced HBM products. Right now the packaging capacity required for HBM is completely booked by higher-margin enterprise products. We could at some point see reasonably priced HBM products like Vega 64 again, but when is completely unknown.
Never. Each HBM iteration is more complex and expensive.

AMD launched some products with minimal profit margins during the first and second generations of HBM. Normally, the current generation (HBM3) would already be about five times more expensive. However, due to AI-driven scarcity, prices have likely doubled or tripled on top of that.
 
Joined
Nov 26, 2021
Messages
1,372 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Never. Each HBM iteration is more complex and expensive.

AMD launched some products with minimal profit margins during the first and second generations of HBM. Normally, the current generation (HBM3) would already be about five times more expensive. However, due to AI-driven scarcity, prices have likely doubled or tripled on top of that.
SemiAnalysis claims that Nvidia pays SK Hynix $1,150 for the six stacks of HBM3 attached to one H100. GDDR6 is now available for $3 per GB, but let's increase that to $4 per GB for faster grades. Even then, HBM is nearly 3 times more expensive than GDDR6. I think Apple has shown the way by using LPDDR5X, which is as efficient as HBM3. A case could be made for laptop-specific GPUs that have wider DRAM buses than usual. In other words, instead of a 4080 with a 256-bit bus to GDDR6X, create a mobile 4090 with a 512-bit bus to LPDDR5X. The disadvantage would be higher costs due to the higher memory capacity.
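Quick back-of-the-envelope math behind that ratio; the 16 GB-per-stack assumption (96 GB physical across six stacks) is mine, SemiAnalysis only quoted the package price:

```python
# Rough cost-per-GB comparison using the figures quoted above.
hbm3_package_cost = 1150      # USD for six HBM3 stacks on one H100 (quoted claim)
hbm3_capacity_gb = 6 * 16     # assumed 16 GB per stack -> 96 GB physical
gddr6_cost_per_gb = 4.0       # USD/GB, the pessimistic "faster grade" figure

hbm3_cost_per_gb = hbm3_package_cost / hbm3_capacity_gb
print(f"HBM3: ${hbm3_cost_per_gb:.2f}/GB")                       # ~$11.98/GB
print(f"vs GDDR6: {hbm3_cost_per_gb / gddr6_cost_per_gb:.1f}x")  # ~3.0x
```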
 
Joined
Apr 18, 2019
Messages
2,100 (1.13/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
Never. Each HBM iteration is more complex and expensive.

AMD launched some products with minimal profit margins during the first and second generations of HBM. Normally, the current generation (HBM3) would already be about five times more expensive. However, due to AI-driven scarcity, prices have likely doubled or tripled on top of that.

I'd be happy with the last-gen scraps, TBQH.

If the AI/MI 'push' and demand for GPGPU/NPU power is as strong as it seems, why can't we plebeian consumers get the leftovers?
 
Joined
Jul 13, 2016
Messages
2,909 (1.01/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Never. Each HBM iteration is more complex and expensive.

AMD launched some products with minimal profit margins during the first and second generations of HBM. Normally, the current generation (HBM3) would already be about five times more expensive. However, due to AI-driven scarcity, prices have likely doubled or tripled on top of that.

Same as everything else in the computer chip world, and that hasn't stopped better technology from trickling down the stack into affordable segments for everyday people.

Consumer graphics cards don't need the latest HBM; that's complete overkill. There isn't a single consumer GPU with 32 GB in total, let alone one that needs 36 GB per HBM stack like the HBM3E discussed in the article.
 
Joined
Apr 18, 2019
Messages
2,100 (1.13/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
Same as everything else in the computer chip world, and that hasn't stopped better technology from trickling down the stack into affordable segments for everyday people.

Consumer graphics cards don't need the latest HBM; that's complete overkill. There isn't a single consumer GPU with 32 GB in total, let alone one that needs 36 GB per HBM stack like the HBM3E discussed in the article.
(100% non-consumer product, but) You reminded me that I still need an MI60 (a Radeon VII, but not cut down, and with double the HBM2)


Speaking of (old gen) HBM:
My friend and I were just recently comparing his old Fury X vs. my RX 580X and Vega 64 in Avatar: FoP.

The Fury X performed very similarly (on average) to the RX 580X. However, the HBM-equipped Fury X had very stable minimum framerates.
OTOH, the Fury X performed almost precisely 'half as well' as the Vega 64 did; both having very steady minimum framerates.


(In my jaded opinion)
I suspect we consumers don't get HBM anymore because it makes products last too long.

In the money-making AI/MI (and 'Big Data') spaces, once a product has been superseded in performance, it's replaced.
More often than not, those retired devices are (semi-)proprietary and only re-usable by a smaller firm in a similar field.

'Individual consumers' and 'built-in obsolescence' both become a non-issue when your primary market(s) have every profit motive for buying The New Hotness (and in bulk quantities).
 
Joined
Jul 13, 2016
Messages
2,909 (1.01/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
(100% non-consumer product, but) You reminded me that I still need an MI60 (a Radeon VII, but not cut down, and with double the HBM2)

Speaking of (old gen) HBM:
My friend and I were just recently comparing his old Fury X vs. my RX 580X and Vega 64 in Avatar: FoP.

The Fury X performed very similarly (on average) to the RX 580X. However, the HBM-equipped Fury X had very stable minimum framerates.
OTOH, the Fury X performed almost precisely 'half as well' as the Vega 64 did; both having very steady minimum framerates.


(In my jaded opinion)
I suspect we consumers don't get HBM anymore because it makes products last too long.

In the money-making AI/MI (and 'Big Data') spaces, once a product has been superseded in performance, it's replaced.
More often than not, those retired devices are (semi-)proprietary and only re-usable by a smaller firm in a similar field.

'Individual consumers' and 'built-in obsolescence' both become a non-issue when your primary market(s) have every profit motive for buying The New Hotness (and in bulk quantities).

32GB of HBM2 for around $600, not bad at all.
 
Joined
Jan 3, 2021
Messages
2,781 (2.25/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
I suspect we consumers don't get HBM anymore because it makes products last too long.
No. We consumers are just unable to really make use of two great advantages of the very expensive HBM: bus width and density. All systems with HBM that I know of have multiple stacks. Not only do they achieve large bandwidth and memory capacity, they do it in a small footprint - which makes higher transfer rates possible while saving power. High density and power saving matter a lot if you have thousands of processors, less so if you have one.

I think Apple has shown the way by using LPDDR5X, which is as efficient as HBM3
Largest M chip has 2x 512-bit memory buses, right? That's actually very close to a single HBM stack - same width, similar transfer rate. Larger surface area but cheaper.
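Rough numbers behind that comparison; the per-pin rates here (LPDDR5-6400 on the M-series parts, 6.4 Gb/s for an HBM3 stack) are my assumptions:

```python
# Peak bandwidth = bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte.
# Transfer rates below are assumed, not vendor-confirmed figures.

def bandwidth_gb_per_s(bus_width_bits: int, rate_gbit_s: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits * rate_gbit_s / 8

m1_max   = bandwidth_gb_per_s(512, 6.4)    # one 512-bit LPDDR5 bus  -> 409.6 GB/s
m1_ultra = 2 * m1_max                      # two M1 Max dies fused   -> 819.2 GB/s
hbm3     = bandwidth_gb_per_s(1024, 6.4)   # one 1024-bit HBM3 stack -> 819.2 GB/s

print(m1_max, m1_ultra, hbm3)
```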
 
Joined
Nov 26, 2021
Messages
1,372 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
No. We consumers are just unable to really make use of two great advantages of the very expensive HBM: bus width and density. All systems with HBM that I know of have multiple stacks. Not only do they achieve large bandwidth and memory capacity, they do it in a small footprint - which makes higher transfer rates possible while saving power. High density and power saving matter a lot if you have thousands of processors, less so if you have one.


Largest M chip has 2x 512-bit memory buses, right? That's actually very close to a single HBM stack - same width, similar transfer rate. Larger surface area but cheaper.
The largest standalone M1 has a 512-bit bus to DRAM; the Ultra is composed of two separate M1 Max dies.
 
Joined
Apr 18, 2019
Messages
2,100 (1.13/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
No. We consumers are just unable to really make use of two great advantages of the very expensive HBM: bus width and density.
My min. framerates and frametimes across Vega 10 XT, XTX, and XT GL, along with my friend's Fiji XT, would beg to differ.
Newer, faster GDDR6(X) cards may beat older HBM cards, but the (lower) avg framerate is steadier, and in some cases (more so with Vega 20) flat-out sustains higher 0.1% and 1% lows.
All systems with HBM that I know of have multiple stacks. Not only do they achieve large bandwidth and memory capacity, they do it in a small footprint - which makes higher transfer rates possible while saving power. High density and power saving matter a lot if you have thousands of processors, less so if you have one.
Yes. For-profit 'Big Data' and 'AI/MI' eat up the supply of HBM, and have been the driving force behind manufacturing/developing the stuff.
Undeniable, and even sensible from my tiny little PoV.

Regardless, those stated benefits of HBM have wide (consumer) applications.
Mobile SoCs, GPUs; really, anything that benefits from compact, power-efficient, ultra-low-latency DRAM.
Which is pretty much every flavor of personal computing device imaginable. IMO, older, well-developed HBM generations (HBM2E, HBM3, etc.) simply haven't yet been given the opportunity to find places in consumer-facing products.
IMO (if press releases are to be believed :laugh:), HBM's teething issues are mostly solved; the issues inherent in each new generation (plus extreme demand) are what currently drive cost.

Admittedly, the technology itself complicates 'integration' all on its own...

The largest standalone M1 has a 512-bit bus to DRAM; the Ultra is composed of two separate M1 Max dies.
Still, in theme with Wirko's point: rather than deal with 'HBM shenanigans', they designed a best-effort functional replacement.

Which, in relation to (GP)GPUs, reminds me...
Personally, I'd even be happy with a return to phatass memory buses, like ATI's R600, NVIDIA's G80, AMD's Hawaii, NVIDIA's GT200(B), etc.
As I recall, wide-membus GPUs saw raw performance benefits similar to those later exemplified further by 1024-bit to 4096-bit on-package HBM. Still, nothing can replace HBM's latency benefits from being wide, fast RAM sitting next to/atop the ASIC.
Problem: one of the main purposes of both AMD's Infinity Cache and both vendors' use of HBM is to circumvent the issues and complications (typically/historically) seen with wide-memory-bus (GP)GPUs.
Though today, I could see an MCM-MCH design mitigating some of those problems while allowing lots of cheaper (G)DRAM.
 