Tuesday, February 27th 2024

Samsung Develops Industry-First 36GB HBM3E 12H DRAM

Samsung Electronics, a world leader in advanced memory technology, today announced that it has developed HBM3E 12H, the industry's first 12-stack HBM3E DRAM and the highest-capacity HBM product to date. Samsung's HBM3E 12H provides an all-time high bandwidth of up to 1,280 gigabytes per second (GB/s) and an industry-leading capacity of 36 gigabytes (GB). In comparison to the 8-stack HBM3 8H, both aspects have improved by more than 50%.
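As a quick sanity check on those headline figures, here is a minimal back-of-envelope sketch. The HBM3 8H baseline (819.2 GB/s per stack, 24 GB) is the standard JEDEC HBM3 figure rather than something stated in this release, and the 1024-bit per-stack interface is common to all HBM generations:

# Back-of-envelope check of the headline figures. Assumed baseline:
# standard HBM3 8H at 6.4 Gb/s per pin (819.2 GB/s) and 24 GB per stack.
HBM_BUS_BITS = 1024              # every HBM generation uses a 1024-bit stack interface

hbm3e_12h_bw = 1280.0            # GB/s, from the press release
hbm3e_12h_cap = 36               # GB: 12 dies x 24 Gb (3 GB) each

hbm3_8h_bw = 1024 * 6.4 / 8      # = 819.2 GB/s at 6.4 Gb/s per pin
hbm3_8h_cap = 24                 # GB: 8 dies x 3 GB each

print(f"Implied pin speed: {hbm3e_12h_bw * 8 / HBM_BUS_BITS:.1f} Gb/s")  # 10.0
print(f"Bandwidth gain:    {hbm3e_12h_bw / hbm3_8h_bw - 1:.1%}")         # ~56%
print(f"Capacity gain:     {hbm3e_12h_cap / hbm3_8h_cap - 1:.1%}")       # 50%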

"The industry's AI service providers are increasingly requiring HBM with higher capacity, and our new HBM3E 12H product has been designed to answer that need," said Yongcheol Bae, Executive Vice President of Memory Product Planning at Samsung Electronics. "This new memory solution forms part of our drive toward developing core technologies for high-stack HBM and providing technological leadership for the high-capacity HBM market in the AI era."
The HBM3E 12H applies advanced thermal compression non-conductive film (TC NCF), allowing the 12-layer product to have the same height specification as 8-layer ones to meet current HBM package requirements. The technology is anticipated to bring added benefits especially at higher stack counts, as the industry seeks to mitigate the chip-die warping that comes with thinner dies. Samsung has continued to reduce the thickness of its NCF material and achieved the industry's smallest gap between chips at seven micrometers (µm), while also eliminating voids between layers. These efforts enhance vertical density by more than 20% compared to its HBM3 8H product.

Samsung's advanced TC NCF also improves the thermal properties of the HBM by enabling the use of bumps of various sizes between the chips. During the chip-bonding process, smaller bumps are used in areas for signaling, while larger ones are placed in spots that require heat dissipation. This method also improves product yield.

As AI applications grow exponentially, the HBM3E 12H is expected to be an optimal solution for future systems that require more memory. Its higher performance and capacity will let customers manage their resources more flexibly and reduce the total cost of ownership (TCO) of datacenters. When used in AI applications, it is estimated that, compared with adopting HBM3 8H, the average speed of AI training can increase by 34%, while the number of simultaneous users of inference services can expand by more than 11.5 times.

Samsung has begun sampling its HBM3E 12H to customers and mass production is slated for the first half of this year.
Source: Samsung

12 Comments on Samsung Develops Industry-First 36GB HBM3E 12H DRAM

#1
Philaphlous
Needs to be in laptops... can't say it enough... smaller/lighter/less power draw than traditional memory... why not?
#2
Denver
Philaphlous said:
Needs to be in laptops... can't say it enough... smaller/lighter/less power draw than traditional memory... why not?
Because you wouldn't want to pay $5k for it
#3
evernessince
Denver said:
Because you wouldn't want to pay $5k for it
AMD had HBM in its consumer-class Vega 64 for $500, and even in the cheaper Vega 56, so it's definitely possible to have reasonably priced HBM products. Right now the packaging required for HBM is completely booked by higher-margin enterprise products. We could at some point see reasonably priced HBM products like Vega 64 again, but when is completely unknown.
#4
Denver
evernessince said:
AMD had HBM in its consumer-class Vega 64 for $500, and even in the cheaper Vega 56, so it's definitely possible to have reasonably priced HBM products. Right now the packaging required for HBM is completely booked by higher-margin enterprise products. We could at some point see reasonably priced HBM products like Vega 64 again, but when is completely unknown.
Never. Each HBM iteration is more complex and expensive.

AMD launched some products with minimal profit margins during the first and second generations of HBM. Normally, the current generation (HBM3) would be about five times more expensive. However, due to AI-created scarcity, prices have likely doubled or tripled on top of that.
#5
AnotherReader
Denver said:
Never. Each HBM iteration is more complex and expensive.

AMD launched some products with minimal profit margins during the first and second generations of HBM. Normally, the current generation (HBM3) would be about five times more expensive. However, due to AI-created scarcity, prices have likely doubled or tripled on top of that.
SemiAnalysis claims that Nvidia pays SK Hynix $1150 for the six stacks of HBM3 attached to one H100. GDDR6 is now available for $3 per GB, but let's increase that to $4 per GB for faster grades. Even then, HBM is nearly 3 times more expensive than GDDR6. I think Apple has shown the way by using LPDDR5X, which is as efficient as HBM3. A case could be made for laptop-specific GPUs with wider DRAM buses than usual. In other words, instead of a 4080 with a 256-bit bus to GDDR6X, create a mobile 4090 with a 512-bit bus to LPDDR5X. The disadvantage would be higher costs due to the higher memory capacity.
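For reference, a minimal sketch of the arithmetic behind that "nearly 3 times" figure, assuming six 16 GB stacks per H100 (96 GB physical; the H100 ships with only 80 GB enabled, which would push the ratio higher):

# Rough $/GB comparison (assumptions: six 16 GB HBM3 stacks = 96 GB
# physical per H100; GDDR6 at $4/GB for fast bins, per the post above).
hbm3_cost_per_h100 = 1150        # USD, SemiAnalysis figure for 6 stacks
hbm3_gb = 6 * 16                 # 96 GB physical (80 GB enabled on the H100)
gddr6_usd_per_gb = 4.0           # USD/GB, fast-grade estimate

hbm3_usd_per_gb = hbm3_cost_per_h100 / hbm3_gb
print(f"HBM3:  ${hbm3_usd_per_gb:.2f}/GB")                  # ~$11.98/GB
print(f"GDDR6: ${gddr6_usd_per_gb:.2f}/GB")
print(f"Ratio: {hbm3_usd_per_gb / gddr6_usd_per_gb:.1f}x")  # ~3.0x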
#6
LabRat 891
Denver said:
Never. Each HBM iteration is more complex and expensive.

AMD launched some products with minimal profit margins during the first and second generations of HBM. Normally, the current generation (HBM3) would be about five times more expensive. However, due to AI-created scarcity, prices have likely doubled or tripled on top of that.
I'd be happy with the last-gen scraps, TBQH.

If the AI/MI 'push' and demand for GPGPU/NPU power is as strong as it seems, why can't us plebeian consumers get the leftovers?
#7
evernessince
Denver said:
Never. Each HBM iteration is more complex and expensive.

AMD launched some products with minimal profit margins during the first and second generations of HBM. Normally, the current generation (HBM3) would be about five times more expensive. However, due to AI-created scarcity, prices have likely doubled or tripled on top of that.
Same as everything else in the computer chip world, and that hasn't stopped better technology from trickling down the stack into affordable segments for everyday people.

Graphics GPUs don't need the latest HBM; that's complete overkill. There isn't a single consumer GPU with 32GB in total, let alone one that needs 36GB per HBM stack like the HBM3E discussed in the article.
#8
LabRat 891
evernessince said:
Same as everything else in the computer chip world, and that hasn't stopped better technology from trickling down the stack into affordable segments for everyday people.

Graphics GPUs don't need the latest HBM; that's complete overkill. There isn't a single consumer GPU with 32GB in total, let alone one that needs 36GB per HBM stack like the HBM3E discussed in the article.
(100% non-consumer product, but) You reminded me that I still need an MI60 (a Radeon VII, but not cut down and w/ double the HBM2).


Speaking of (old-gen) HBM:
My friend and I were just recently comparing his old Fury X vs. my RX 580X and Vega 64 in Avatar: Frontiers of Pandora.

The Fury X performed very similarly (on avg) to the RX 580X. However, the HBM-equipped Fury X had very stable minimum framerates.
OTOH, the Fury X performed almost precisely 'half as well' as the Vega 64 did; both having very steady minimum framerates.


(In my jaded opinion)
I suspect us consumers don't get HBM anymore because it makes products last too long.

In the money-making AI/MI (and 'Big Data') spaces, once a product has been superseded in performance, it's replaced.
More often than not, those retired devices are (semi-)proprietary and only re-usable by a smaller firm in a similar field.

'Individual consumers' and 'in-built obsolescence' both become a non-issue when your primary market(s) have every profit motive for buying The New Hotness (and in bulk qty.).
#9
evernessince
LabRat 891 said:
(100% non-consumer product, but) You reminded me that I still need an MI60 (a Radeon VII, but not cut down and w/ double the HBM2).


Speaking of (old-gen) HBM:
My friend and I were just recently comparing his old Fury X vs. my RX 580X and Vega 64 in Avatar: Frontiers of Pandora.

The Fury X performed very similarly (on avg) to the RX 580X. However, the HBM-equipped Fury X had very stable minimum framerates.
OTOH, the Fury X performed almost precisely 'half as well' as the Vega 64 did; both having very steady minimum framerates.


(In my jaded opinion)
I suspect us consumers don't get HBM anymore because it makes products last too long.

In the money-making AI/MI (and 'Big Data') spaces, once a product has been superseded in performance, it's replaced.
More often than not, those retired devices are (semi-)proprietary and only re-usable by a smaller firm in a similar field.

'Individual consumers' and 'in-built obsolescence' both become a non-issue when your primary market(s) have every profit motive for buying The New Hotness (and in bulk qty.).
32GB of HBM2 for around $600, not bad at all.
#10
Wirko
LabRat 891 said:
I suspect us consumers don't get HBM anymore because it makes products last too long.
No. We consumers are just unable to really make use of the two great advantages of the very expensive HBM: bus width and density. All systems with HBM that I know of have multiple stacks. Not only do they achieve high bandwidth and memory capacity, they do it in a small footprint, which makes higher transfer rates possible while saving power. High density and power savings matter a lot if you have thousands of processors, less so if you have one.
AnotherReader said:
I think Apple has shown the way by using LPDDR5X, which is as efficient as HBM3.
The largest M chip has 2x 512-bit memory buses, right? That's actually very close to a single HBM stack: same width, similar transfer rate. Larger surface area, but cheaper.
#11
AnotherReader
Wirko said:
No. We consumers are just unable to really make use of the two great advantages of the very expensive HBM: bus width and density. All systems with HBM that I know of have multiple stacks. Not only do they achieve high bandwidth and memory capacity, they do it in a small footprint, which makes higher transfer rates possible while saving power. High density and power savings matter a lot if you have thousands of processors, less so if you have one.

The largest M chip has 2x 512-bit memory buses, right? That's actually very close to a single HBM stack: same width, similar transfer rate. Larger surface area, but cheaper.
The largest standalone M1 has a 512-bit bus to DRAM; the Ultra is composed of two separate M1 Max dies.
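To put numbers on that, a sketch assuming standard per-pin rates (LPDDR5 at 6.4 Gb/s on the M1 Max, HBM3 at 6.4 Gb/s; Apple's quoted ~400 GB/s for the M1 Max matches the first line):

# Peak bandwidth from bus width and per-pin rate, in GB/s.
def bw_gbs(bus_bits: int, pin_gbps: float) -> float:
    return bus_bits * pin_gbps / 8

print(f"M1 Max, 512-bit LPDDR5:      {bw_gbs(512, 6.4):.0f} GB/s")   # ~410
print(f"M1 Ultra, 2x 512-bit LPDDR5: {bw_gbs(1024, 6.4):.0f} GB/s")  # ~819
print(f"One HBM3 stack, 1024-bit:    {bw_gbs(1024, 6.4):.0f} GB/s")  # ~819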
#12
LabRat 891
Wirko said:
No. We consumers are just unable to really make use of the two great advantages of the very expensive HBM: bus width and density.
My min. framerates and frametimes across Vega 10 XT, XTX, and XT GL, along w/ my friend's Fiji XT, would beg to differ.
Newer, faster GDDR6(X) cards may beat older HBM cards, but the (lower) avg framerate is steadier, and in some cases (more so w/ Vega 20) the HBM cards flat-out sustain higher 0.1% and 1% lows.
Wirko said:
All systems with HBM that I know of have multiple stacks. Not only do they achieve high bandwidth and memory capacity, they do it in a small footprint, which makes higher transfer rates possible while saving power. High density and power savings matter a lot if you have thousands of processors, less so if you have one.
Yes. For-profit 'Big Data' and 'AI/MI' eat up the supply of HBM, and have been the driving force in manufacturing/developing the stuff.
Undeniable, and even sensible from my tiny little PoV.

Regardless, those stated benefits of HBM have wide (consumer) applications.
Mobile SoCs, GPUs; really, anything that benefits from compact, power-efficient, ultra-low-latency DRAM.
Which is pretty much every flavor of personal computing device imaginable. IMO, there merely hasn't yet been an opportunity for older, mature-process HBM (HBM2, HBM2E, HBM3, etc.) to find a place in consumer-facing products.
IMO (if press releases are to be believed :laugh:), HBM's teething issues are mostly solved overall; the issues inherent in each new generation (+extreme demand) are what currently drive costs.

Admittedly, the technology itself complicates 'integration', willy-nilly...
AnotherReader said:
The largest standalone M1 has a 512-bit bus to DRAM; the Ultra is composed of two separate M1 Max dies.
Still, in theme w/ Wirko's point: rather than deal w/ 'HBM shenanigans', they designed a best-effort replacement of function.

Which, in relation to (GP)GPUs, reminds me...
Personally, I'd even be happy w/ a return to phatass memory buses, like with ATI R600, nVidia G80, AMD Hawaii, nVidia GT200(B), etc. (see the sketch after this post).
As I recall, wide-membus GPUs saw raw performance benefits similar to what we'd later see exemplified further in 1024-bit to 4096-bit on-die/package HBM. Still, nothing can replace HBM's extreme latency benefit of being wide, fast RAM sitting next to/atop the ASIC.
Problem: one of the main purposes of both AMD's Infinity Cache and both vendors' use of HBM is to circumvent the issues and complications (typically/historically) seen with wide-memory-bus (GP)GPUs.
Though today, I could see an MCM-MCH design mitigating some of those problems while allowing lots of cheaper (G)DRAM.
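As a rough illustration of that wide-bus tradeoff, a sketch using the reference specs of two of the cards named above (the 512-bit GDDR5 Hawaii card vs. the 4096-bit HBM1 Fiji card):

# Peak bandwidth from bus width and per-pin rate, in GB/s.
def bw_gbs(bus_bits: int, pin_gbps: float) -> float:
    return bus_bits * pin_gbps / 8

# Hawaii got there with a wide-ish bus and fast pins; Fiji with a very
# wide on-package bus and slow pins (reference-card specs).
print(f"R9 290X (Hawaii), 512-bit GDDR5 @ 5 Gb/s: {bw_gbs(512, 5.0):.0f} GB/s")   # 320
print(f"R9 Fury X (Fiji), 4096-bit HBM1 @ 1 Gb/s: {bw_gbs(4096, 1.0):.0f} GB/s")  # 512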