Friday, May 24th 2024

NVIDIA Reportedly Having Issues with Samsung's HBM3 Chips Running Too Hot

According to Reuters, NVIDIA is having major issues with Samsung's HBM3 chips and has not managed to finalise its validation of them. Reuters cites multiple sources familiar with the matter, and if those sources are correct, Samsung has some serious problems with its HBM3 chips. Not only do the chips run hot, which is itself a big issue given that NVIDIA already has trouble cooling some of its higher-end products, but the power consumption is apparently not where it should be either. Samsung is said to have been trying to get its HBM3 and HBM3E parts validated by NVIDIA since sometime in 2023, according to Reuters' sources, which suggests that there have been issues for at least six months, if not longer.

The sources claim there are issues with both the 8- and 12-layer stacks of HBM3E parts from Samsung, suggesting that NVIDIA might only be able to source parts from Micron and SK Hynix for now, the latter of which has been supplying HBM3 chips to NVIDIA since the middle of 2022 and HBM3E chips since March of this year. It's unclear if this is a production issue at Samsung's DRAM fabs, a packaging-related issue or something else entirely. The Reuters piece goes on to speculate that Samsung has not had enough time to develop its HBM parts compared to its competitors and that it's a rushed product, but Samsung issued a statement to the publication saying that it's a matter of customising the product for its customers' needs. Samsung also said that it's in "the process of optimising its products through close collaboration with customers" without going into which customer(s). Samsung issued a further statement saying that "claims of failing due to heat and power consumption are not true" and that testing was going as expected.
Source: Reuters

25 Comments on NVIDIA Reportedly Having Issues with Samsung's HBM3 Chips Running Too Hot

#2
hsew
bonehead123: sobs, tears, condolences....

poor nGreeddiya, wheh wheh wheh, gonna have to spend some of that jacket money to fix a problem that Sammy created for them, wheh, wheh, wheh....

Shoulda just stuck with TSMC... oh but wait, Apple beat them to it, wheh, wheh, wheh

Not like Apple has much of a choice… imagine them being forced to use Samsung fabs!!
Posted on Reply
#3
MxPhenom 216
ASIC Engineer
bonehead123: sobs, tears, condolences....

poor nGreeddiya, wheh wheh wheh, gonna have to spend some of that jacket money to fix a problem that Sammy created for them, wheh, wheh, wheh....

Shoulda just stuck with TSMC... oh but wait, Apple beat them to it, wheh, wheh, wheh

LOL this has nothing to do with TSMC vs Samsung. Nvidia does not design their own HBM and they are not the only one sourcing HBM from Samsung. All these players making AI ASICs are sourcing HBM from 2-3 vendors.
Posted on Reply
#4
R0H1T
hsew: imagine them being forced to use Samsung fabs!!
Yeah imagine using the "fabs" of the biggest memory & NAND maker in the world :rolleyes:
Posted on Reply
#5
hsew
R0H1T: Yeah imagine using the "fabs" of the biggest memory & NAND maker in the world :rolleyes:
I almost forgot how much Apple and Samsung get along! :p It’s one thing to buy flash memory and displays from them but entirely another to hand them the blueprints and allow them to control the manufacturing of the SOC which they directly compete against… :kookoo:
Posted on Reply
#6
Gooigi's Ex
hsew: Not like Apple has much of a choice… imagine them being forced to use Samsung fabs!!
At this point, that's the only thing Apple ISN’T using from Samesung. They pretty much use screens, memory, RAM, and I’m pretty sure I’m forgetting something else related to Samesung.
Posted on Reply
#7
thesmokingman
Whatever the extra cost they will just happily push it downstream. lol
Posted on Reply
#8
Fourstaff
Probably explains why Samsung Electronics shuffled their leadership. Being second fiddle to SK Hynix is a bitter pill.
Posted on Reply
#9
Denver
Fourstaff: Probably explains why Samsung Electronics shuffled their leadership. Being second fiddle to SK Hynix is a bitter pill.
SK Hynix has always led HBM production, with better yields and volume. This is not new...
Posted on Reply
#10
ir_cow
AMD had the same problem with HBM2 for the Vega64. Not too many working units these days.
Posted on Reply
#11
Fourstaff
Denver: SK Hynix has always led HBM production, with better yields and volume. This is not new...
What is new is the sudden and dramatic divergence in fortunes between HBM powering the AI boom and your run-of-the-mill DRAMs, which are languishing.
Posted on Reply
#12
Minus Infinity
Ol' Jensen probably trying to squeeze a price cut out of Samsung. Hey man your chips run hot, we can still work with them, but it'll cost us a pretty penny to mitigate heat, so you better slash pricing.
Posted on Reply
#13
DemonicRyzen666
ah yes, the old "thermal density" problem is rearing its head again. Alongside probably a voltage/amperage curve problem.
Posted on Reply
#14
john_
Samsung and Intel should exchange information on how to manufacture power efficient chips........... :p
Posted on Reply
#15
londiste
bonehead123: Shoulda just stuck with TSMC... oh but wait, Apple beat them to it, wheh, wheh, wheh
Since when has TSMC been producing HBM?
Posted on Reply
#16
Jism
I suspect potential long-term degradation. What Samsung reports as working may, in DC environments with 24/7 usage and perhaps over the next few years, show degradation and thus a growing number of damaged products and returns. You can only expect the highest grade stuff when you pay 10 to 30g per unit.

The Vega 56 with a flashed 64 bios was known for that. Long term HBM would degrade.
Posted on Reply
#17
AsRock
TPU addict
ir_cow: AMD had the same problem with HBM2 for the Vega64. Not too many working units these days.
Yeah, also I'd be very wary about buying any when they "fix" the issue too. Remember, it only has to last as long as the warranty.
Posted on Reply
#18
MxPhenom 216
ASIC Engineer
DemonicRyzen666: ah yes, the old "thermal density" problem is rearing its head again. Alongside probably a voltage/amperage curve problem.
That's not really the issue
Posted on Reply
#19
chrcoluk
Trying to run the chips at too much clock speed, just slow them down lol.
Posted on Reply
#20
Denver
MxPhenom 216: That's not really the issue
It is part of the problem, but SK Hynix has shaped up to be extremely efficient in producing HBM over time; Samsung and Micron simply failed to follow. They may even seem equal on specs, but not in efficiency/yields


"According to Choi, the company's head of packaging and testing, SK Hynix's proprietary MR-MUF is a key technology in HBM packaging. MR-MUF reduces chip stacking pressure by 6%, increases productivity by fourfold by reducing process time, and improves heat dissipation by 45% compared to earlier technologies.

SK Hynix recently released an advanced MR-MUF that improves heat dissipation by 10% through the use of a new protective material while keeping the existing advantages of MR-MUF, Choi said. Advanced MR-MUF is an optimum solution for high stacking, and technology development for 16-high stacking is underway. The company plans to utilize advanced MR-MUF achieving 16-high HBM4, while preemptively reviewing hybrid bonding technology."

www.digitimes.com/news/a20240502VL206/sk-hynix-2025-hbm-production.html

"SK hynix currently controls roughly 46% - 49% of HBM market, and its share is not expected to drop significantly in 2025, according to market tracking firm TrendForce. By contrast, Micron's share on HBM memory market is between 4% and 6%."

SK hynix Reports That 2025 HBM Memory Supply Has Nearly Sold Out (anandtech.com)
Posted on Reply
#21
thesmokingman
I imagine Jensen was watching SNL and he was like we need more COWBELL, er HBM.

Posted on Reply
#22
wolf
Better Than Native
Always fun to see the users who can't resist the opportunity for name calling and a good roast pop up, this seems pertinent to the topic though;
In separate statements after Reuters first published this report, Samsung said that "claims of failing due to heat and power consumption are not true," and that testing was "proceeding smoothly and as planned."
Media hungry as always to jump on any possible chance for coverage of a misstep. After all, at the top of their game, all eyes are on them and there's no doubt a crowd hungry to feast on a meal of schadenfreude, as this thread demonstrates.
Posted on Reply
#23
Wirko
wolf: Always fun to see the users who can't resist the opportunity for name calling and a good roast pop up, this seems pertinent to the topic though;

Media hungry as always to jump on any possible chance for coverage of a misstep. After all, at the top of their game, all eyes are on them and there's no doubt a crowd hungry to feast on a meal of schadenfreude, as this thread demonstrates.
Hey. We come here for tech power-ups, and to no surprise, we get tech power-ups here.
Posted on Reply
#24
Bwaze
wolf: Always fun to see the users who can't resist the opportunity for name calling and a good roast pop up, this seems pertinent to the topic though;

Media hungry as always to jump on any possible chance for coverage of a misstep. After all, at the top of their game, all eyes are on them and there's no doubt a crowd hungry to feast on a meal of schadenfreude, as this thread demonstrates.
I don't know, I think companies have media pretty much in their grip when it comes to reporting missteps that truly matter to end users. Right now this is all very academic, and readers of these articles aren't really users of products that will use HBM.

But just remember when Samsung SSDs were failing. All of a sudden major tech sites weren't really reporting it, even when Samsung admitted it and released new firmwares that fixed the issues - it was more important to have a spotless reputation and tons of angry customers with failed SSDs than a wide public release that all users should check if their firmware is updated. :p
Posted on Reply