
SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs

Joined
Jun 22, 2012
Messages
322 (0.07/day)
Processor Intel i7-12700K
Motherboard MSI PRO Z690-A WIFI
Cooling Noctua NH-D15S
Memory Corsair Vengeance 4x16 GB (64GB) DDR4-3600 C18
Video Card(s) MSI GeForce RTX 3090 GAMING X TRIO 24G
Storage Samsung 980 Pro 1TB, SK hynix Platinum P41 2TB
Case Fractal Define C
Power Supply Corsair RM850x
Mouse Logitech G203
Software openSUSE Tumbleweed
While that is a reasonable point, NAND flash simply doesn't have the durability to be useful long term in such a way.

What makes you think so? LLM weights (at least as of now) are static and once loaded in memory they won't need to be modified unless you need to replace them entirely with something else. Since datacenter GPUs will basically never be turned off and the HBF isn't going to store irreplaceable data anyway (the weights will likely be first read from slower long-term storage devices), data retention doesn't need to be very long, and this will increase the number of write/erase cycles allowed.
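To put that access pattern in code terms, here's a minimal sketch (the file name is hypothetical, not from the SanDisk material): the weights are written once at load time and only read afterwards, so inference incurs no further program/erase cycles.

```python
import mmap

# Load-once, read-many: map a (hypothetical) weights file read-only.
with open("model_weights.bin", "rb") as f:
    weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = weights[:16]  # inference only ever reads; no erase/write cycles
    weights.close()
```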

Another reasonable point; however, that was not the claim made in the article above.

The linked original SanDisk presentation shows one such configuration on page 99:
 

duckface

New Member
Joined
May 16, 2024
Messages
17 (0.06/day)
I don't understand why VRAM is so limited these days for the price we pay for video cards. They could launch cards with 512 GB or even 1 TB, even if the total isn't used all the time (reading it all would need a lot of speed). They could make cards with enough VRAM to store models; for AI it's very important. AMD should focus on cards with large VRAM for personal AI use.
 
Joined
Jul 5, 2013
Messages
29,265 (6.89/day)
What makes you think so? LLM weights (at least as of now) are static and once loaded in memory they won't need to be modified unless you need to replace them entirely with something else.
Yes, but they have to be updated every time they are altered, and that means block erase/write cycles. This happens more frequently than you think.

HBF cannot replace HBM. Augment it, maybe. Replace it? Absolutely not.
 
Joined
Jun 22, 2012
Messages
322 (0.07/day)
Deployed LLMs don't get updated as frequently as you think. Even if that occurred daily, that would be 3650 program/erase cycles over 10 years of service, which should be easy to attain for flash memory that doesn't need end-of-life data retention longer than hours or even minutes.
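A quick back-of-the-envelope check of that number (the ~3,000 P/E cycle endurance figure below is just a commonly cited ballpark for TLC NAND, not anything from SanDisk):

```python
# One model update per day over 10 years of service.
pe_cycles_used = 1 * 365 * 10
print(pe_cycles_used)  # 3650

# Commonly cited ballpark for TLC NAND endurance at standard retention specs;
# relaxing end-of-life retention to hours or minutes raises the usable budget.
typical_tlc_pe_cycles = 3000
print(pe_cycles_used / typical_tlc_pe_cycles)  # ~1.2, same order of magnitude
```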
 
Joined
Jul 5, 2013
Messages
29,265 (6.89/day)
Deployed LLMs don't get updated as frequently as you think. Even if that occurred daily, that would be 3650 program/erase cycles over 10 years of service, which should be easy to attain for flash memory that doesn't need end-of-life data retention longer than hours or even minutes.
That would only be true IF the end user stays on the same LLM all the time. Most do not. It depends on the required task. For this tech to be of ANY benefit, the LLM would need to be dynamically switchable on the fly. That means lots of erase/write cycles.
 
Joined
Jan 3, 2021
Messages
3,858 (2.55/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
data retention doesn't need to be very long, and this will increase the number of write/erase cycles allowed.
One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
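As a toy illustration of the multi-level idea (the voltage window below is made up, not a real NAND spec), a 4-bit value maps to one of 16 threshold-voltage levels in a single cell:

```python
V_MIN, V_MAX = 0.0, 4.0   # made-up threshold-voltage window
LEVELS = 16               # 4 bits per cell -> 16 distinguishable levels

def value_to_voltage(value: int) -> float:
    """Map a 4-bit integer (0..15) to the center of its voltage window."""
    assert 0 <= value < LEVELS
    step = (V_MAX - V_MIN) / LEVELS
    return V_MIN + (value + 0.5) * step

print(value_to_voltage(0), value_to_voltage(15))  # 0.125 3.875
```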
 
Joined
Jul 5, 2013
Messages
29,265 (6.89/day)
One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
That's an interesting idea. I don't think that's what SanDisk is marketing it as, though.
 
Joined
Jun 22, 2012
Messages
322 (0.07/day)
That would only be true IF the end user stays on the same LLM all the time. Most do not. It depends on the required task. For this tech to be of ANY benefit, the LLM would need to be dynamically switchable on the fly. That means lots of erase/write cycles.

I don't know where you got the idea that cloud AI model providers switch LLMs on the fly that frequently. It doesn't happen at small scale (tens to hundreds of simultaneous users), where the same models get served continuously for days or weeks at a time, and at large scale (up to hundreds of thousands of users) providers dedicate entire GPU clusters to specific models to keep availability as high as possible.

One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.

I imagine this would lend itself to hardware-level support for quantized AI model weights. Every low-precision model parameter (e.g. in 4- or 5-bit) could be directly mapped to raw NAND cells for potentially improved performance.
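As a rough software-side sketch of that mapping (nothing here is a real HBF interface): 4-bit quantized weights pack two per byte, which is the same granularity a 16-level cell could hold natively.

```python
def pack_q4(values: list[int]) -> bytes:
    """Pack 4-bit integers (0..15) two per byte, low nibble first."""
    if len(values) % 2:
        values = values + [0]          # pad to an even count
    out = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        out.append((hi << 4) | lo)     # one byte <-> two 4-bit weights
    return bytes(out)

print(pack_q4([1, 15, 7, 2]).hex())    # 'f127'
```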
 
Joined
Jul 5, 2013
Messages
29,265 (6.89/day)
and at large scale (up to hundreds of thousands of users) providers dedicate entire GPU clusters to specific models to keep availability as high as possible.
That's a good point. I hadn't thought about it at that scale. What struck me was the idea of replacing DRAM with NAND. It seems like a foolish idea, and I'm highly dubious of it.
 
Joined
Jan 3, 2021
Messages
3,858 (2.55/day)
That's a good point. I hadn't thought about it at that scale. What struck me was the idea of replacing DRAM with NAND. It seems like a foolish idea, and I'm highly dubious of it.
Also, it can be a mixed HBM+HBF configuration. One of the slides at Tom's Hardware shows such a case.
 

LMTMFA

New Member
Joined
Feb 14, 2025
Messages
1 (0.14/day)
Yeah, no. NAND flash is not RAM; it is designed for entirely different usage patterns, and the notion that it could be used as a replacement for RAM is nonsensical. Considering GPUs already have effectively direct access to storage via APIs like DirectStorage, I see no use-case for this technology.

Welp, glad you weighed in, all those researchers and engineers can go back to doing stuff that's actually good for something now. /s

My God, the ego it takes to make a statement like "I don't see any use-case", as if these people overlooked something that you just armchaired into.

This connects like HBM; the access is much more direct, so it's far faster. LLMs don't need many writes, mostly reads (relatively speaking), and it can be paired with HBM or VRAM for the things that do need writes; a rough sketch of that split is below. Being able to load up a huge LLM like this would be a big deal.
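Something like this, conceptually (made-up names, not a real API): read-mostly weights land in the flash-backed tier, while write-heavy buffers such as the KV cache stay in HBM.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryTiers:
    hbf: dict = field(default_factory=dict)  # large, read-mostly tier
    hbm: dict = field(default_factory=dict)  # smaller, write-heavy tier

    def place(self, name: str, size_gb: float, write_heavy: bool) -> str:
        tier = self.hbm if write_heavy else self.hbf
        tier[name] = size_gb
        return "HBM" if write_heavy else "HBF"

tiers = MemoryTiers()
print(tiers.place("weights", 400.0, write_heavy=False))  # HBF
print(tiers.place("kv_cache", 40.0, write_heavy=True))   # HBM
```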

It's niche, but it's one hell of a niche.
 