Thursday, February 13th 2025

SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs

During its first investor day since spinning off from Western Digital, SanDisk showed a technology it has been developing for the AI sector. High-bandwidth flash (HBF) is a new memory architecture that combines 3D NAND flash storage with bandwidth comparable to high-bandwidth memory (HBM). The HBF design stacks 16 3D NAND BiCS8 dies using through-silicon vias, with a logic layer enabling parallel access to memory sub-arrays. This configuration achieves 8 to 16 times greater capacity per stack than current HBM implementations. A system using eight HBF stacks can provide 4 TB of VRAM, enough to store a large AI model like GPT-4 directly on the GPU. The architecture breaks from conventional NAND design by implementing independently accessible memory sub-arrays, moving beyond traditional multi-plane approaches. While HBF far exceeds HBM's capacity, its latency remains higher than DRAM's, limiting it to specific workloads.
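For scale, here is a quick back-of-the-envelope check of those figures; the HBM3E stack size below is our assumption for comparison, not a SanDisk number:

    # Back-of-the-envelope check of the article's capacity figures.
    # The HBM3E stack size is an assumption for comparison, not a SanDisk spec.
    hbf_total_gb = 4 * 1024                 # 4 TB system total
    hbf_per_stack_gb = hbf_total_gb / 8     # eight HBF stacks -> 512 GB each

    hbm3e_per_stack_gb = 36                 # assumed current high-end HBM3E stack
    ratio = hbf_per_stack_gb / hbm3e_per_stack_gb
    print(f"{hbf_per_stack_gb:.0f} GB per HBF stack, roughly {ratio:.0f}x an HBM3E stack")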

SanDisk has not disclosed its solution for NAND's inherent write endurance limitations, though using pSLC NAND would make it possible to balance durability and cost. HBF's bandwidth is also unknown, as the company has not yet published figures. SanDisk Memory Technology Chief Alper Ilkbahar confirmed the technology targets read-intensive AI inference tasks rather than latency-sensitive applications. The company is developing HBF as an open standard, with mechanical and electrical interfaces similar to HBM's to simplify integration. Challenges remain, including NAND's block-level addressing and its write endurance constraints. While these factors make HBF unsuitable for gaming, its high capacity and throughput align well with AI model storage and inference. SanDisk has announced plans for three generations of HBF, indicating a long-term commitment to the technology.
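To put the pSLC trade-off in rough numbers, here is a minimal sketch with assumed endurance ratings and an assumed model-swap rate; none of these are SanDisk figures:

    # Rough endurance budget for storing model weights on pSLC NAND.
    # All figures are generic assumptions for illustration, not SanDisk specs.
    pslc_pe_cycles = 60_000        # assumed pSLC program/erase endurance
    tlc_pe_cycles = 3_000          # assumed native TLC endurance, for contrast
    model_swaps_per_day = 10       # assumed full rewrites of the stored weights

    swaps_per_year = model_swaps_per_day * 365
    print(f"pSLC lasts ~{pslc_pe_cycles / swaps_per_year:.0f} years")   # ~16 years
    print(f"TLC lasts  ~{tlc_pe_cycles / swaps_per_year:.1f} years")    # ~0.8 years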
Source: via Tom's Hardware

35 Comments on SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs

#26
duckface
I don't understand why VRAM is so limited these days for the price we pay for video cards. They could launch cards with 512 GB or even 1 TB, even if the full capacity isn't in use all the time, since it would need a lot of read speed. They could make cards with large VRAM for AI; it's very important. AMD should focus on cards with large VRAM for personal AI use.
#27
lexluthermiester
Solid State Brain: What makes you think so? LLM weights (at least as of now) are static and once loaded in memory they won't need to be modified unless you need to replace them entirely with something else.
Yes, but they have to be updated every time they are altered, and that means block erase/write cycles. This happens more frequently than you think.

HBF cannot replace HBM. Augment it, maybe. Replace it? Absolutely not.
#28
Solid State Brain
Deployed LLMs don't get updated as frequently as you think. Even if that occurred daily, that would be 3,650 program/erase cycles over 10 years of service, which should be easy to attain for flash memory that doesn't need end-of-life data retention longer than hours or even minutes.
#29
lexluthermiester
Solid State Brain: Deployed LLMs don't get updated as frequently as you think. Even if that occurred daily, that would be 3,650 program/erase cycles over 10 years of service, which should be easy to attain for flash memory that doesn't need end-of-life data retention longer than hours or even minutes.
That would only be true IF the end user stays on the same LLM all the time. Most do not. It depends on the required task. For this tech to be of ANY benefit, the LLM would need to be dynamically switchable on the fly. That means lots of erase/write cycles.
#30
Wirko
Solid State Brain: data retention doesn't need to be very long, and this will increase the number of write/erase cycles allowed.
One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
#31
lexluthermiester
Wirko: One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
That's an interesting idea. I don't think that's what SanDisk is marketing it for, though.
#32
Solid State Brain
lexluthermiester: That would only be true IF the end user stays on the same LLM all the time. Most do not. It depends on the required task. For this tech to be of ANY benefit, the LLM would need to be dynamically switchable on the fly. That means lots of erase/write cycles.
I don't know where you got the idea that cloud AI model providers switch LLMs on the fly that frequently. It doesn't happen at small scale (tens to hundreds of simultaneous users), where the same models get served continuously for days or weeks at a time, and at large scale (up to hundreds of thousands of users) providers dedicate entire GPU clusters to specific models to maximize availability.
Wirko: One more advantage of NAND is that it can store analog information with at least 4-bit integer precision, probably more, if long-term retention isn't a concern. A step closer to a "solid state brain", so to speak.
I imagine this would more readily enable hardware-level support for quantized AI model weights. Every low-precision model parameter (e.g., 4- or 5-bit) could be mapped directly to raw NAND cells for potentially improved performance.
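A minimal sketch of what that mapping means in software terms today, with 4-bit weights packed two per byte the way they might map one-to-one onto 16-level cells. Purely illustrative; HBF's actual cell interface is undisclosed:

    import numpy as np

    # Illustrative only: pack 4-bit quantized weights two per byte, the way
    # they might map onto cells holding 16 analog levels. No real HBF
    # interface is implied.
    def pack_int4(weights: np.ndarray) -> np.ndarray:
        """Pack an even-length array of values in [0, 15] into bytes."""
        w = weights.astype(np.uint8) & 0x0F
        return (w[0::2] << 4) | w[1::2]

    def unpack_int4(packed: np.ndarray) -> np.ndarray:
        """Inverse of pack_int4."""
        out = np.empty(packed.size * 2, dtype=np.uint8)
        out[0::2] = packed >> 4
        out[1::2] = packed & 0x0F
        return out

    w = np.random.randint(0, 16, size=8, dtype=np.uint8)
    assert np.array_equal(unpack_int4(pack_int4(w)), w)  # round-trips losslessly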
#33
lexluthermiester
Solid State Brain: at large scale (up to hundreds of thousands of users) providers dedicate entire GPU clusters to specific models to maximize availability.
That's a good point. I hadn't thought about it at that large a scale. What struck me was the idea of replacing DRAM with NAND. It seems like a foolish idea and I'm highly dubious of it.
#34
Wirko
lexluthermiester: That's a good point. I hadn't thought about it at that large a scale. What struck me was the idea of replacing DRAM with NAND. It seems like a foolish idea and I'm highly dubious of it.
Also, it can be a mixed HBM+HBF configuration. One of the slides at Tom's Hardware shows such a case.
#35
LMTMFA
Assimilator: Yeah, no. NAND flash is not RAM, it is designed for entirely different usage patterns, and the notion that it could be used as a replacement for RAM is nonsensical. Considering GPUs already have effectively direct access to storage via APIs like DirectStorage, I see no use-case for this technology.
Welp, glad you weighed in, all those researchers and engineers can go back to doing stuff that's actually good for something now. /s

My God, the Ego to make a statement like "I don't see any use-case", as if these people had overlooked something that you just armchaired into.

This connects like HBM, so access is much more direct and far faster. LLMs don't need many writes, mostly reads (relatively speaking), and HBF can be paired with HBM or VRAM for the things that do need writes (sketched below). Being able to load a huge LLM this way would be a big deal.

It's niche, but it's one hell of a niche.
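A minimal sketch of that split, with all tier names, capacities, and buffer classifications invented for illustration; no real GPU runtime API is implied:

    # Illustrative placement policy for a mixed HBM+HBF board.
    # Capacities and the buffer classification are assumptions.
    HBM_GB, HBF_GB = 192, 4096

    READ_MOSTLY = {"weights", "embeddings"}      # written rarely -> flash is fine
    WRITE_HEAVY = {"kv_cache", "activations"}    # rewritten constantly -> keep in HBM

    def place(buffer_kind: str) -> str:
        if buffer_kind in READ_MOSTLY:
            return "HBF"   # huge capacity, endurance spent only on model loads
        if buffer_kind in WRITE_HEAVY:
            return "HBM"   # DRAM-class latency and effectively unlimited rewrites
        return "HBM"       # default to the safe tier

    for kind in ("weights", "kv_cache", "embeddings", "activations"):
        print(f"{kind:12s} -> {place(kind)}")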