Thursday, February 13th 2025

SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs
During its first investor day since spinning off from Western Digital, SanDisk showed a technology it has been developing for the AI sector. High-bandwidth flash (HBF) is a new memory architecture that combines 3D NAND flash storage with bandwidth comparable to high-bandwidth memory (HBM). The HBF design stacks sixteen BiCS8 3D NAND dies using through-silicon vias, with a logic layer enabling parallel access to memory sub-arrays. This configuration achieves 8 to 16 times the capacity per stack of current HBM implementations. A system using eight HBF stacks can provide 4 TB of VRAM, enough to store large AI models such as GPT-4 directly on the GPU. The architecture breaks from conventional NAND design by implementing independently accessible memory sub-arrays rather than the traditional multi-plane approach. While HBF surpasses HBM's capacity specifications, it retains higher latency than DRAM, which limits it to specific workloads.
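The headline capacity follows directly from the figures above; a quick back-of-the-envelope check in Python, using only numbers from the article (per-die capacity is derived here, not an official SanDisk figure):

```python
# Sanity-check the capacity figures quoted above.
# Inputs come from the article; per-die capacity is derived, not official.
stacks = 8              # HBF stacks per GPU system
dies_per_stack = 16     # BiCS8 3D NAND dies per stack
total_vram_tb = 4       # claimed total VRAM

per_stack_gb = total_vram_tb * 1024 / stacks   # 512 GB per stack
per_die_gb = per_stack_gb / dies_per_stack     # 32 GB per die

print(f"{per_stack_gb:.0f} GB per stack, {per_die_gb:.0f} GB per die")
# -> 512 GB per stack, 32 GB per die
# With current HBM stacks in the rough 36-48 GB range, 512 GB per stack
# is about 10-14x, consistent with the article's "8 to 16 times" claim.
```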
SanDisk has not disclosed how it will address NAND's inherent write endurance limitations, though using pSLC NAND could balance durability and cost. The bandwidth of HBF is also unknown, as the company has not yet released details. SanDisk Memory Technology Chief Alper Ilkbahar confirmed the technology targets read-intensive AI inference tasks rather than latency-sensitive applications. The company is developing HBF as an open standard, with mechanical and electrical interfaces similar to HBM's to simplify integration. Some challenges remain, including NAND's block-level addressing and write endurance constraints. While these factors make HBF unsuitable for gaming, its high capacity and throughput align with AI model storage and inference requirements. SanDisk has announced plans for three generations of HBF, indicating a long-term commitment to the technology.
Source:
via Tom's Hardware
35 Comments on SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs
Even DRAM is far from being byte-addressable; the smallest unit of transfer is 64 bytes in DDR, 32 bytes in HBM3, and I think it's the same in HBM3E.
Aw so sad :D
n.O.t.....
If so, that's a good development to bring costs down.
The most famous one was Intel/Micron Optane/3D XPoint.
It doesn't seem that this one will do any better.
Inb4 tiered memory standards for Compute/Graphics?
Top: L1-3 caches
Upper: HBM
Lower: HBF
Bottom: NAND
Stack it 'till it's cheap :laugh:
If they are using flash here, it would be non-volatile, which could make it quite flexible.
Maybe parts of a model can better utilize different kinds of storage?
I'm thinking:
"Working memory" Cache, HBM, RAM.
"Short-Term Memory" HBF, XLflash, phase-change memory, massively parallelized (p)SLC NAND.
"Long-Term Memory" TLC and QLC NAND.
"Archival Memory" HDDs and Magnetic Tape.
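The tiering idea in this comment could be sketched as a simple placement policy. The tier names follow the list above, but the access-frequency thresholds below are purely illustrative assumptions, not anything SanDisk has specified:

```python
# Illustrative sketch of tiered placement for AI model data, following the
# working/short-term/long-term/archival analogy above. Cutoffs are made up.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    media: str

# Hottest to coldest, mirroring the list in the comment.
TIERS = [
    Tier("working",    "cache / HBM / DRAM"),
    Tier("short-term", "HBF / XLflash / pSLC NAND"),
    Tier("long-term",  "TLC / QLC NAND"),
    Tier("archival",   "HDD / magnetic tape"),
]

def place(accesses_per_hour: float) -> Tier:
    """Pick a tier from access frequency (hypothetical cutoffs)."""
    if accesses_per_hour > 1_000:
        return TIERS[0]
    if accesses_per_hour > 10:      # e.g. read-mostly model weights
        return TIERS[1]
    if accesses_per_hour > 0.1:
        return TIERS[2]
    return TIERS[3]

print(place(500).name)  # -> short-term (HBF territory)
```

Read-mostly model weights that are fetched often but rewritten rarely land naturally in the "short-term" tier, which is exactly the read-intensive inference niche the article says HBF targets.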
Also, that new technology could affect progress of CXL Memory Expanders ( very expensive stuff right now ). 4TB inside of a GPU is a lot of memory for processing!
www.forbes.com/sites/tomcoughlin/2023/09/06/emerging-non-volatile-memories-enable-iot-and-ai-growth/
It allows you to take fetches that would otherwise go to main system memory or mass storage and serve them right on the chip. This lowers latency and power consumption. In addition, flash doesn't need to be constantly refreshed when not actively in use, so you can tune its power very aggressively. That is simply not possible with volatile memory, which must be refreshed to maintain data.
I believe LabRat 891 put it perfectly, it makes sense as another layer in the memory subsystem designed to hold a specific set of data and the overall workload will see a very nice benefit as a result.
1 bit per second, an arbitrary value.
NOT an HBM killer. That was a very click-bait headline. Come on peeps, TPU is better than that crap..
That was my point. For that reason alone, HBF is NOT an HBM killer. Until we have a major breakthrough in NAND flash durability, that will not change. All SanDisk has done is create mildly and temporarily useful e-waste.
documents.sandisk.com/content/dam/asset-library/en_us/assets/public/sandisk/corporate/Sandisk-Investor-Day_2025.pdf