SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs

AleksandarK · Feb 13, 2025

During its first post-Western Digital spinoff investor day, SanDisk showed something it has been working on to tackle the AI sector. High-bandwidth flash (HBF) is a new memory architecture that combines 3D NAND flash storage with bandwidth capabilities comparable to high-bandwidth memory (HBM). The HBF design stacks 16 3D NAND BiCS8 dies using through-silicon vias, with a logic layer enabling parallel access to memory sub-arrays. This configuration achieves 8 to 16 times greater capacity per stack than current HBM implementations. A system using eight HBF stacks can provide 4 TB of VRAM to store large AI models like GPT-4 directly on GPU hardware. The architecture breaks from conventional NAND design by implementing independently accessible memory sub-arrays, moving beyond traditional multi-plane approaches. While HBF surpasses HBM's capacity specifications, it maintains higher latency than DRAM, limiting its application to specific workloads.

SanDisk has not disclosed its solution for NAND's inherent write endurance limitations, though using pSLC NAND makes it possible to balance durability and cost. The bandwidth of HBF is also unknown, as the company hasn't put out details yet. SanDisk Memory Technology Chief Alper Ilkbahar confirmed the technology targets read-intensive AI inference tasks rather than latency-sensitive applications. The company is developing HBF as an open standard, incorporating mechanical and electrical interfaces similar to HBM to simplify integration. Some challenges remain, including NAND's block-level addressing limitations and writing endurance constraints. While these factors make HBF unsuitable for gaming applications, the technology's high capacity and throughput characteristics align with AI model storage and inference requirements. SanDisk has announced plans for three generations of HBF development, indicating a long-term commitment to the technology.

View at TechPowerUp Main Site | Source

Denver · Feb 13, 2025

Correct me if I'm wrong... but AI models would rapidly degrade NAND storage, making it impractical for long-term use, since NAND ( when used as VRAM) requires continuous high-frequency read and write operations.

Wirko · Feb 13, 2025

AleksandarK said:
Some challenges remain, including NAND's block-level addressing limitations and writing endurance constraints.

It's actually page-level reading/writing and block-level erasing, where a page is 4 ki-bi-bytes and a block is a few megabytes. However, the architecture of HBF seems to be a lot different, and the page size may also be smaller (or larger) if Sandisk thinks it's better for the purpose.
Even DRAM is far from being byte-addressable; the smallest unit of transfer is 64 bytes in DDR, 32 bytes in HBM3, and I think it's the same in HBM3E.

AnotherReader · Feb 13, 2025

Wirko said:
It's actually page-level reading/writing and block-level erasing, where a page is 4 ki-bi-bytes and a block is a few megabytes. However, the architecture of HBF seems to be a lot different, and the page size may also be smaller (or larger) if Sandisk thinks it's better for the purpose.
Even DRAM is far from being byte-addressable; the smallest unit of transfer is 64 bytes in DDR, 32 bytes in HBM3, and I think it's the same in HBM3E.

It isn't just the granularity of transfer. DRAM has unlimited endurance; this, on the other hand, is unlikely to be much better than SLC.

bonehead123 · Feb 13, 2025

Wirko said:
It's actually page-level reading/writing and block-level erasing, where a page is 4 ki-bi-bytes and a block is a few megabytes. However, the architecture of HBF seems to be a lot different, and the page size may also be smaller (or larger) if Sandisk thinks it's better for the purpose.
Even DRAM is far from being byte-addressable; the smallest unit of transfer is 64 bytes in DDR, 32 bytes in HBM3, and I think it's the same in HBM3E.

AnotherReader said:
It isn't just the granularity of transfer. DRAM has unlimited endurance; this, on the other hand, is unlikely to be much better than SLC.

Soooo...what ya'll are saying is that there won't be any 4TB, $25K GPU's for da gamrz to drool over, at least not for a while anyways ?

Aw so sad

n.O.t.....

qlum · Feb 13, 2025

I assume it is mostly meant for Large AI models, which require quite a lot of vram to run. Performance as memory will not be great but if it's performant enough, with the dram on top, it may very well be good enough.
If so good development to bring costs down for these.

andrehide · Feb 13, 2025

Previous attempts to use non-RAM as RAM failed.
The most famous one was Intel/Micron Optane/3D XPoint.
It doesn't seem that this one will do any better.

Wirko · Feb 13, 2025

bonehead123 said:
Soooo...what ya'll are saying is that there won't be any 4TB, $25K GPU's for us gamrz to drool over, at least not for a while anyways ?

Aw so sad

n.O.t.....

The trend is obviously towards 8GB, $25K GPUs, but the frog tastes best if cooked slowly.

Denver said:
Correct me if I'm wrong... but AI models would rapidly degrade NAND storage, making it impractical for long-term use, since NAND ( when used as VRAM) requires continuous high-frequency read and write operations.

The models would only be updated occasionally, that's the idea, so writing wouldn't be much of a problem. But limited read endurance is also sometimes hinted at. I don't know how much research has been done around read degradation, and whether it's relevant. Anyway, processing needs RAM too, a couple hundred MB of static RAM cache can't suffice for that, so inevitably some HBM will be part of the system too.

LabRat 891 · Feb 13, 2025

This HBF looks 'useful', but not on its own.
Inb4 tiered memory standards for Compute/Graphics?

Top: L1-3 caches
Upper: HBM
Lower: HBF
Bottom: NAND

Stack it 'till it's cheap :laugh:

Assimilator · Feb 13, 2025

Yeah, no. NAND flash is not RAM, it is designed for entirely different usage patterns, and the notion that it could be used as a replacement for RAM is nonsensical. Considering GPUs already have effectively direct access to storage via APIs like DirectStorage, I see no use-case for this technology.

evernessince · Feb 13, 2025

Denver said:
Correct me if I'm wrong... but AI models would rapidly degrade NAND storage, making it impractical for long-term use, since NAND ( when used as VRAM) requires continuous high-frequency read and write operations.

It depends, is the AI model being constantly loaded or can it simply stay in memory?

If they are using flash here it may be non-volatile which could make it quite flexible.

LabRat 891 · Feb 13, 2025

evernessince said:
It depends, is the AI model being constantly loaded or can it simply stay in memory?

If they are using flash here it may be non-volatile which could make it quite flexible.

Looking towards 'applications' in a given 'product',
maybe, parts of a Model can better utilize different kinds of storage?

I'm thinking:
"Working memory" Cache, HBM, RAM.
"Short-Term Memory" HBF, XLflash, phase-change memory, massively parallelized (p)SLC NAND.
"Long-Term Memory" TLC and QLC NAND.
"Archival Memory" HDDs and Magnetic Tape.

ScaLibBDP · Feb 13, 2025

Denver said:
Correct me if I'm wrong... but AI models would rapidly degrade NAND storage, making it impractical for long-term use, since NAND ( when used as VRAM) requires continuous high-frequency read and write operations.

There are two cases: Training ( write ops ) and Inference ( read ops, where they intend to use HBF ). Its overall indurance depends on Terrabyte Written ( TBW ).

Also, that new technology could affect progress of CXL Memory Expanders ( very expensive stuff right now ). 4TB inside of a GPU is a lot of memory for processing!

Wirko · Feb 14, 2025

evernessince said:
It depends, is the AI model being constantly loaded or can it simply stay in memory?

If they are using flash here it may be non-volatile which could make it quite flexible.

Non-volatility means little, or nothing, in this kind of applications. The processors will crunch vectors and matrices without interruption until they're too old and can't make enough money anymore. (Well, low power and sleep states probably exist too, since all processors can't be fully loaded all of the time.)

ScaLibBDP said:
Also, that new technology could affect progress of CXL Memory Expanders ( very expensive stuff right now ).

I don't see a close connection. CXL is PCIe, which is up to 16 lanes of Gen 6 (maybe soon in AI datacenters) or Gen 7 (a few years out). That's infinitely slower than several stacks of on-package HBM/HBF optimised for maximum bandwidth and maximum cost.

evernessince · Feb 14, 2025

Wirko said:
Non-volatility means little, or nothing, in this kind of applications. The processors will crunch vectors and matrices without interruption until they're too old and can't make enough money anymore. (Well, low power and sleep states probably exist too, since all processors can't be fully loaded all of the time.)

Non-volatile memory yields power and cost savings. There are dozens of articles on the topic: https://www.embedded.com/the-benefit-of-non-volatile-memory-nvm-for-edge-ai/

Emerging Non-Volatile Memories Enable IoT And AI Growth

MRAM and other emerging memory revenues could approach $1B and drive capital equipment revenues to $2.2B by 2033.

www.forbes.com

It allows you to take fetches that would otherwise be to main system memory or mass storage and put them right on the chip. This lowers latency and power consumption. In addition, flash doesn't needed to be constantly refreshed when not actively in use so you can very aggressively power tune it. This is simply not possible with volatile memory that needs to be refreshed to maintain data.

I believe LabRat 891 put it perfectly, it makes sense as another layer in the memory subsystem designed to hold a specific set of data and the overall workload will see a very nice benefit as a result.

dont whant to set it"' · Feb 14, 2025

It can operate at what bandwidth dear sir?Bandwidth is in its name after all.
1bit per second, arbitrary value.

InVasMani · Feb 14, 2025

If it isn't readily serviceable and replaceable the NAND seems like a serious e-waste concern for the rest of the hardware if the NAND degrades too quickly. It might be acceptable for AI depending on longevity, but probably not so much otherwise.

kondamin · Feb 14, 2025

Anther reminder it was dumb to kill off xpoint, it would have been as in demand as hbm for the last 2 years

Wirko · Feb 14, 2025

evernessince said:
Non-volatile memory yields power and cost savings. There are dozens of articles on the topic: https://www.embedded.com/the-benefit-of-non-volatile-memory-nvm-for-edge-ai/

Emerging Non-Volatile Memories Enable IoT And AI Growth

MRAM and other emerging memory revenues could approach $1B and drive capital equipment revenues to $2.2B by 2033.

www.forbes.com

It allows you to take fetches that would otherwise be to main system memory or mass storage and put them right on the chip. This lowers latency and power consumption. In addition, flash doesn't needed to be constantly refreshed when not actively in use so you can very aggressively power tune it. This is simply not possible with volatile memory that needs to be refreshed to maintain data.

I believe LabRat 891 put it perfectly, it makes sense as another layer in the memory subsystem designed to hold a specific set of data and the overall workload will see a very nice benefit as a result.

I don't disagree, NAND does have some advantages, but non-volatility by itself is not important unless and until power goes out. A theoretical volatile NAND with extremely low idle power (similar to SRAM) would do this job just as well, that's my point.

kondamin said:
Anther reminder it was dumb to kill off xpoint, it would have been as in demand as hbm for the last 2 years

We can't be sure it's dead. The development continues somewhere deep under the ground and will continue until all patents expire. TI (or whoever) may succeed in developing a method to expand those 4 layers to 100+ ... but it's not a given.

dont whant to set it' said:
It can operate at what bandwidth dear sir?Bandwidth is in its name after all.
1bit per second, arbitrary value.

HBM sends the data around at about 6400 MT/s, and NAND does it at 3200 MT/s. So, as a quick estimate, half of HBM's bandwidth would be possible with the technology we already have.

lexluthermiester · Feb 14, 2025

AleksandarK said:
SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF)

Wait, what?...

AleksandarK said:
While HBF surpasses HBM's capacity specifications, it maintains higher latency than DRAM, limiting its application to specific workloads.

So this is NAND... HBM is DRAM...

NOT an HBM killer. That was a very click-bait headline. Come on peeps, TPU is better than that crap..

AleksandarK · Feb 14, 2025

lexluthermiester said:
Wait, what?...

So this is NAND... HBM is DRAM...

NOT an HBM killer. That was a very click-bait headline. Come on peeps, TPU is better than that crap..

For AI workloads its HBM killer(despite being not the same tech fundamentally). Imagine you load an entire model on a single GPU. You don't need top-tier low-latency.

InVasMani · Feb 14, 2025

If they can somehow do this on a M.2 attached to a GPU and get similar results it would be great. If it's just replacing volatile VRAM with NAND and questionable endurance concerns maybe not as exciting. From a business standpoint it could still make a lot of sense though if the economics of it make enough sense in terms of profitability.

lexluthermiester · Feb 14, 2025

AleksandarK said:
For AI workloads its HBM killer(despite being not the same tech fundamentally). Imagine you load an entire model on a single GPU. You don't need top-tier low-latency.

While that's a fair point, I was referring to the durability factor. NAND wears out, and under these kinds of load, would wear out swiftly. This is fact and can not be argued. DRAM does not wear out.

That was my point. For that reason alone, HBF is NOT an HBM killer. Until we have a major break-through in NAND flash durability it will not change. All Sandisk has done is create mildly and temporarily useful E-waste.

Solid State Brain · Feb 14, 2025

They're already saying it's for AI inference, i.e. mostly read-centric workloads where most of the bandwidth utilization is reading model weights (in the hundreds of gigabytes to few terabytes range). Nothing prohibits hardware manufacturers from putting VRAM or HBM alongside the HBF for memory content that needs to be frequently modified (mainly the key-value cache during token generation).

lexluthermiester · Feb 14, 2025

Solid State Brain said:
They're already saying it's for AI inference, i.e. mostly read-centric workloads where most of the bandwidth utilization is reading model weights (in the hundreds of gigabytes to few terabytes range).

While that is a reasonable point, NAND flash simply doesn't have the durability to be useful long term in such a way.

Solid State Brain said:
Nothing prohibits hardware manufacturers from putting VRAM or HBM alongside the HBF for memory content that needs to be frequently modified (mainly the key-value cache during token generation).

Another reasonable point, however, that was not the claim made in the above article.

Processor	i5-6600K
Motherboard	Asus Z170A
Cooling	some cheap Cooler Master Hyper 103 or similar
Memory	16GB DDR4-2400
Video Card(s)	IGP
Storage	Samsung 850 EVO 250GB
Display(s)	2x Oldell 24" 1920x1200
Case	Bitfenix Nova white windowless non-mesh
Audio Device(s)	E-mu 1212m PCI
Power Supply	Seasonic G-360
Mouse	Logitech Marble trackball, never had a mouse
Keyboard	Key Tronic KT2000, no Win key because 1994
Software	Oldwin

Processor	Ryzen 7 5700X
Motherboard	ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling	Noctua NH-C14S (two fans)
Memory	2x16GB DDR4 3200
Video Card(s)	Reference Vega 64
Storage	Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s)	Nixeus NX-EDG27, and Samsung S23A700
Case	Fractal Design R5
Power Supply	Seasonic PRIME TITANIUM 850W
Mouse	Logitech
VR HMD	Oculus Rift
Software	Windows 11 Pro, and Ubuntu 20.04

System Name	The Little One
Processor	i5-11320H @4.4GHZ
Motherboard	AZW SEI
Cooling	Fan w/heat pipes + side & rear vents
Memory	64GB Crucial DDR4-3200 (2x 32GB)
Video Card(s)	Iris XE
Storage	WD Black SN850X 8TB m.2, Seagate 2TB SSD + SN850 8TB x2 in an external enclosure
Display(s)	2x Samsung 43" & 2x 32"
Case	Practically identical to a mac mini, just purrtier in slate blue, & with 3x usb ports on the front !
Audio Device(s)	Yamaha ATS-1060 Bluetooth Soundbar & Subwoofer
Power Supply	65w brick
Mouse	Logitech MX Master 2
Keyboard	Logitech G613 mechanical wireless
VR HMD	Whahdatiz ???
Software	Windows 10 pro, with all the unnecessary background shitzu turned OFF !
Benchmark Scores	PDQ

Processor	i5-6600K
Motherboard	Asus Z170A
Cooling	some cheap Cooler Master Hyper 103 or similar
Memory	16GB DDR4-2400
Video Card(s)	IGP
Storage	Samsung 850 EVO 250GB
Display(s)	2x Oldell 24" 1920x1200
Case	Bitfenix Nova white windowless non-mesh
Audio Device(s)	E-mu 1212m PCI
Power Supply	Seasonic G-360
Mouse	Logitech Marble trackball, never had a mouse
Keyboard	Key Tronic KT2000, no Win key because 1994
Software	Oldwin

System Name	Metalia
Processor	AMD Ryzen 7 5800X3D
Motherboard	Asus TuF Gaming X570-PLUS
Cooling	ID Cooling 280mm AIO w/ Arctic P14s
Memory	2x32GB DDR4-3600
Video Card(s)	Sapphire Pulse RX 9070 XT
Storage	Optane P5801X 400GB, Samsung 990Pro 2TB
Display(s)	LG ‎32GS95UV 32" OLED 240/480hz 4K/1080P Dual Mode
Case	Geometric Future M8 Dharma
Audio Device(s)	Xonar Essence STX
Power Supply	Seasonic Focus GX-1000 Gold
Mouse	Attack Shark R3 Magnesium - White
Keyboard	Keychron K8 Pro - White - Tactile Brown Switch
Software	Windows 10 IoT Enterprise LTSC 2021

SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs

AleksandarK

News Editor

Denver

Wirko

AnotherReader

bonehead123

qlum

andrehide

Wirko

LabRat 891

Assimilator

evernessince

LabRat 891

ScaLibBDP

Wirko

evernessince

Emerging Non-Volatile Memories Enable IoT And AI Growth

dont whant to set it"'

InVasMani

kondamin

Wirko

Emerging Non-Volatile Memories Enable IoT And AI Growth

lexluthermiester

AleksandarK

News Editor

InVasMani

lexluthermiester

Solid State Brain

Attachments

lexluthermiester

System Name	Firelance.
Processor	Threadripper 3960X
Motherboard	ROG Strix TRX40-E Gaming
Cooling	IceGem 360 + 6x Arctic Cooling P12
Memory	8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s)	MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage	2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s)	Dell S3221QS(A) (32" 38x21 60Hz) + 2x AOC Q32E2N (32" 25x14 75Hz)
Case	Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply	Fractal Design Ion+ 2 Platinum 760W
Mouse	Logitech G604
Keyboard	Razer Pro Type Ultra
Software	Windows 10 Professional x64

Processor	Ryzen 7800X3D
Motherboard	ASRock X670E Taichi
Cooling	Noctua NH-D15 Chromax
Memory	32GB DDR5 6000 CL30
Video Card(s)	MSI RTX 4090 Trio
Storage	P5800X 1.6TB 4x 15.36TB Micron 9300 Pro 4x WD Black 8TB M.2
Display(s)	Acer Predator XB3 27" 240 Hz
Case	Thermaltake Core X9
Audio Device(s)	JDS Element IV, DCA Aeon II
Power Supply	Seasonic Prime Titanium 850w
Mouse	PMM P-305
Keyboard	Wooting HE60
VR HMD	Valve Index
Software	Win 10

System Name	3 "rigs"-gaming/spare pc/cruncher
Processor	R7-5800X3D/i7-7700K/R9-7950X
Motherboard	Asus ROG Crosshair VI Extreme/Asus Ranger Z170/Asus ROG Crosshair X670E-GENE
Cooling	Bitspower monoblock ,custom open loop,both passive and active/air tower cooler/air tower cooler
Memory	32GB DDR4/32GB DDR4/64GB DDR5
Video Card(s)	Gigabyte RX6900XT Alphacooled/AMD RX5700XT 50th Aniv./SOC(onboard)
Storage	mix of sata ssds/m.2 ssds/mix of sata ssds+an m.2 ssd
Display(s)	Dell UltraSharp U2410 , HP 24x
Case	mb box/Silverstone Raven RV-05/CoolerMaster Q300L
Audio Device(s)	onboard/onboard/onboard
Power Supply	3 Seasonics, a DeltaElectronics, a FractalDesing
Mouse	various/various/various
Keyboard	various wired and wireless
VR HMD	-
Software	W10.someting or another,all 3

Processor	Intel i7-12700K
Motherboard	MSI PRO Z690-A WIFI
Cooling	Noctua NH-D15S
Memory	Corsair Vengeance 4x16 GB (64GB) DDR4-3600 C18
Video Card(s)	MSI GeForce RTX 3090 GAMING X TRIO 24G
Storage	Samsung 980 Pro 1TB, SK hynix Platinum P41 2TB
Case	Fractal Define C
Power Supply	Corsair RM850x
Mouse	Logitech G203
Software	openSUSE Tumbleweed