Friday, January 17th 2025

NVIDIA GeForce RTX 50 Series "Blackwell" Features Similar L1/L2 Cache Architecture to RTX 40 Series

NVIDIA's upcoming RTX 5090 and 5080 graphics cards maintain L1 cache architectures similar to their predecessors' while introducing marginal improvements to L2 cache capacity, according to recent specifications reported by HardwareLuxx. The flagship RTX 5090 keeps the same 128 KB of L1 cache per SM as the RTX 4090 but reaches a higher total L1 capacity of 21.7 MB thanks to its increased SM count of 170. This is a notable improvement over the RTX 4090, whose 128 SMs add up to 16.3 MB of total L1 cache. In terms of L2 cache, the RTX 5090 sees a 33.3% increase over its predecessor, boasting 96 MB compared to the RTX 4090's 72 MB, roughly in line with the 32.8% growth in SM count.
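
For reference, the totals above follow directly from the per-SM figure and the SM counts. A minimal sketch of the arithmetic in Python (assuming, as the quoted numbers imply, 128 KB of L1 per SM and 1 MB counted as 1000 KB):

    L1_KB_PER_SM = 128

    def total_l1_mb(sm_count: int) -> float:
        # Total L1 capacity in MB for a given SM count (1 MB = 1000 KB here).
        return sm_count * L1_KB_PER_SM / 1000

    print(total_l1_mb(170))  # RTX 5090: 21.76, quoted as 21.7 MB
    print(total_l1_mb(128))  # RTX 4090: 16.384, quoted as 16.3 MB

    # L2 growth vs. SM growth, RTX 5090 over RTX 4090
    print((96 - 72) / 72 * 100)      # ~33.3% more L2
    print((170 - 128) / 128 * 100)   # ~32.8% more SMs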

However, this improvement is relatively modest compared to the previous generation's leap, where the RTX 4090 featured twelve times the L2 cache of the RTX 3090. The RTX 5080 shows more conservative improvements, with its L1 cache capacity exceeding its predecessor's by only 1 MB (10.7 MB vs. 9.7 MB). Its L2 cache stays at 64 MB, matching the RTX 4080 and 4080 Super. To compensate for these incremental cache improvements, NVIDIA is implementing faster GDDR7 memory across the RTX 50 series. Most models will feature 28 Gbps modules, with the RTX 5080 receiving special treatment in the form of 30 Gbps memory. Additionally, some models are getting wider memory buses, with the RTX 5090 featuring a 512-bit bus and the RTX 5070 Ti upgrading to a 256-bit interface.
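
As a rough illustration of what the faster memory buys, peak bandwidth is simply the bus width times the per-pin data rate. A small sketch using the bus widths and data rates mentioned above (theoretical peaks only; sustained throughput depends on workload):

    def peak_bandwidth_gbps(bus_width_bits: int, data_rate_gbps: float) -> float:
        # Peak memory bandwidth in GB/s: bus width (bits) x data rate (Gbps) / 8 bits per byte.
        return bus_width_bits * data_rate_gbps / 8

    print(peak_bandwidth_gbps(512, 28))  # RTX 5090: 1792.0 GB/s
    print(peak_bandwidth_gbps(256, 30))  # RTX 5080: 960.0 GB/s
    print(peak_bandwidth_gbps(256, 28))  # RTX 5070 Ti: 896.0 GB/s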
Sources: HardwareLuxx, via Tom's Hardware

12 Comments on NVIDIA GeForce RTX 50 Series "Blackwell" Features Similar L1/L2 Cache Architecture to RTX 40 Series

#1
kondamin
I wonder why they still haven't done what AMD just did with Zen 5: put a slab of L3 under or above it, reduce the L2, and replace it with more logic.
Posted on Reply
#2
AleksandarK
News Editor
kondamin: I wonder why they still haven't done what AMD just did with Zen 5: put a slab of L3 under or above it, reduce the L2, and replace it with more logic.
Yeah, I was thinking that as well. But current packaging methods only allow lower-TDP logic below cache stacks. You can't easily cool a 600 W TDP GPU with a cache layer on top. NVIDIA has already looked into this (see here).
Posted on Reply
#3
N/A
In RDNA, the L3 level is divided between the MCD chiplets and confined to, and accessed only by, the section it belongs to.
L2 sits in the center, accessed by everything and interfacing with the memory.
Besides, didn't the 5090 have 88 MB of L2 overall? Where is the whitepaper showing this?
Posted on Reply
#4
starfals
5080, the biggest fail I have ever seen. The only worse cards in the last 10 years gotta be the 4060 or the 4080 Super... 'cause that Super doesn't do anything, and the 4060... yeah, you know why that one is bad. So far, Nvidia is failing gen after gen. I wonder what will happen in 2027.
Posted on Reply
#5
londiste
I have been wondering for a while: isn't Nvidia's L2 cache tied to some other architectural component? Memory controllers or ROPs would seem like a logical choice (or shader arrays), but none of them actually match the amounts in the specs.
Posted on Reply
#7
Prima.Vera
Long gone are the times when an x070 GPU was faster than, or at least matched the raw speed of, the previous generation's x090 GPU.
Now it's all about low-quality upscaling and fake frame generation... for double the price.
You gotta love those callous and greedy megacorporations.
Posted on Reply
#8
tpuuser256
AleksandarK: Yeah, I was thinking that as well. But current packaging methods only allow lower-TDP logic below cache stacks. You can't easily cool a 600 W TDP GPU with a cache layer on top. NVIDIA has already looked into this (see here).
IIRC the 9000X3D series is using cache below the logic, allowing the heat to be sucked out without going through the cache. The same could be used for NVIDIA GPUs; they are just holding back because there is no real need for more performance/competitiveness currently.
They are holding back a lot, actually.
Posted on Reply
#9
TheinsanegamerN
tpuuser256: IIRC the 9000X3D series is using cache below the logic, allowing the heat to be sucked out without going through the cache. The same could be used for NVIDIA GPUs; they are just holding back because there is no real need for more performance/competitiveness currently.
They are holding back a lot, actually.
AMD's method also doesn't scale very well with larger dies, which is why we haven't seen an APU with a big iGPU using X3D cache yet. The bigger the die, apparently, the harder it is to pull off without breaking something.

Now scale that to the monster that is the 5090.
Posted on Reply
#10
TechBuyingHavoc
Daven: Blackwell is not going to be much faster than Ada unless a game uses a lot of RT and AI tech and you have DLSS enabled.

videocardz.com/newz/geforce-rtx-5090d-reviewer-says-this-generation-hardware-improvements-arent-massive

videocardz.com/newz/nvidia-geforce-rtx-5090-appears-in-first-geekbench-opencl-vulkan-leaks

I'm going to stick to the thousands of games of yesteryear that I still haven't played. I can run them at 4K/Ultra/120 fps with my 7900XT.
The 7900XT is going to age very nicely, tons of rasterization performance, plenty of VRAM, and solid drivers, all at 300W power usage. As for games, I am plenty busy slowly churning through BG3 and Helldivers 2 alone.
Posted on Reply
#11
Punkenjoy
N/A: In RDNA, the L3 level is divided between the MCD chiplets and confined to, and accessed only by, the section it belongs to.
L2 sits in the center, accessed by everything and interfacing with the memory.
Besides, didn't the 5090 have 88 MB of L2 overall? Where is the whitepaper showing this?
The MALL (Memory Attached Last Level cache) on RDNA 3, aka Infinity Cache, is indeed tied to the memory region it's connected to. But that is nothing new; it was the same on RDNA 2 and RDNA 3.5.

If RDNA uses a MALL cache, that would also be the case. The thing is, it's not as big a deal as you seem to suggest. Firstly, caches don't cache data as such, they cache memory lines. The data would naturally be spread across all memory controllers in order to benefit from the whole available bandwidth. That also means the cache load would naturally be spread across all of those chiplets.
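
A toy sketch of that interleaving idea, with the line size and controller count chosen purely for illustration (not RDNA 3's actual mapping):

    from collections import Counter

    LINE_BYTES = 256        # assumed interleave granularity, illustrative only
    NUM_CONTROLLERS = 6     # e.g. six MCDs on a Navi 31-style part

    def controller_for(address: int) -> int:
        # Addresses are striped across controllers at cache-line granularity.
        return (address // LINE_BYTES) % NUM_CONTROLLERS

    # A linear walk over a 1 MB buffer touches every controller (and its slice
    # of last-level cache) about equally often.
    hits = Counter(controller_for(a) for a in range(0, 1 << 20, LINE_BYTES))
    print(hits)  # roughly equal counts per controller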
Posted on Reply
#12
N/A
Punkenjoy: The MALL (Memory Attached Last Level cache) on RDNA 3, aka Infinity Cache, is indeed tied to the memory region it's connected to. But that is nothing new; it was the same on RDNA 2 and RDNA 3.5.

If RDNA uses a MALL cache, that would also be the case. The thing is, it's not as big a deal as you seem to suggest. Firstly, caches don't cache data as such, they cache memory lines.
You're right, it's continuous, and the memory lines that benefit most are the front and back buffers, which at 4K come to about 64 MB, so anything with less than 64 MB (the 5070 Ti included) could be out of luck here, I imagine.
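
For what it's worth, a quick check of that ~64 MB figure, assuming two 4K colour buffers at 4 bytes per pixel (RGBA8) and ignoring depth/stencil and compression:

    width, height, bytes_per_pixel, buffers = 3840, 2160, 4, 2
    total_bytes = width * height * bytes_per_pixel * buffers
    print(total_bytes / 2**20)  # ~63.3 MiB for front + back buffer at 4K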
Posted on Reply