
NVIDIA GeForce RTX 50 Series "Blackwell" Features Similar L1/L2 Cache Architecture to RTX 40 Series

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,723 (1.01/day)
NVIDIA's upcoming RTX 5090 and 5080 graphics cards keep an L1 cache architecture similar to their predecessors' while introducing marginal improvements to L2 cache capacity, according to recent specifications reported by HardwareLuxx. The flagship RTX 5090 maintains the same 128 KB of L1 cache per SM as the RTX 4090 but reaches a higher total of 21.7 MB thanks to its increased SM count of 170. That is a notable improvement over the RTX 4090, whose 128 SMs add up to 16.3 MB of total L1 cache. In terms of L2 cache, the RTX 5090 sees a 33.3% increase over its predecessor, with 96 MB compared to the RTX 4090's 72 MB. SM count goes up by 32.8%, so L2 capacity per SM is essentially unchanged.
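The totals and percentages above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the article's figures use decimal megabytes (1 MB = 1000 KB), which is what matches the quoted 21.7 MB and 16.3 MB; the function names are illustrative only.

```python
# Figures quoted in the article (as reported via HardwareLuxx).
L1_PER_SM_KB = 128


def total_l1_mb(sm_count, l1_per_sm_kb=L1_PER_SM_KB):
    """Total L1 in decimal megabytes (assumption: 1 MB = 1000 KB)."""
    return sm_count * l1_per_sm_kb / 1000


def pct_increase(new, old):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


rtx_5090 = {"sms": 170, "l2_mb": 96}
rtx_4090 = {"sms": 128, "l2_mb": 72}

print(f"5090 total L1: {total_l1_mb(rtx_5090['sms']):.2f} MB")  # 21.76
print(f"4090 total L1: {total_l1_mb(rtx_4090['sms']):.2f} MB")  # 16.38
print(f"L2 increase:   {pct_increase(rtx_5090['l2_mb'], rtx_4090['l2_mb']):.1f}%")  # 33.3
print(f"SM increase:   {pct_increase(rtx_5090['sms'], rtx_4090['sms']):.1f}%")      # 32.8

# L2 per SM barely moves because L2 and SM count grow at nearly the same rate.
for name, gpu in (("5090", rtx_5090), ("4090", rtx_4090)):
    print(f"{name} L2 per SM: {gpu['l2_mb'] / gpu['sms'] * 1000:.1f} KB")
```

The per-SM figures come out at roughly 565 KB versus 563 KB, so the gap between the two percentages is too small to matter.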

However, this improvement is relatively modest compared to the previous generation's leap, where the RTX 4090 featured twelve times the L2 cache of the RTX 3090. The RTX 5080 shows more conservative improvements, with its total L1 cache capacity exceeding its predecessor's by only 1 MB (10.7 MB vs 9.7 MB). Its L2 cache stays at 64 MB, matching the RTX 4080 and 4080 Super. To compensate for these incremental cache improvements, NVIDIA is implementing faster GDDR7 memory across the RTX 50 series. Most models will feature 28 Gbps modules, with the RTX 5080 receiving special treatment with 30 Gbps memory. Additionally, some models are getting wider memory buses, with the RTX 5090 featuring a 512-bit bus and the RTX 5070 Ti upgrading to a 256-bit interface.
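To put the memory figures in perspective, peak theoretical bandwidth is the per-pin data rate multiplied by the bus width, divided by 8 bits per byte. The sketch below assumes the RTX 5090 and RTX 5070 Ti are among the "most models" using 28 Gbps modules, which the article does not state explicitly for either card.

```python
def bandwidth_gb_s(data_rate_gbps, bus_width_bits):
    """Peak theoretical bandwidth in GB/s: per-pin rate x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


# Assumption: both cards use 28 Gbps GDDR7; bus widths are from the article.
print(bandwidth_gb_s(28, 512))  # RTX 5090, 512-bit    -> 1792.0
print(bandwidth_gb_s(28, 256))  # RTX 5070 Ti, 256-bit -> 896.0
```

These are upper bounds from the spec sheet arithmetic, not measured throughput.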



 
Joined
Jan 11, 2022
Messages
997 (0.90/day)
I wonder why they still haven’t done what AMD just did with Zen 5:
put a slab of L3 under or above it, reduce the L2, and replace it with more logic.
 

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,723 (1.01/day)
I wonder why they still haven’t done what AMD just did with Zen 5:
put a slab of L3 under or above it, reduce the L2, and replace it with more logic.
Yeah, I was thinking that as well. But current packaging methods only allow lower-TDP logic below cache stacks. You can't easily cool a 600 W TDP GPU with a cache layer on top. NVIDIA has already looked into this (see here).
 
Joined
Dec 31, 2020
Messages
1,039 (0.70/day)
Processor E5-4627 v4
Motherboard VEINEDA X99
Memory 32 GB
Video Card(s) 2080 Ti
Storage NE-512
Display(s) G27Q
Case DAOTECH X9
Power Supply SF450
In RDNA, the L3 level is divided between the MCD chiplets, and each slice is accessed only by the section it belongs to.
L2 sits in the center, accessed by everything and interfacing with the memory.
Besides, didn't the 5090 have 88 MB of L2 overall? Where is the whitepaper showing this?
 
Joined
Mar 5, 2024
Messages
126 (0.39/day)
The 5080 is the biggest fail I have ever seen. The only worse cards in the last 10 years have got to be the 4060 or 4080 Super, because that Super doesn't do anything, and the 4060... yeah, you know why that one is bad. So far, Nvidia is failing gen after gen. I wonder what will happen in 2027.
 
Joined
Feb 3, 2017
Messages
3,862 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
I have been wondering for a while: isn't Nvidia's L2 cache tied to some other architectural component? Memory controllers or ROPs would seem like a logical choice (or shader arrays), but none actually match the amounts in the specs.
 
Joined
Dec 12, 2016
Messages
2,082 (0.70/day)
Blackwell is not going to be much faster than Ada unless a game uses a lot of RT and AI tech and you have DLSS enabled.



I’m going to stick to the thousands of games of yesteryear that I still haven’t played. I can run them at 4K/Ultra/120 fps with my 7900XT.
 
Joined
Sep 15, 2011
Messages
6,825 (1.40/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
Long gone are the times when an x070 GPU was faster than, or at least matched the raw speed of, the older (x-1)090 GPU.
Now it's all about low-quality upscaling and fake frame generation... for double the price.
You gotta love those callous and greedy megacorporations.
 
Joined
Mar 11, 2024
Messages
90 (0.29/day)
Yeah, I was thinking that as well. But current packaging methods only allow lower-TDP logic below cache stacks. You can't easily cool a 600 W TDP GPU with a cache layer on top. NVIDIA has already looked into this (see here).
IIRC the 9000X3D series is using cache below the logic, allowing the heat to be pulled out without going through the cache. The same could be used for NVIDIA GPUs; they are just holding back because there is no real need for more performance/competitiveness currently.
They are holding back a lot, actually.
 
Joined
Dec 28, 2012
Messages
4,062 (0.92/day)
System Name Skunkworks 3.0
Processor 5800x3d
Motherboard x570 unify
Cooling Noctua NH-U12A
Memory 32GB 3600 mhz
Video Card(s) asrock 6800xt challenger D
Storage Sabarent rocket 4.0 2TB, MX 500 2TB
Display(s) Asus 1440p144 27"
Case Old arse cooler master 932
Power Supply Corsair 1200w platinum
Mouse *squeak*
Keyboard Some old office thing
Software Manjaro
IIRC the 9000X3D series is using cache below the logic, allowing the heat to be pulled out without going through the cache. The same could be used for NVIDIA GPUs; they are just holding back because there is no real need for more performance/competitiveness currently.
They are holding back a lot, actually.
AMD's method also doesn't scale very well with larger dies, which is why we haven't seen an APU with the big iGPU using X3D cache yet. Apparently, the bigger the die, the harder it is to pull off without breaking something.

Now scale that to the monster that is the 5090.
 
Joined
Oct 5, 2024
Messages
159 (1.51/day)
Location
United States of America
Blackwell is not going to be much faster than Ada unless a game uses a lot of RT and AI tech and you have DLSS enabled.



I’m going to stick to the thousands of games of yesteryear that I still haven’t played. I can run them at 4K/Ultra/120 fps with my 7900XT.
The 7900XT is going to age very nicely, tons of rasterization performance, plenty of VRAM, and solid drivers, all at 300W power usage. As for games, I am plenty busy slowly churning through BG3 and Helldivers 2 alone.
 
Joined
Oct 12, 2005
Messages
724 (0.10/day)
In RDNA, the L3 level is divided between the MCD chiplets, and each slice is accessed only by the section it belongs to.
L2 sits in the center, accessed by everything and interfacing with the memory.
Besides, didn't the 5090 have 88 MB of L2 overall? Where is the whitepaper showing this?
The MALL on RDNA3 (Memory Attached Last Level cache), aka Infinity Cache, is indeed tied to the memory region that it's connected to. But that is nothing new. It is the same thing on RDNA2 and RDNA 3.5.

If RDNA uses MALL cache, that would also be the case. The thing is, it's not as big a deal as you seem to say. Firstly, caches don't cache data but memory lines. The data would naturally be spread across all memory controllers in order to benefit from the whole available bandwidth. That also means the cache load would naturally be spread across all of those chiplets.
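As a rough illustration of that last point, here is a minimal sketch of line-granular interleaving. The 128-byte line size, the six channels, and the plain modulo mapping are all assumptions chosen for the example; real GPUs use their own line sizes and address hashing.

```python
from collections import Counter

LINE_BYTES = 128   # assumed cache-line size, for illustration only
NUM_CHANNELS = 6   # assumed number of memory channels / cache slices


def channel_for_address(addr):
    """Map a byte address to a channel by interleaving consecutive lines."""
    return (addr // LINE_BYTES) % NUM_CHANNELS


def slice_load(start, size_bytes):
    """Count how many lines of a contiguous buffer land on each channel."""
    first_line = start // LINE_BYTES
    last_line = (start + size_bytes - 1) // LINE_BYTES
    return Counter(
        channel_for_address(line * LINE_BYTES)
        for line in range(first_line, last_line + 1)
    )


# A contiguous buffer of 6144 lines ends up evenly split across the channels.
load = slice_load(0, 6 * 1024 * LINE_BYTES)
for channel in sorted(load):
    print(channel, load[channel])
```

With this mapping each channel receives 1024 lines, so no single slice of the cache has to hold the whole buffer.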
 
Joined
Dec 31, 2020
Messages
1,039 (0.70/day)
Processor E5-4627 v4
Motherboard VEINEDA X99
Memory 32 GB
Video Card(s) 2080 Ti
Storage NE-512
Display(s) G27Q
Case DAOTECH X9
Power Supply SF450
The MALL on RDNA3 (Memory Attached Last Level cache), aka Infinity Cache, is indeed tied to the memory region that it's connected to. But that is nothing new. It is the same thing on RDNA2 and RDNA 3.5.

If RDNA uses MALL cache, that would also be the case. The thing is, it's not as big a deal as you seem to say. Firstly, caches don't cache data but memory lines.
You're right, it's continuous, and the memory lines that could benefit are mostly the front and back buffers, which in the case of 4K come to 64 MB. So anything with less than 64 MB of cache, the 5070 Ti included, could be out of luck here, as far as I can imagine.
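The 64 MB figure can be sanity-checked with a quick calculation. This sketch assumes 4 bytes per pixel (for example 8-bit RGBA) and counts only a front and a back buffer, ignoring depth buffers and other render targets.

```python
def buffer_bytes(width, height, bytes_per_pixel=4):
    """Size of one color buffer in bytes (assumption: 4 bytes per pixel)."""
    return width * height * bytes_per_pixel


def double_buffer_mib(width, height, bytes_per_pixel=4):
    """Front plus back buffer, in MiB (1 MiB = 2**20 bytes)."""
    return 2 * buffer_bytes(width, height, bytes_per_pixel) / 2**20


one_4k = buffer_bytes(3840, 2160)
print(one_4k)                         # 33177600 bytes per buffer
print(double_buffer_mib(3840, 2160))  # 63.28125 MiB for front + back
```

Under these assumptions the pair of 4K buffers comes to about 63.3 MiB, which is where the roughly 64 MB estimate lands.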
 