Tuesday, December 17th 2024

NVIDIA GeForce RTX 5080 to Stand Out with 30 Gbps GDDR7 Memory, Other SKUs Remain on 28 Gbps
NVIDIA is preparing to unveil its "Blackwell" GeForce RTX 5080 graphics card, featuring cutting-edge GDDR7 memory technology. The RTX 5080 is expected to be equipped with 16 GB of GDDR7 memory running at an impressive 30 Gbps. Combined with a 256-bit memory bus, this configuration will deliver approximately 960 GB/s of bandwidth, a 34% improvement over its predecessor, the RTX 4080, which operates at 716.8 GB/s. The RTX 5080 will stand as the sole card in the lineup featuring 30 Gbps memory modules, while other models in the RTX 50 series will incorporate slightly slower 28 Gbps variants. This strategic differentiation is possibly due to the massive gap in CUDA core counts between the rumored RTX 5080 and RTX 5090.
The flagship RTX 5090 is set to push boundaries even further, implementing a wider 512-bit memory bus that could potentially achieve bandwidth exceeding 1.7 TB/s. NVIDIA appears to be reserving memory configurations larger than 16 GB exclusively for this top-tier model, at least until higher-capacity GDDR7 modules become available on the market. Despite these impressive specifications, the RTX 5080's bandwidth still falls approximately 5% short of the current RTX 4090, which benefits from a physically wider bus configuration. This gap between the 5080 and the anticipated 5090 suggests NVIDIA is maintaining a clear hierarchy within its product stack, and we will have to wait for the official launch to learn the what, how, and why of the Blackwell gaming GPUs.
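For reference, the quoted figures follow from the usual GDDR bandwidth formula (per-pin data rate × bus width / 8); a quick back-of-the-envelope sketch with the numbers above, taking the 4080's 22.4 Gbps and the rumored 28 Gbps / 512-bit configuration of the 5090 as given:

```cuda
#include <cstdio>

// Peak GDDR bandwidth = per-pin data rate (Gbps) * bus width (bits) / 8.
static double bandwidth_gb_s(double gbps_per_pin, int bus_bits) {
    return gbps_per_pin * bus_bits / 8.0;
}

int main() {
    double rtx5080 = bandwidth_gb_s(30.0, 256); // 960 GB/s
    double rtx4080 = bandwidth_gb_s(22.4, 256); // 716.8 GB/s
    double rtx5090 = bandwidth_gb_s(28.0, 512); // 1792 GB/s, i.e. just over 1.7 TB/s
    printf("RTX 5080: %.1f GB/s (+%.0f%% over RTX 4080 at %.1f GB/s)\n",
           rtx5080, (rtx5080 / rtx4080 - 1.0) * 100.0, rtx4080);
    printf("Rumored RTX 5090: %.1f GB/s\n", rtx5090);
    return 0;
}
```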
Sources:
Benchlife, via VideoCardz
44 Comments on NVIDIA GeForce RTX 5080 to Stand Out with 30 Gbps GDDR7 Memory, Other SKUs Remain on 28 Gbps
It is extremely obvious that in the near future a better variant with 20 or 24 GB of VRAM, and probably a much wider bus, is going to be released. So this card will be easily forgotten.
The 4080 has 24 Gbps memory (according to the review), which is downclocked to 22.4 Gbps. Same with the 4080 S, but downclocked to 23 Gbps; I guess it simply doesn't need more. The L2 "infinity cache" does outstanding work.
It's an interesting choice, though, to have faster-rated memory but run it at a lower speed... I do wonder what that is about.
It's a balanced configuration, however. I'll give you that much. All hardware resources generally end up fully utilized.
If you send me a 4080, I will gladly run some tests :D
Edit:
As Dr. Dro pointed out, gaming tests show that the 4080 loses performance at higher resolutions faster than otherwise comparable (or comparable-ish) GPUs that have a wider memory bus and more memory bandwidth. At lower resolutions the bigger cache on Ada can mask the relatively lower bandwidth, but it is simply no longer sufficient at higher resolutions.
- The card has 24 Gbps G6X. Because it already has more than enough *effective bandwidth* thanks to the big L2 cache, they downclocked it to 22.4 Gbps (23 Gbps on the 4080 S) to save energy. Go and check the performance reviews: it doesn't lose "extra" performance in 4K, because the bandwidth is sufficient. Go check the 7900 XT, which loses too much in 4K -> NVIDIA's "infinity cache" works better.
Sorry, can't do, I only have the bigger one.
It's a new architecture, and we know nothing about its performance characteristics. But judging from previous launches, it's reasonable to assume both memory compression and memory management are improved. And as usual, a lot of you will probably end up buying one in the end, despite joining the "politically correct" whining about VRAM size, memory bus width, or core count, when the only thing that really matters is how it performs in the real world.
Let's instead look forward to a brand new generation full of goodies. :)
Also, PCIe 5.0 should reduce stutters by roughly 50%, since any data spilling out of VRAM moves over the bus twice as fast:
Let's assume the card's memory is 8 GB and the dataset is 9 GB, so 1 GB is constantly moved over PCIe 4.0 or 5.0 (roughly 32 GB/s and 64 GB/s at x16):
PCIe 4.0: 1 GB takes ~30 ms -> ~33 FPS
PCIe 5.0: 1 GB takes ~15 ms -> ~67 FPS
Now assume the card's memory is 8 GB and the dataset is 10 GB, so 2 GB is moved:
PCIe 4.0: 2 GB takes ~60 ms -> ~16 FPS
PCIe 5.0: 2 GB takes ~30 ms -> ~33 FPS
33 FPS is manageable imho. (I played CS at 10 FPS for years...)
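To make the arithmetic above explicit, here is a minimal host-side sketch (assuming roughly 32 GB/s for PCIe 4.0 x16 and 64 GB/s for PCIe 5.0 x16, and counting only the transfer time of the spilled data):

```cuda
#include <cstdio>

// If "overflow_gb" of the working set has to cross PCIe every frame,
// the transfer time alone puts a ceiling on the frame rate.
static double fps_cap(double overflow_gb, double pcie_gb_per_s) {
    double transfer_ms = overflow_gb / pcie_gb_per_s * 1000.0; // ms spent moving the spill
    return 1000.0 / transfer_ms;
}

int main() {
    const double pcie4 = 32.0, pcie5 = 64.0; // assumed x16 throughput in GB/s
    for (double spill : {1.0, 2.0}) {
        printf("%.0f GB spill: PCIe 4.0 -> ~%.0f FPS cap, PCIe 5.0 -> ~%.0f FPS cap\n",
               spill, fps_cap(spill, pcie4), fps_cap(spill, pcie5));
    }
    return 0;
}
```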
But let's assume you have 9 GB of assets to use during a single frame with only 8 GB of VRAM; that would be a sad story, no question about that. That's not how rendering engines work, though:
Firstly, textures are stored as multiple mip levels and sampled with anisotropic filtering, so within the allocated memory there are dozens of versions of the "same" texture, and the GPU isn't going to be using all of that data during a single frame. So even if your dataset is 9 GB (even with dynamic loading of assets), only about ~1-1.5 GB of that will be used during any given frame. (Also, the highest mip levels will only be used by objects close to the camera, and only so many objects can ever be close at once, so for most of the scene very low mip levels are usually used.)
Secondly, the memory bandwidth will be a bottleneck long before you run into heavy swapping anyway. Take, for instance, the RTX 4060 with its 272 GB/s: if you target 60 FPS, then you could theoretically only ever access a total of ~4.5 GB during the 16.7 ms frame time, and half of that if you target 120 FPS. Additionally, accesses will not be evenly distributed throughout the frame time (vertex shaders and tessellation are much less memory intensive, for instance), so the real-world peak would probably be less than half of that. As you see, by the time you even come close to the VRAM limit of a card, you are already approaching "slide show" territory at ~15 FPS. So any reasoning about having lots of VRAM for "future proofing" is a moot point, not to mention that future games will be even more computationally intensive.
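A tiny sketch of that bandwidth-per-frame ceiling, using only the 272 GB/s figure quoted above (real games will touch far less unique data than this theoretical maximum):

```cuda
#include <cstdio>

// Upper bound on how much data a GPU can even read per frame,
// given its memory bandwidth and a target frame rate.
int main() {
    const double bandwidth_gb_s = 272.0; // RTX 4060 figure from the post
    for (double fps : {60.0, 120.0}) {
        double frame_ms = 1000.0 / fps;
        double max_gb   = bandwidth_gb_s * frame_ms / 1000.0; // GB of traffic per frame
        printf("%.0f FPS: %.1f ms/frame -> at most ~%.1f GB of memory traffic per frame\n",
               fps, frame_ms, max_gb);
    }
    return 0;
}
```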
If you do run into heavy swapping, you'll know, trust me; it would quickly become unplayable. But if you do, then you're probably doing something funky, like running a browser reloading a lot of tabs in the background, or the driver is bad.
The GPU certainly has enough compute power to accelerate a fluid simulation to 60 FPS (especially with a compressed L2 cache), but it is hindered by PCIe bandwidth. For example, the RTX 4070's memory is ~500 GB/s and its L2 is about 1.5 TB/s, but PCIe is only 20-30 GB/s because it's version 4, not 5.
For example, Game of Life accesses the closest 8 cells. Fluid mechanics can reach maybe 30 or more cells per cell. An N-body algorithm accesses all particles per particle when brute-forced. This is extremely cache sensitive. When the code updates cell data, the L2 stores it compressed, so a 40 MB cache acts like 160 MB, but this doesn't affect main memory.
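To illustrate the neighbour-access pattern being described, here is a minimal Game-of-Life-style CUDA stencil (a generic sketch, not tied to any particular fluid solver); adjacent threads re-read mostly the same rows, which is exactly why such kernels are so sensitive to how much of the grid fits in cache:

```cuda
// One Game of Life step: each cell reads its 8 neighbours.
__global__ void life_step(const unsigned char* in, unsigned char* out,
                          int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return; // skip borders

    int alive = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            if (dx != 0 || dy != 0)
                alive += in[(y + dy) * width + (x + dx)];

    unsigned char cell = in[y * width + x];
    out[y * width + x] = (alive == 3 || (cell && alive == 2)) ? 1 : 0;
}
```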
I wish the RTX 5080 came with 200 MB of cache and 2x more compression, so that up to 1.6 GB of redundant data could stay on the GPU chip without touching VRAM. This would be like 8 GB + 1.6 GB. Add PCIe 5.0 bandwidth too; the perfect combo to counter some stutters.
In CUDA, there is an option to keep a selected region of data resident in L2 between different function calls or kernels.
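That refers to the L2 access-policy window ("persisting" L2 cache) in the CUDA runtime, available on Ampere and newer GPUs; a minimal sketch, where `pin_buffer_in_l2` is just a hypothetical helper name:

```cuda
#include <cuda_runtime.h>

// Ask the driver to keep accesses to `buf` resident in L2 across kernel
// launches on `stream` (subject to the device's persisting-L2 limits).
void pin_buffer_in_l2(cudaStream_t stream, void* buf, size_t bytes) {
    // Set aside a portion of L2 for persisting accesses (device-wide).
    cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, bytes);

    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = buf;
    attr.accessPolicyWindow.num_bytes = bytes;                        // window size
    attr.accessPolicyWindow.hitRatio  = 1.0f;                         // fraction treated as persisting
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting; // keep hits in L2
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;  // everything else streams
    cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);
}
```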
Generally yes, I have seen VRAM overclocking (or any other bandwidth improvement) help out a lot, but additional VRAM only helps in outlier games with some mega-shady rendering going on, or when we're talking about adding to a seriously obsolete amount (which is ~5 GB in this day and age).
I don't think 30 GT/s is remotely enough to unlock the 5080's full potential, though. More like 45 GT/s or so, unless that cache somehow works 2+ times better than in Ada.
A game engine is designed more like a real-time system, where algorithms aren't chosen primarily based on O-notation (like in academia), but for achieving consistent performance through awareness of access patterns, cache optimization, and avoiding very costly system calls in critical paths. This holds true both for rendering and for game loops; a terrible worst case results in stutter while rendering, but it's even worse in a game loop, where such problems are the cause of many bugs and glitches in modern games. All performant code must be developed with some kind of assessment of the underlying hardware characteristics, and cache optimization is one of the most important. I could go much deeper, but hopefully you get the point. ;)

Compression rates vary based on data density; sparse data can be compressed very heavily, while more random data cannot. If your dataset is fairly predictable, then you can probably estimate the effective compression rate for your use case, but it wouldn't be applicable to anything else.
I'm not aware of the details of the upcoming generation, but generally speaking it's usually a hint better than the previous one. Since when did we expect recent games to run well on 9-year-old hardware that isn't even supported any more? Not that more VRAM would have saved it; at best you'd have a pretty slide show, as that GPU wouldn't be powerful enough or have enough bandwidth to pump out 60 FPS at good settings.
WDDM does its own batching of multiple kernel launches, so it sometimes can't even overlap data copies with compute. Ubuntu is better than Windows 11 in this regard.
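For completeness, the copy/compute overlap being talked about looks roughly like this on the CUDA side (a generic double-buffered sketch; `process` is a stand-in kernel, and the host buffer is assumed to be pinned with cudaMallocHost so the copies are truly asynchronous):

```cuda
#include <cuda_runtime.h>

__global__ void process(float* data, int n) { // stand-in compute kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Double-buffered pipeline: while chunk i is being processed on one stream,
// chunk i+1 can already be uploading on the other. Whether the driver
// actually overlaps the two is what the WDDM comment above is about.
void pipeline(const float* host_in, float* dev_buf[2], int chunk_elems, int chunks) {
    cudaStream_t streams[2];
    cudaStreamCreate(&streams[0]);
    cudaStreamCreate(&streams[1]);

    for (int i = 0; i < chunks; ++i) {
        int b = i % 2; // alternate buffers/streams; same-stream ordering prevents reuse races
        cudaMemcpyAsync(dev_buf[b], host_in + (size_t)i * chunk_elems,
                        chunk_elems * sizeof(float),
                        cudaMemcpyHostToDevice, streams[b]);
        process<<<(chunk_elems + 255) / 256, 256, 0, streams[b]>>>(dev_buf[b], chunk_elems);
    }
    cudaDeviceSynchronize();
    cudaStreamDestroy(streams[0]);
    cudaStreamDestroy(streams[1]);
}
```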