
NVIDIA GeForce RTX 5080 to Stand Out with 30 Gbps GDDR7 Memory, Other SKUs Remain on 28 Gbps

Joined
Dec 25, 2020
Messages
7,627 (5.02/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
That 28-to-30 Gbps difference is the GDDR7 memory speed gap between the RTX 5080 (and lower SKUs) and the RTX 5090.

The RTX 4090 has 21 Gbps GDDR6X memory and the 7900 XTX has 20 Gbps GDDR6. That bandwidth increase is pretty substantial, even aside from the bus-width increase. I think the RTX 4080 SUPER had 23 Gbps GDDR6X chips.

24 Gbps on both the 4080 and 4080 Super. 21 Gbps on the 3090 Ti and the other G6X-equipped 40-series cards (4070, 4070 Ti, 4070 Ti Super, 4090). The 3090 uses first-generation 21 Gbps chips at half density, while the 3080 and 3080 Ti use a 19 Gbps IC.

Will there be a Ti without any competition?

Super versions might come to appease shareholders in 2026.
 
Joined
Sep 15, 2011
Messages
6,886 (1.40/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
In a normal world, where users would be much more considerate with their hard-earned money, this card should be DOA unless it is priced like the current 4070, or less. By the specs, this is a 5070 in everything except name.
It is extremely obvious that in the near future a better variant with 20 or 24 GB of VRAM, and probably a much wider bus, is going to be released. So this card will be easily forgotten.
 

AcE

Joined
Dec 3, 2024
Messages
366 (4.52/day)
GDDR7 uses PAM3 signalling vs GDDR6X's PAM4. GDDR7 has lower voltages as well.
Yes, GDDR7 is to G6X what GDDR6 was to G5X - a more refined, or "final", version, and AMD will use it as well, for a long time. Samsung announced GDDR6 for speeds up to 24 Gbps, btw, which doesn't make it really inferior to G6X, but I never saw those versions on any graphics cards.
 
Joined
Feb 3, 2017
Messages
3,918 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos

AcE

Joined
Dec 3, 2024
Messages
366 (4.52/day)
Did these actually ever get those? Specs were for 22 and 23 IIRC.

24 Gbps memory (according to the review), which is downclocked to 22.4 Gbps. Same with the 4080 S, but downclocked to 23 Gbps; I guess it simply doesn't need more. The L2 "infinity cache" does outstanding work.
 
Joined
Feb 3, 2017
Messages
3,918 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
24 Gbps memory (according to the review), which is downclocked to 22.4 Gbps. Same with the 4080 S, but downclocked to 23 Gbps; I guess it simply doesn't need more. The L2 "infinity cache" does outstanding work.
I very very much doubt the part I bolded :p
Interesting choice, though, to actually have faster-rated memory but run it at a lower speed... I do wonder what that is about.
 

AcE

Joined
Dec 3, 2024
Messages
366 (4.52/day)
I very very much doubt the part I bolded :p
Then post data to support your claim. :) Very much doubt you have a point here. Get a 4080 and OC its vram, get about +0% performance. ^^ It has more than enough bandwidth.
 
Last edited:
Joined
Dec 25, 2020
Messages
7,627 (5.02/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
Did these actually ever get those? Specs were for 22 and 23 IIRC.

Yes, both of them have the exact same 24 Gbps IC, the Micron MT61K512M32KPA-24 (D8BZF). It comes clocked 38 MHz higher in the 4080 Super out of the box, but the limits on both are practically identical, as is the IC. The "normal" 4080 just has some extra headroom relative to its slightly slower stock speed as a result.

Then post data to support your claim. :) Very much doubt you have a point here. Get a 4080 and OC its vram, get about +0% performance. ^^ It has more than enough bandwidth.

It actually does not have enough bandwidth, IMO. The 256-bit bus hampers the card really badly at high resolutions. The 4080 runs out of gas at 4K in games that are bandwidth-heavy, and performance falls off much faster than it does with, say, the 7900 XTX and its ~960 GB/s of bandwidth.

It's a balanced configuration, however. I'll give you that much. All hardware resources generally end up fully utilized.
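For some rough context on those figures, here is a quick sketch using the per-pin speeds and bus widths mentioned in this thread (peak bandwidth is simply data rate per pin times bus width divided by eight; treat the inputs as approximate):

```cpp
#include <cstdio>

// Peak DRAM bandwidth in GB/s = (data rate per pin in Gbps * bus width in bits) / 8.
// The figures below are the ones quoted in this thread, so treat them as approximate.
static double peak_gb_per_s(double gbps_per_pin, int bus_bits) {
    return gbps_per_pin * bus_bits / 8.0;
}

int main() {
    printf("RTX 4080 (22.4 Gbps x 256-bit): %6.1f GB/s\n", peak_gb_per_s(22.4, 256)); // ~716.8
    printf("RTX 4090 (21.0 Gbps x 384-bit): %6.1f GB/s\n", peak_gb_per_s(21.0, 384)); // ~1008.0
    printf("7900 XTX (20.0 Gbps x 384-bit): %6.1f GB/s\n", peak_gb_per_s(20.0, 384)); // ~960.0
    return 0;
}
```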
 
Joined
Feb 3, 2017
Messages
3,918 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Then post data to support your claim. :) Very much doubt you have a point here. Get a 4080 and OC its vram, get about +0% performance. ^^ It has more than enough bandwidth.
Are you actually trying to argue that faster VRAM speed is not beneficial to a GPU?
If you send me a 4080, I will gladly run some tests :D

Edit:
As Dr. Dro pointed out - gaming tests show that the 4080 loses performance at higher resolutions faster than otherwise comparable (or comparable-ish) GPUs that have a wider memory bus and more memory bandwidth. At lower resolutions the bigger cache on Ada can mask the relatively lower bandwidth, but it is simply not sufficient anymore at higher resolutions.
 
Last edited:

AcE

Joined
Dec 3, 2024
Messages
366 (4.52/day)
Are you actually trying to argue that faster VRAM speed is not beneficial to a GPU?
If you send me a 4080, I will gladly run some tests :D
My argument is this:
- the card has 24 Gbps G6X. Because it already has more than enough *effective bandwidth* thanks to the big L2 cache, they downclocked it to 22.4 (and 23 on the 4080 S) to save energy. Go and check performance reviews: it doesn't lose "extra" at 4K, because the bandwidth is sufficient. Go check the 7900 XT; it loses too much at 4K. -> Nvidia's infinity cache works better.

Sorry, can't do, I only have the bigger one
 
Joined
Jun 10, 2014
Messages
3,042 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
It's funny. Nvidia users ask for more VRAM and Nvidia is giving them faster VRAM.
It's like a joke with deaf people.
And most people don't understand how VRAM even works. And while more VRAM doesn't increase performance directly, faster VRAM usually does.

It's a new architecture, and we know nothing about its performance characteristics. But judging from previous launches, it's reasonable to assume both memory compression and management are improved. And as usual, a lot of you will probably end up buying one in the end despite joining the "politically correct" whining about VRAM size, bits of memory bus or core count, when the only thing that really matters is how it performs in the real world.

Let's instead look forward to a brand new generation full of goodies. :)
 
Joined
Apr 13, 2017
Messages
179 (0.06/day)
System Name AMD System
Processor Ryzen 7900 at 180Watts 5650 MHz, vdroop from 1.37V to 1.24V
Motherboard MSI MAG x670 Tomahawk Wifi
Cooling AIO240 for CPU, Wraith Prism's Fan for RAM but suspended above it without touching anything in case.
Memory 32GB dual channel Gskill DDR6000CL30 tuned for CL28, at 1.42Volts
Video Card(s) Msi Ventus 2x Rtx 4070 and Gigabyte Gaming Oc Rtx 4060 ti
Storage Samsung Evo 970
Display(s) Old 1080p 60FPS Samsung
Case Normal atx
Audio Device(s) Dunno
Power Supply 1200Watts
Mouse wireless & quiet
Keyboard wireless & quiet
VR HMD No
Software Windows 11
Benchmark Scores 1750 points in cinebench 2024 42k 43k gpu cpu points in timespy 50+ teraflops total compute power.
Maybe they'll upscale the textures with AI so that 8 GB of VRAM is enough.

Also, PCIe 5.0 will reduce stutters by 50%.

Let's assume the card's memory is 8 GB and the dataset is 9 GB, so 1 GB is constantly moved over PCIe 4.0 or 5.0.

PCIe 4.0: 1 GB takes ~30 milliseconds -> 33 FPS.

PCIe 5.0: 1 GB takes ~15 milliseconds -> 67 FPS.

Now let's assume the card's memory is 8 GB and the dataset is 10 GB, so 2 GB is moved.

PCIe 4.0: 2 GB takes ~60 milliseconds -> 16 FPS.

PCIe 5.0: 2 GB takes ~30 milliseconds -> 33 FPS.

33 FPS is manageable imho. (I played CS at 10 FPS for years...)
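As a back-of-envelope check of that math (a minimal sketch assuming roughly 32 GB/s usable on a PCIe 4.0 x16 link, 64 GB/s on 5.0, and that the transfer is the only thing limiting frame time):

```cpp
#include <cstdio>

// Rough FPS ceiling if 'overflow_gb' of data has to cross the PCIe bus every frame
// and that transfer is the only limiter. Link rates are assumed usable throughput
// for an x16 slot: ~32 GB/s for PCIe 4.0 and ~64 GB/s for PCIe 5.0.
static double fps_cap(double overflow_gb, double link_gb_per_s) {
    double frame_ms = overflow_gb / link_gb_per_s * 1000.0;
    return 1000.0 / frame_ms;
}

int main() {
    printf("1 GB over PCIe 4.0: ~%.0f FPS\n", fps_cap(1.0, 32.0)); // ~32 FPS
    printf("1 GB over PCIe 5.0: ~%.0f FPS\n", fps_cap(1.0, 64.0)); // ~64 FPS
    printf("2 GB over PCIe 4.0: ~%.0f FPS\n", fps_cap(2.0, 32.0)); // ~16 FPS
    printf("2 GB over PCIe 5.0: ~%.0f FPS\n", fps_cap(2.0, 64.0)); // ~32 FPS
    return 0;
}
```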
 
Last edited:
Joined
Jun 10, 2014
Messages
3,042 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Also, PCIe 5.0 will reduce stutters by 50%.

Let's assume the card's memory is 8 GB and the dataset is 9 GB, so 1 GB is constantly moved over PCIe 4.0 or 5.0.
If by dataset you mean textures and meshes only, then there would be a lot of large buffers on top of that, most of which are heavily compressed, as are some of the textures. (Total allocated VRAM would probably be 12+ GB then.)

But let's assume you really do have 9 GB of assets to utilize during a single frame with only 8 GB of VRAM; that would be a sad story, no question about it. That's just not how rendering engines work, though;

Firstly, textures are stored as multiple mip levels and use anisotropic filtering, so within the allocated memory there are dozens of versions of the "same" texture, and the GPU isn't going to be using all of that data during a single frame. So even if your dataset is 9 GB (even with dynamic loading of assets), only about ~1-1.5 GB of it will actually be used during any given frame. (Also, the highest mip levels will only be used by objects close to the camera, and only so many objects can ever be close at once, so for most of the scene very low mip levels are used.)

Secondly, the memory bandwidth will be a bottleneck long before you run into heavy swapping anyway. Take for instance the RTX 4060 with its 272 GB/s: if you target 60 FPS, you could theoretically only ever access a total of ~4.5 GB during each 16.7 ms frame, and half of that if you target 120 FPS. Additionally, accesses will not be evenly distributed throughout the frame time (e.g. vertex shaders and tessellation are much less memory intensive), so the real-world peak would probably be less than half of that. As you can see, by the time you even come close to the VRAM limit of a card, you are already approaching "slide show" territory at ~15 FPS. So any reasoning about having lots of VRAM for "future proofing" is a moot point, not to mention that future games will be even more computationally intensive.
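As a rough illustration of that ceiling (a sketch using the 272 GB/s figure above; real frames never get anywhere near the theoretical maximum):

```cpp
#include <cstdio>

// Upper bound on how much unique data a GPU could even touch per frame, given its
// DRAM bandwidth and a target frame rate. Real workloads reach only a fraction of
// this, because accesses are not evenly spread across the frame.
static double gb_per_frame(double bandwidth_gb_per_s, double target_fps) {
    return bandwidth_gb_per_s / target_fps;
}

int main() {
    printf("RTX 4060 (272 GB/s) @  60 FPS: %.2f GB per frame\n", gb_per_frame(272.0, 60.0));  // ~4.53
    printf("RTX 4060 (272 GB/s) @ 120 FPS: %.2f GB per frame\n", gb_per_frame(272.0, 120.0)); // ~2.27
    return 0;
}
```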

If you do run into heavy swapping, you'll know, trust me; it would quickly become unplayable. But if you do so, then you're probably doing something funky, like running a browser reloading a lot of tabs in the background, or the driver is bad.
 
Joined
Apr 13, 2017
Messages
179 (0.06/day)
System Name AMD System
Processor Ryzen 7900 at 180Watts 5650 MHz, vdroop from 1.37V to 1.24V
Motherboard MSI MAG x670 Tomahawk Wifi
Cooling AIO240 for CPU, Wraith Prism's Fan for RAM but suspended above it without touching anything in case.
Memory 32GB dual channel Gskill DDR6000CL30 tuned for CL28, at 1.42Volts
Video Card(s) Msi Ventus 2x Rtx 4070 and Gigabyte Gaming Oc Rtx 4060 ti
Storage Samsung Evo 970
Display(s) Old 1080p 60FPS Samsung
Case Normal atx
Audio Device(s) Dunno
Power Supply 1200Watts
Mouse wireless & quiet
Keyboard wireless & quiet
VR HMD No
Software Windows 11
Benchmark Scores 1750 points in cinebench 2024 42k 43k gpu cpu points in timespy 50+ teraflops total compute power.
If by dataset you mean textures and meshes only, then there would be a lot of large buffers on top of that, most of which are heavily compressed, as are some of the textures. (Total allocated VRAM would probably be 12+ GB then.)

But let's assume you really do have 9 GB of assets to utilize during a single frame with only 8 GB of VRAM; that would be a sad story, no question about it. That's just not how rendering engines work, though;

Firstly, textures are stored as multiple mip levels and use anisotropic filtering, so within the allocated memory there are dozens of versions of the "same" texture, and the GPU isn't going to be using all of that data during a single frame. So even if your dataset is 9 GB (even with dynamic loading of assets), only about ~1-1.5 GB of it will actually be used during any given frame. (Also, the highest mip levels will only be used by objects close to the camera, and only so many objects can ever be close at once, so for most of the scene very low mip levels are used.)

Secondly, the memory bandwidth will be a bottleneck long before you run into heavy swapping anyway. Take for instance the RTX 4060 with its 272 GB/s: if you target 60 FPS, you could theoretically only ever access a total of ~4.5 GB during each 16.7 ms frame, and half of that if you target 120 FPS. Additionally, accesses will not be evenly distributed throughout the frame time (e.g. vertex shaders and tessellation are much less memory intensive), so the real-world peak would probably be less than half of that. As you can see, by the time you even come close to the VRAM limit of a card, you are already approaching "slide show" territory at ~15 FPS. So any reasoning about having lots of VRAM for "future proofing" is a moot point, not to mention that future games will be even more computationally intensive.

If you do run into heavy swapping, you'll know, trust me; it would quickly become unplayable. But if you do so, then you're probably doing something funky, like running a browser reloading a lot of tabs in the background, or the driver is bad.
By dataset size I meant assets that are accessed on every frame. For example, it could be a PhysX fluid simulation with 1024x1024x1024 cells of space at 8 bytes per cell, making 8 GB that is accessed every frame, plus maybe 1 GB more of textures and other data on top to render the objects in it. The 1 extra GB then has to come from system RAM over PCIe.

The GPU certainly has enough compute power to accelerate such a fluid simulation to 60 FPS (especially with the compressed L2 cache), but it is hindered by PCIe bandwidth. For example, the RTX 4070's memory is ~500 GB/s and its L2 is about 1.5 TB/s, but PCIe is only 20-30 GB/s because it's version 4, not 5.
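Rough numbers for that scenario (a sketch using the grid size above and approximate RTX 4070-class figures; the PCIe rate is an assumed ~26 GB/s usable):

```cpp
#include <cstdio>

int main() {
    // Grid from the example above: 1024^3 cells at 8 bytes per cell.
    const double cells     = 1024.0 * 1024.0 * 1024.0;
    const double grid_gb   = cells * 8.0 / 1e9;   // ~8.6 GB (8 GiB)
    // Approximate peak rates quoted above; the PCIe figure is an assumed usable rate.
    const double vram_gbps = 500.0;               // RTX 4070 VRAM, ~500 GB/s
    const double pcie_gbps = 26.0;                // PCIe 4.0 x16, ~20-30 GB/s usable

    printf("Grid size:                      ~%.1f GB\n", grid_gb);
    printf("One full pass from VRAM:        ~%.1f ms per frame\n", grid_gb / vram_gbps * 1000.0); // ~17 ms
    printf("1 GB of overflow over PCIe 4.0: ~%.1f ms per frame\n", 1.0 / pcie_gbps * 1000.0);     // ~38 ms
    return 0;
}
```

Under those assumptions, the 1 GB spill alone costs roughly twice as much time per frame as streaming the entire grid out of VRAM.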
 
Joined
Jun 10, 2014
Messages
3,042 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
By dataset size I meant assets that are accessed on every frame. For example, it could be a PhysX fluid simulation with 1024x1024x1024 cells of space at 8 bytes per cell, making 8 GB that is accessed every frame, plus maybe 1 GB more of textures and other data on top to render the objects in it. The 1 extra GB then has to come from system RAM over PCIe.
No game that I know of would reserve that much for a fluid simulation versus other assets, not to mention that a game wouldn't simulate a single large volume like that; it would rather run a detailed simulation around objects, and for most games only a very basic simulation of the surface is needed. A large volume like that would also be highly compressible.

The GPU certainly has enough compute power to accelerate such a fluid simulation to 60 FPS (especially with the compressed L2 cache), but it is hindered by PCIe bandwidth. For example, the RTX 4070's memory is ~500 GB/s and its L2 is about 1.5 TB/s, but PCIe is only 20-30 GB/s because it's version 4, not 5.
You misunderstand the role of caches; an L2 is effectively a streaming buffer. Over time, the average throughput will converge towards the bandwidth of the memory, no matter how fast the cache is. One of the crucial jobs of the L2 cache in a GPU is to compensate for uneven load across the memory controllers (yes, there is usually more than one). Each 64 bits of the memory bus is connected to a separate controller, and whenever the requested data "overloads" one or more controllers, prefetching into a larger cache helps offset this and prevents unnecessary stalls of the GPU. (Back when AMD first added large caches to their GPUs they saw large gains in some cases, as AMD's GPUs had struggled with poor resource management for many generations.) L2 caches (in both GPUs and CPUs) usually have very high bandwidth, not because you will ever come close to that on average, but because you want the peak speed from each cache bank when you actually need it, to get the maximum savings from having the cache.
 
Joined
Apr 13, 2017
Messages
179 (0.06/day)
System Name AMD System
Processor Ryzen 7900 at 180Watts 5650 MHz, vdroop from 1.37V to 1.24V
Motherboard MSI MAG x670 Tomahawk Wifi
Cooling AIO240 for CPU, Wraith Prism's Fan for RAM but suspended above it without touching anything in case.
Memory 32GB dual channel Gskill DDR6000CL30 tuned for CL28, at 1.42Volts
Video Card(s) Msi Ventus 2x Rtx 4070 and Gigabyte Gaming Oc Rtx 4060 ti
Storage Samsung Evo 970
Display(s) Old 1080p 60FPS Samsung
Case Normal atx
Audio Device(s) Dunno
Power Supply 1200Watts
Mouse wireless & quiet
Keyboard wireless & quiet
VR HMD No
Software Windows 11
Benchmark Scores 1750 points in cinebench 2024 42k 43k gpu cpu points in timespy 50+ teraflops total compute power.
No game that I know of would reserve that much for a fluid simulation versus other assets, not to mention that a game wouldn't simulate a single large volume like that; it would rather run a detailed simulation around objects, and for most games only a very basic simulation of the surface is needed. A large volume like that would also be highly compressible.


You misunderstand the role of caches; an L2 is effectively a streaming buffer. Over time, the average throughput will converge towards the bandwidth of the memory, no matter how fast the cache is. One of the crucial jobs of the L2 cache in a GPU is to compensate for uneven load across the memory controllers (yes, there is usually more than one). Each 64 bits of the memory bus is connected to a separate controller, and whenever the requested data "overloads" one or more controllers, prefetching into a larger cache helps offset this and prevents unnecessary stalls of the GPU. (Back when AMD first added large caches to their GPUs they saw large gains in some cases, as AMD's GPUs had struggled with poor resource management for many generations.) L2 caches (in both GPUs and CPUs) usually have very high bandwidth, not because you will ever come close to that on average, but because you want the peak speed from each cache bank when you actually need it, to get the maximum savings from having the cache.
Fluid simulations, and particle simulations in general, access the neighboring particles for each particle. That redundancy is good for L2. Also, the compressible-memory feature of CUDA makes the L2 store data compressed.

For example, Game of Life accesses the closest 8 cells. Fluid mechanics can reach maybe 30 or more cells per cell. An n-body algorithm accesses all particles per particle when brute-forced. This is extremely cache sensitive. When the code updates cell data, L2 stores it compressed, so a 40 MB cache acts like 160 MB, but this doesn't affect main memory.

I wish the RTX 5080 came with 200 MB of cache and 2x better compression, so that up to 1.6 GB of redundancy could stay on the GPU chip without touching VRAM. That would be like 8 GB + 1.6 GB. Add PCIe 5.0 bandwidth too; a perfect combo to counter some stutters.

In CUDA, there is an option to keep a selected region of data resident in L2 between different function calls or kernels.
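For reference, this is roughly what that option looks like in CUDA 11 and newer (a sketch; the stream, buffer pointer and size here are placeholders):

```cpp
#include <cuda_runtime.h>

// Sketch of CUDA's L2 "persisting access" feature (CUDA 11+): a region of global
// memory can be hinted to stay resident in a set-aside slice of L2 across kernel
// launches on the same stream. 'buf' and 'buf_bytes' are placeholders.
void pin_region_in_l2(cudaStream_t stream, void* buf, size_t buf_bytes)
{
    int device = 0;
    cudaGetDevice(&device);

    // Reserve part of L2 for persisting accesses (up to the device maximum).
    int max_persist_bytes = 0;
    cudaDeviceGetAttribute(&max_persist_bytes, cudaDevAttrMaxPersistingL2CacheSize, device);
    cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, max_persist_bytes);

    // Describe the window of memory that should be treated as persisting.
    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = buf;
    attr.accessPolicyWindow.num_bytes = buf_bytes;            // must not exceed the access policy window limit
    attr.accessPolicyWindow.hitRatio  = 1.0f;                  // fraction of the window to treat as persisting
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);

    // Kernels launched on 'stream' after this point will prefer to keep 'buf' in L2.
    // cudaCtxResetPersistingL2Cache() clears the carve-out when it is no longer needed.
}
```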
 
Last edited:
Joined
Feb 24, 2023
Messages
3,608 (4.95/day)
Location
Russian Wild West
System Name D.L.S.S. (Die Lekker Spoed Situasie)
Processor i5-12400F
Motherboard Gigabyte B760M DS3H
Cooling Laminar RM1
Memory 32 GB DDR4-3200
Video Card(s) RX 6700 XT (vandalised)
Storage Yes.
Display(s) MSi G2712
Case Matrexx 55 (slightly vandalised)
Audio Device(s) Yes.
Power Supply Thermaltake 1000 W
Mouse Don't disturb, cheese eating in progress...
Keyboard Makes some noise. Probably onto something.
VR HMD I live in real reality and don't need a virtual one.
Software Windows 11 / 10 / 8
Benchmark Scores My PC can run Crysis. Do I really need more than that?
But if you do so, then you're probably doing something funky, like running a browser reloading a lot of tabs in the background, or the driver is bad.
Or trying to play Cyberpunk-esque games on an ancient 2 GB GPU like R9 380 also having 8 GB system RAM. With textures at Medium, it stutters like crazy and crashes with them on High. //please don't ask me why I know it

Generally, yes, I have seen VRAM overclocking (or any other bandwidth improvement) help out a lot, but additional VRAM only helps in outlier games with some mega-shady rendering going on, or when we're talking about adding to a seriously obsolete amount (which is ~5 GB in this day and age).

I don't think 30 GT/s is remotely enough to unlock the 5080's full potential, though. More like 45 GT/s or so, unless that cache somehow, someway works 2+ times better than in Ada.
 
Joined
Jun 10, 2014
Messages
3,042 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Fluid simulations, and particle simulations in general, access the neighboring particles for each particle. That redundancy is good for L2. Also, the compressible-memory feature of CUDA makes the L2 store data compressed.
Pretty much anything the GPU is good at is done on many instances in parallel, which is why memory accesses on a GPU happen in larger blocks.

For example, Game of Life accesses the closest 8 cells. Fluid mechanics can reach maybe 30 or more cells per cell. An n-body algorithm accesses all particles per particle when brute-forced. This is extremely cache sensitive.
The amount of simulation a rendering engine can do within a strict deadline is vastly different from what you can do in a simulation without such constraints (even on the same class of hardware). So creating a non-realtime simulation, whether for academic purposes or in a professional setting, is very different from doing it in a game.

A game engine is designed more like a real-time system, where algorithms aren't chosen primarily based on O-notation (as in academia) but for consistent performance: awareness of access patterns, cache optimization, and avoiding very costly system calls etc. in critical paths. This holds true both for rendering and for game loops; a terrible worst case results in stutter while rendering, but it's even worse in a game loop, where such problems are the cause of many bugs and glitches in modern games. All performant code must be developed with some assessment of the underlying hardware characteristics, and cache optimization is one of the most important. I could go much deeper, but hopefully you get the point. ;)

When the code updates cell data, L2 stores it compressed, so a 40 MB cache acts like 160 MB, but this doesn't affect main memory.
I wish the RTX 5080 came with 200 MB of cache and 2x better compression, so that up to 1.6 GB of redundancy could stay on the GPU chip without touching VRAM.
Compression rates vary based on data density; sparse data can be compressed very heavily, while more random data cannot. If your dataset is fairly predictable, then you can probably estimate an effective compression rate for your use case, but it wouldn't be applicable to anything else.
I'm not aware of the details of the upcoming generation, but generally speaking each one is usually a touch better than the previous.

Or trying to play Cyberpunk-esque games on an ancient 2 GB GPU like R9 380 also having 8 GB system RAM. With textures at Medium, it stutters like crazy and crashes with them on High. //please don't ask me why I know it
Since when do we expect recent games to run well on 9-year-old hardware that isn't even supported anymore? Not that more VRAM would have saved it; at best you'd have a pretty slide show, as that GPU isn't powerful enough, nor does it have enough bandwidth, to pump out 60 FPS at good settings.
 
Joined
Apr 13, 2017
Messages
179 (0.06/day)
System Name AMD System
Processor Ryzen 7900 at 180Watts 5650 MHz, vdroop from 1.37V to 1.24V
Motherboard MSI MAG x670 Tomahawk Wifi
Cooling AIO240 for CPU, Wraith Prism's Fan for RAM but suspended above it without touching anything in case.
Memory 32GB dual channel Gskill DDR6000CL30 tuned for CL28, at 1.42Volts
Video Card(s) Msi Ventus 2x Rtx 4070 and Gigabyte Gaming Oc Rtx 4060 ti
Storage Samsung Evo 970
Display(s) Old 1080p 60FPS Samsung
Case Normal atx
Audio Device(s) Dunno
Power Supply 1200Watts
Mouse wireless & quiet
Keyboard wireless & quiet
VR HMD No
Software Windows 11
Benchmark Scores 1750 points in cinebench 2024 42k 43k gpu cpu points in timespy 50+ teraflops total compute power.
Even when a kernel function does the exact same amount of computation on each run, the time taken can still vary between 1x and 3x due to the driver or the OS. CUDA needs the TCC driver mode to reduce that fluctuation on Windows. Sadly, RTX/GTX cards can't do TCC mode, but even the low, low, low-end K420 can.

WDDM does its own batching of multiple kernel launches, so it sometimes can't even overlap data copies with compute. Ubuntu is better than Windows 11 in this regard.
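For anyone curious which driver model their board is actually running under, the CUDA runtime exposes it (a small sketch; tccDriver only ever reports 1 on Windows boards running TCC):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Query whether each GPU is running under the TCC driver model (Windows only;
// WDDM boards such as GeForce cards, and all Linux systems, report 0 here).
int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, dev);
        printf("GPU %d: %s - driver model: %s\n",
               dev, prop.name, prop.tccDriver ? "TCC" : "WDDM (or non-Windows)");
    }
    return 0;
}
```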
 