
NVIDIA GeForce RTX 5080 to Stand Out with 30 Gbps GDDR7 Memory, Other SKUs Remain on 28 Gbps

Joined
Dec 25, 2020
Messages
7,627 (5.02/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
That 28-to-30 Gbps difference is the GDDR7 memory speed gap between the RTX 5080 (and lower SKUs) and the RTX 5090.

The RTX 4090 has 21 Gbps GDDR6X memory and the 7900 XTX has 20 Gbps GDDR6. That bandwidth increase is pretty substantial, even aside from the bus-width increase. I think the RTX 4080 SUPER had 23 Gbps GDDR6X chips.

24 Gbps on both the 4080 and 4080 Super. 21 Gbps on the 3090 Ti and the other G6X-equipped 40-series cards (4070, 4070 Ti, 4070 Ti Super, 4090). The 3090 uses first-generation 21 Gbps chips at half density, while the 3080 and 3080 Ti use a 19 Gbps IC.

Will there be a Ti without any competition?

Super versions might come to appease shareholders in 2026.
 
Joined
Sep 15, 2011
Messages
6,886 (1.40/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
In a normal world, where users would be much more considerate with their hard-earned money, this card should be DOA unless it is priced like the current 4070, or less. By the specs, this is a 5070 in everything except name.
It is extremely obvious that in the near future a better variant with 20 or 24 GB of VRAM, and probably a much wider bus, is going to be released. So this card will be easily forgotten.
 

AcE

Joined
Dec 3, 2024
Messages
366 (4.52/day)
GDDR7 uses PAM3 signalling vs GDDR6X's PAM4. GDDR7 has lower voltages as well.
Yes, GDDR7 is to G6X what GDDR6 was to G5X - a more refined, or "final", version, and AMD will use it as well, for a long time. Samsung announced GDDR6 for speeds up to 24 Gbps, btw, which doesn't make it really inferior to G6X, but I never saw those versions on any graphics cards.
 
Joined
Feb 3, 2017
Messages
3,918 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos

AcE

Joined
Dec 3, 2024
Messages
366 (4.52/day)
Did these actually ever get those? Specs were for 22 and 23 IIRC.

24 Gbps memory (according to the review), which is downclocked to 22.4 Gbps. Same with the 4080 S, but downclocked to 23 Gbps; I guess it simply doesn't need more. The L2 "infinity cache" does outstanding work.
 
Joined
Feb 3, 2017
Messages
3,918 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
24 Gbps memory (according to the review), which is downclocked to 22.4 Gbps. Same with the 4080 S, but downclocked to 23 Gbps; I guess it simply doesn't need more. The L2 "infinity cache" does outstanding work.
I very very much doubt the part I bolded :p
Interesting choice, though, to actually have faster-rated memory but run it at a lower speed... I do wonder what that is about.
 

AcE

Joined
Dec 3, 2024
Messages
366 (4.52/day)
I very very much doubt the part I bolded :p
Then post data to support your claim. :) Very much doubt you have a point here. Get a 4080 and OC its vram, get about +0% performance. ^^ It has more than enough bandwidth.
 
Last edited:
Joined
Dec 25, 2020
Messages
7,627 (5.02/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
Did these actually ever get those? Specs were for 22 and 23 IIRC.

Yes, both of them have the exact same 24 Gbps IC, the Micron MT61K512M32KPA-24 (D8BZF). It comes clocked 38 MHz higher in the 4080 Super out of the box, but the limits on both are practically identical, as is the IC. The "normal" 4080 just has some extra headroom relative to its slightly slower stock speed as a result.

Then post data to support your claim. :) Very much doubt you have a point here. Get a 4080 and OC its vram, get about +0% performance. ^^ It has more than enough bandwidth.

It actually does not have enough bandwidth, IMO. The 256-bit bus hampers the card really badly at high resolutions. The 4080 runs out of gas at 4K in games that are bandwidth-heavy, and performance falls off much faster than it does with, say, the 7900 XTX and its ~960 GB/s of bandwidth.

It's a balanced configuration, however. I'll give you that much. All hardware resources generally end up fully utilized.
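For some rough context on those figures, here is a quick sketch using the per-pin speeds and bus widths mentioned in this thread (peak bandwidth is simply data rate per pin times bus width divided by eight; treat the inputs as approximate):

```cpp
#include <cstdio>

// Peak DRAM bandwidth in GB/s = (data rate per pin in Gbps * bus width in bits) / 8.
// The figures below are the ones quoted in this thread, so treat them as approximate.
static double peak_gb_per_s(double gbps_per_pin, int bus_bits) {
    return gbps_per_pin * bus_bits / 8.0;
}

int main() {
    printf("RTX 4080 (22.4 Gbps x 256-bit): %6.1f GB/s\n", peak_gb_per_s(22.4, 256)); // ~716.8
    printf("RTX 4090 (21.0 Gbps x 384-bit): %6.1f GB/s\n", peak_gb_per_s(21.0, 384)); // ~1008.0
    printf("7900 XTX (20.0 Gbps x 384-bit): %6.1f GB/s\n", peak_gb_per_s(20.0, 384)); // ~960.0
    return 0;
}
```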
 
Joined
Feb 3, 2017
Messages
3,918 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Then post data to support your claim. :) Very much doubt you have a point here. Get a 4080 and OC its vram, get about +0% performance. ^^ It has more than enough bandwidth.
Are you actually trying to argue that faster VRAM speed is not beneficial to a GPU?
If you send me a 4080, I will gladly run some tests :D

Edit:
As Dr. Dro pointed out - gaming tests show that the 4080 loses performance at higher resolutions faster than otherwise comparable (or comparable-ish) GPUs that have a wider memory bus and more memory bandwidth. At lower resolutions the bigger cache on Ada can mask the relatively lower bandwidth, but it is simply not sufficient anymore at higher resolutions.
 
Last edited:

AcE

Joined
Dec 3, 2024
Messages
366 (4.52/day)
Are you actually trying to argue that faster VRAM speed is not beneficial to a GPU?
If you send me a 4080, I will gladly run some tests :D
My argument is this:
- the card has 24 Gbps G6X. Because it already has more than enough *effective bandwidth* thanks to the big L2 cache, they downclocked it to 22.4 (and 23 on the 4080 S) to save energy. Go and check performance reviews: it doesn't lose "extra" at 4K, because the bandwidth is sufficient. Go check the 7900 XT; it loses too much at 4K. -> Nvidia's infinity cache works better.

Sorry, can't do, I only have the bigger one
 
Joined
Jun 10, 2014
Messages
3,042 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
It's funny. Nvidia users ask for more VRAM and Nvidia is giving them faster VRAM.
It's like a joke with deaf people.
And most people don't understand how VRAM even works. And while more VRAM doesn't increase performance directly, faster VRAM usually does.

It's a new architecture, and we know nothing about its performance characteristics. But judging from previous launches, it's reasonable to assume both memory compression and management are improved. And as usual, a lot of you will probably end up buying one in the end despite joining the "politically correct" whining about VRAM size, bits of memory bus or core count, when the only thing that really matters is how it performs in the real world.

Let's instead look forward to a brand new generation full of goodies. :)
 
Joined
Apr 13, 2017
Messages
179 (0.06/day)
System Name AMD System
Processor Ryzen 7900 at 180Watts 5650 MHz, vdroop from 1.37V to 1.24V
Motherboard MSI MAG x670 Tomahawk Wifi
Cooling AIO240 for CPU, Wraith Prism's Fan for RAM but suspended above it without touching anything in case.
Memory 32GB dual channel Gskill DDR6000CL30 tuned for CL28, at 1.42Volts
Video Card(s) Msi Ventus 2x Rtx 4070 and Gigabyte Gaming Oc Rtx 4060 ti
Storage Samsung Evo 970
Display(s) Old 1080p 60FPS Samsung
Case Normal atx
Audio Device(s) Dunno
Power Supply 1200Watts
Mouse wireless & quiet
Keyboard wireless & quiet
VR HMD No
Software Windows 11
Benchmark Scores 1750 points in cinebench 2024 42k 43k gpu cpu points in timespy 50+ teraflops total compute power.
Maybe they'll upscale the textures with AI so that 8 GB of VRAM is enough.

Also, PCIe 5.0 will reduce stutters by 50%.

Let's assume the card's memory is 8 GB and the dataset is 9 GB, so 1 GB is constantly moved over PCIe 4.0 or 5.0.

PCIe 4.0: 1 GB takes ~30 milliseconds -> 33 FPS.

PCIe 5.0: 1 GB takes ~15 milliseconds -> 67 FPS.

Now let's assume the card's memory is 8 GB and the dataset is 10 GB, so 2 GB is moved.

PCIe 4.0: 2 GB takes ~60 milliseconds -> 16 FPS.

PCIe 5.0: 2 GB takes ~30 milliseconds -> 33 FPS.

33 FPS is manageable imho. (I played CS at 10 FPS for years...)
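As a back-of-envelope check of that math (a minimal sketch assuming roughly 32 GB/s usable on a PCIe 4.0 x16 link, 64 GB/s on 5.0, and that the transfer is the only thing limiting frame time):

```cpp
#include <cstdio>

// Rough FPS ceiling if 'overflow_gb' of data has to cross the PCIe bus every frame
// and that transfer is the only limiter. Link rates are assumed usable throughput
// for an x16 slot: ~32 GB/s for PCIe 4.0 and ~64 GB/s for PCIe 5.0.
static double fps_cap(double overflow_gb, double link_gb_per_s) {
    double frame_ms = overflow_gb / link_gb_per_s * 1000.0;
    return 1000.0 / frame_ms;
}

int main() {
    printf("1 GB over PCIe 4.0: ~%.0f FPS\n", fps_cap(1.0, 32.0)); // ~32 FPS
    printf("1 GB over PCIe 5.0: ~%.0f FPS\n", fps_cap(1.0, 64.0)); // ~64 FPS
    printf("2 GB over PCIe 4.0: ~%.0f FPS\n", fps_cap(2.0, 32.0)); // ~16 FPS
    printf("2 GB over PCIe 5.0: ~%.0f FPS\n", fps_cap(2.0, 64.0)); // ~32 FPS
    return 0;
}
```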
 
Last edited:
Joined
Jun 10, 2014
Messages
3,042 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Also, PCIe 5.0 will reduce stutters by 50%.

Let's assume the card's memory is 8 GB and the dataset is 9 GB, so 1 GB is constantly moved over PCIe 4.0 or 5.0.
If by dataset you mean textures and meshes only, then there would be a lot of large buffers on top of that, most of which are heavily compressed, as are some of the textures. (Total allocated VRAM would probably be 12+ GB then.)

But let's assume you really do have 9 GB of assets to utilize during a single frame with only 8 GB of VRAM; that would be a sad story, no question about it. That's just not how rendering engines work, though;

Firstly, textures are stored as multiple mip levels and use anisotropic filtering, so within the allocated memory there are dozens of versions of the "same" texture, and the GPU isn't going to be using all of that data during a single frame. So even if your dataset is 9 GB (even with dynamic loading of assets), only about ~1-1.5 GB of it will actually be used during any given frame. (Also, the highest mip levels will only be used by objects close to the camera, and only so many objects can ever be close at once, so for most of the scene very low mip levels are used.)

Secondly, the memory bandwidth will be a bottleneck long before you run into heavy swapping anyway. Take for instance the RTX 4060 with its 272 GB/s: if you target 60 FPS, you could theoretically only ever access a total of ~4.5 GB during each 16.7 ms frame, and half of that if you target 120 FPS. Additionally, accesses will not be evenly distributed throughout the frame time (e.g. vertex shaders and tessellation are much less memory intensive), so the real-world peak would probably be less than half of that. As you can see, by the time you even come close to the VRAM limit of a card, you are already approaching "slide show" territory at ~15 FPS. So any reasoning about having lots of VRAM for "future proofing" is a moot point, not to mention that future games will be even more computationally intensive.
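As a rough illustration of that ceiling (a sketch using the 272 GB/s figure above; real frames never get anywhere near the theoretical maximum):

```cpp
#include <cstdio>

// Upper bound on how much unique data a GPU could even touch per frame, given its
// DRAM bandwidth and a target frame rate. Real workloads reach only a fraction of
// this, because accesses are not evenly spread across the frame.
static double gb_per_frame(double bandwidth_gb_per_s, double target_fps) {
    return bandwidth_gb_per_s / target_fps;
}

int main() {
    printf("RTX 4060 (272 GB/s) @  60 FPS: %.2f GB per frame\n", gb_per_frame(272.0, 60.0));  // ~4.53
    printf("RTX 4060 (272 GB/s) @ 120 FPS: %.2f GB per frame\n", gb_per_frame(272.0, 120.0)); // ~2.27
    return 0;
}
```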

If you do run into heavy swapping, you'll know, trust me; it would quickly become unplayable. But if you do so, then you're probably doing something funky, like running a browser reloading a lot of tabs in the background, or the driver is bad.
 
Joined
Apr 13, 2017
Messages
179 (0.06/day)
System Name AMD System
Processor Ryzen 7900 at 180Watts 5650 MHz, vdroop from 1.37V to 1.24V
Motherboard MSI MAG x670 Tomahawk Wifi
Cooling AIO240 for CPU, Wraith Prism's Fan for RAM but suspended above it without touching anything in case.
Memory 32GB dual channel Gskill DDR6000CL30 tuned for CL28, at 1.42Volts
Video Card(s) Msi Ventus 2x Rtx 4070 and Gigabyte Gaming Oc Rtx 4060 ti
Storage Samsung Evo 970
Display(s) Old 1080p 60FPS Samsung
Case Normal atx
Audio Device(s) Dunno
Power Supply 1200Watts
Mouse wireless & quiet
Keyboard wireless & quiet
VR HMD No
Software Windows 11
Benchmark Scores 1750 points in cinebench 2024 42k 43k gpu cpu points in timespy 50+ teraflops total compute power.
If by dataset you mean textures and meshes only, then there would be a lot of large buffers on top of that, most of which are heavily compressed, as are some of the textures. (Total allocated VRAM would probably be 12+ GB then.)

But let's assume you really do have 9 GB of assets to utilize during a single frame with only 8 GB of VRAM; that would be a sad story, no question about it. That's just not how rendering engines work, though;

Firstly, textures are stored as multiple mip levels and use anisotropic filtering, so within the allocated memory there are dozens of versions of the "same" texture, and the GPU isn't going to be using all of that data during a single frame. So even if your dataset is 9 GB (even with dynamic loading of assets), only about ~1-1.5 GB of it will actually be used during any given frame. (Also, the highest mip levels will only be used by objects close to the camera, and only so many objects can ever be close at once, so for most of the scene very low mip levels are used.)

Secondly, the memory bandwidth will be a bottleneck long before you run into heavy swapping anyway. Take for instance the RTX 4060 with its 272 GB/s: if you target 60 FPS, you could theoretically only ever access a total of ~4.5 GB during each 16.7 ms frame, and half of that if you target 120 FPS. Additionally, accesses will not be evenly distributed throughout the frame time (e.g. vertex shaders and tessellation are much less memory intensive), so the real-world peak would probably be less than half of that. As you can see, by the time you even come close to the VRAM limit of a card, you are already approaching "slide show" territory at ~15 FPS. So any reasoning about having lots of VRAM for "future proofing" is a moot point, not to mention that future games will be even more computationally intensive.

If you do run into heavy swapping, you'll know, trust me; it would quickly become unplayable. But if you do so, then you're probably doing something funky, like running a browser reloading a lot of tabs in the background, or the driver is bad.
By dataset size I meant assets that are accessed on every frame. For example, it could be a PhysX fluid simulation with 1024x1024x1024 cells of space at 8 bytes per cell, making 8 GB that is accessed every frame, plus maybe 1 GB more of textures and other data on top to render the objects in it. The 1 extra GB then has to come from system RAM over PCIe.

The GPU certainly has enough compute power to accelerate such a fluid simulation to 60 FPS (especially with the compressed L2 cache), but it is hindered by PCIe bandwidth. For example, the RTX 4070's memory is ~500 GB/s and its L2 is about 1.5 TB/s, but PCIe is only 20-30 GB/s because it's version 4, not 5.
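Rough numbers for that scenario (a sketch using the grid size above and approximate RTX 4070-class figures; the PCIe rate is an assumed ~26 GB/s usable):

```cpp
#include <cstdio>

int main() {
    // Grid from the example above: 1024^3 cells at 8 bytes per cell.
    const double cells     = 1024.0 * 1024.0 * 1024.0;
    const double grid_gb   = cells * 8.0 / 1e9;   // ~8.6 GB (8 GiB)
    // Approximate peak rates quoted above; the PCIe figure is an assumed usable rate.
    const double vram_gbps = 500.0;               // RTX 4070 VRAM, ~500 GB/s
    const double pcie_gbps = 26.0;                // PCIe 4.0 x16, ~20-30 GB/s usable

    printf("Grid size:                      ~%.1f GB\n", grid_gb);
    printf("One full pass from VRAM:        ~%.1f ms per frame\n", grid_gb / vram_gbps * 1000.0); // ~17 ms
    printf("1 GB of overflow over PCIe 4.0: ~%.1f ms per frame\n", 1.0 / pcie_gbps * 1000.0);     // ~38 ms
    return 0;
}
```

Under those assumptions, the 1 GB spill alone costs roughly twice as much time per frame as streaming the entire grid out of VRAM.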
 
Joined
Jun 10, 2014
Messages
3,042 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
By dataset size I meant assets that are accessed on every frame. For example, it could be a PhysX fluid simulation with 1024x1024x1024 cells of space at 8 bytes per cell, making 8 GB that is accessed every frame, plus maybe 1 GB more of textures and other data on top to render the objects in it. The 1 extra GB then has to come from system RAM over PCIe.
No game that I know of would reserve that much for a fluid simulation versus other assets, not to mention that a game wouldn't simulate a single large volume like that; it would rather run a detailed simulation around objects, and for most games only a very basic simulation of the surface is needed. A large volume like that would also be highly compressible.

The GPU certainly has enough compute power to accelerate such a fluid simulation to 60 FPS (especially with the compressed L2 cache), but it is hindered by PCIe bandwidth. For example, the RTX 4070's memory is ~500 GB/s and its L2 is about 1.5 TB/s, but PCIe is only 20-30 GB/s because it's version 4, not 5.
You misunderstand the role of caches; an L2 is effectively a streaming buffer. Over time, the average throughput will converge towards the bandwidth of the memory, no matter how fast the cache is. One of the crucial jobs of the L2 cache in a GPU is to compensate for uneven load across the memory controllers (yes, there is usually more than one). Each 64 bits of the memory bus is connected to a separate controller, and whenever the requested data "overloads" one or more controllers, prefetching into a larger cache helps offset this and prevents unnecessary stalls of the GPU. (Back when AMD first added large caches to their GPUs they saw large gains in some cases, as AMD's GPUs had struggled with poor resource management for many generations.) L2 caches (in both GPUs and CPUs) usually have very high bandwidth, not because you will ever come close to that on average, but because you want the peak speed from each cache bank when you actually need it, to get the maximum savings from having the cache.
 
Joined
Apr 13, 2017
Messages
179 (0.06/day)
System Name AMD System
Processor Ryzen 7900 at 180Watts 5650 MHz, vdroop from 1.37V to 1.24V
Motherboard MSI MAG x670 Tomahawk Wifi
Cooling AIO240 for CPU, Wraith Prism's Fan for RAM but suspended above it without touching anything in case.
Memory 32GB dual channel Gskill DDR6000CL30 tuned for CL28, at 1.42Volts
Video Card(s) Msi Ventus 2x Rtx 4070 and Gigabyte Gaming Oc Rtx 4060 ti
Storage Samsung Evo 970
Display(s) Old 1080p 60FPS Samsung
Case Normal atx
Audio Device(s) Dunno
Power Supply 1200Watts
Mouse wireless & quiet
Keyboard wireless & quiet
VR HMD No
Software Windows 11
Benchmark Scores 1750 points in cinebench 2024 42k 43k gpu cpu points in timespy 50+ teraflops total compute power.
No game that I know of would reserve that much for a fluid simulation versus other assets, not to mention that a game wouldn't simulate a single large volume like that; it would rather run a detailed simulation around objects, and for most games only a very basic simulation of the surface is needed. A large volume like that would also be highly compressible.


You misunderstand the role of caches; an L2 is effectively a streaming buffer. Over time, the average throughput will converge towards the bandwidth of the memory, no matter how fast the cache is. One of the crucial jobs of the L2 cache in a GPU is to compensate for uneven load across the memory controllers (yes, there is usually more than one). Each 64 bits of the memory bus is connected to a separate controller, and whenever the requested data "overloads" one or more controllers, prefetching into a larger cache helps offset this and prevents unnecessary stalls of the GPU. (Back when AMD first added large caches to their GPUs they saw large gains in some cases, as AMD's GPUs had struggled with poor resource management for many generations.) L2 caches (in both GPUs and CPUs) usually have very high bandwidth, not because you will ever come close to that on average, but because you want the peak speed from each cache bank when you actually need it, to get the maximum savings from having the cache.
Fluid simulations, and particle simulations in general, access the neighboring particles for each particle. That redundancy is good for L2. Also, the compressible-memory feature of CUDA makes the L2 store data compressed.

For example, Game of Life accesses the closest 8 cells. Fluid mechanics can reach maybe 30 or more cells per cell. An n-body algorithm accesses all particles per particle when brute-forced. This is extremely cache sensitive. When the code updates cell data, L2 stores it compressed, so a 40 MB cache acts like 160 MB, but this doesn't affect main memory.

I wish the RTX 5080 came with 200 MB of cache and 2x better compression, so that up to 1.6 GB of redundancy could stay on the GPU chip without touching VRAM. That would be like 8 GB + 1.6 GB. Add PCIe 5.0 bandwidth too; a perfect combo to counter some stutters.

In CUDA, there is an option to keep a selected region of data resident in L2 between different function calls or kernels.
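For reference, this is roughly what that option looks like in CUDA 11 and newer (a sketch; the stream, buffer pointer and size here are placeholders):

```cpp
#include <cuda_runtime.h>

// Sketch of CUDA's L2 "persisting access" feature (CUDA 11+): a region of global
// memory can be hinted to stay resident in a set-aside slice of L2 across kernel
// launches on the same stream. 'buf' and 'buf_bytes' are placeholders.
void pin_region_in_l2(cudaStream_t stream, void* buf, size_t buf_bytes)
{
    int device = 0;
    cudaGetDevice(&device);

    // Reserve part of L2 for persisting accesses (up to the device maximum).
    int max_persist_bytes = 0;
    cudaDeviceGetAttribute(&max_persist_bytes, cudaDevAttrMaxPersistingL2CacheSize, device);
    cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, max_persist_bytes);

    // Describe the window of memory that should be treated as persisting.
    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = buf;
    attr.accessPolicyWindow.num_bytes = buf_bytes;            // must not exceed the access policy window limit
    attr.accessPolicyWindow.hitRatio  = 1.0f;                  // fraction of the window to treat as persisting
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;
    cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);

    // Kernels launched on 'stream' after this point will prefer to keep 'buf' in L2.
    // cudaCtxResetPersistingL2Cache() clears the carve-out when it is no longer needed.
}
```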
 
Last edited:
Joined
Feb 24, 2023
Messages
3,608 (4.95/day)
Location
Russian Wild West
System Name D.L.S.S. (Die Lekker Spoed Situasie)
Processor i5-12400F
Motherboard Gigabyte B760M DS3H
Cooling Laminar RM1
Memory 32 GB DDR4-3200
Video Card(s) RX 6700 XT (vandalised)
Storage Yes.
Display(s) MSi G2712
Case Matrexx 55 (slightly vandalised)
Audio Device(s) Yes.
Power Supply Thermaltake 1000 W
Mouse Don't disturb, cheese eating in progress...
Keyboard Makes some noise. Probably onto something.
VR HMD I live in real reality and don't need a virtual one.
Software Windows 11 / 10 / 8
Benchmark Scores My PC can run Crysis. Do I really need more than that?
But if you do so, then you're probably doing something funky, like running a browser reloading a lot of tabs in the background, or the driver is bad.
Or trying to play Cyberpunk-esque games on an ancient 2 GB GPU like R9 380 also having 8 GB system RAM. With textures at Medium, it stutters like crazy and crashes with them on High. //please don't ask me why I know it

Generally, yes, I have seen VRAM overclocking (or any other bandwidth improvement) help out a lot, but additional VRAM only helps in outlier games with some mega-shady rendering going on, or when we're talking about adding to a seriously obsolete amount (which is ~5 GB in this day and age).

I don't think 30 GT/s is remotely enough to unlock the 5080's full potential, though. More like 45 GT/s or so, unless that cache somehow, someway works 2+ times better than in Ada.
 
Joined
Jun 10, 2014
Messages
3,042 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Fluid simulations, and particle simulations in general, access the neighboring particles for each particle. That redundancy is good for L2. Also, the compressible-memory feature of CUDA makes the L2 store data compressed.
Pretty much anything the GPU is good at is done on many instances in parallel, which is why memory accesses on a GPU happen in larger blocks.

For example, Game of Life accesses the closest 8 cells. Fluid mechanics can reach maybe 30 or more cells per cell. An n-body algorithm accesses all particles per particle when brute-forced. This is extremely cache sensitive.
The amount of simulation a rendering engine can do within a strict deadline is vastly different from what you can do in a simulation without such constraints (even on the same class of hardware). So creating a non-realtime simulation, whether for academic purposes or in a professional setting, is very different from doing it in a game.

A game engine is designed more like a real-time system, where algorithms aren't chosen primarily based on O-notation (as in academia) but for consistent performance: awareness of access patterns, cache optimization, and avoiding very costly system calls etc. in critical paths. This holds true both for rendering and for game loops; a terrible worst case results in stutter while rendering, but it's even worse in a game loop, where such problems are the cause of many bugs and glitches in modern games. All performant code must be developed with some assessment of the underlying hardware characteristics, and cache optimization is one of the most important. I could go much deeper, but hopefully you get the point. ;)

When the code updates cell data, L2 stores it compressed, so a 40 MB cache acts like 160 MB, but this doesn't affect main memory.
I wish the RTX 5080 came with 200 MB of cache and 2x better compression, so that up to 1.6 GB of redundancy could stay on the GPU chip without touching VRAM.
Compression rates vary based on data density; sparse data can be compressed very heavily, while more random data cannot. If your dataset is fairly predictable, then you can probably estimate an effective compression rate for your use case, but it wouldn't be applicable to anything else.
I'm not aware of the details of the upcoming generation, but generally speaking each one is usually a touch better than the previous.

Or trying to play Cyberpunk-esque games on an ancient 2 GB GPU like R9 380 also having 8 GB system RAM. With textures at Medium, it stutters like crazy and crashes with them on High. //please don't ask me why I know it
Since when do we expect recent games to run well on 9-year-old hardware that isn't even supported anymore? Not that more VRAM would have saved it; at best you'd have a pretty slide show, as that GPU isn't powerful enough, nor does it have enough bandwidth, to pump out 60 FPS at good settings.
 
Joined
Apr 13, 2017
Messages
179 (0.06/day)
System Name AMD System
Processor Ryzen 7900 at 180Watts 5650 MHz, vdroop from 1.37V to 1.24V
Motherboard MSI MAG x670 Tomahawk Wifi
Cooling AIO240 for CPU, Wraith Prism's Fan for RAM but suspended above it without touching anything in case.
Memory 32GB dual channel Gskill DDR6000CL30 tuned for CL28, at 1.42Volts
Video Card(s) Msi Ventus 2x Rtx 4070 and Gigabyte Gaming Oc Rtx 4060 ti
Storage Samsung Evo 970
Display(s) Old 1080p 60FPS Samsung
Case Normal atx
Audio Device(s) Dunno
Power Supply 1200Watts
Mouse wireless & quiet
Keyboard wireless & quiet
VR HMD No
Software Windows 11
Benchmark Scores 1750 points in cinebench 2024 42k 43k gpu cpu points in timespy 50+ teraflops total compute power.
Even when a kernel function does the exact same amount of computation on each run, the time taken can still vary between 1x and 3x due to the driver or the OS. CUDA needs the TCC driver mode to reduce that fluctuation on Windows. Sadly, RTX/GTX cards can't do TCC mode, but even the low, low, low-end K420 can.

WDDM does its own batching of multiple kernel launches, so it sometimes can't even overlap data copies with compute. Ubuntu is better than Windows 11 in this regard.
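For anyone curious which driver model their board is actually running under, the CUDA runtime exposes it (a small sketch; tccDriver only ever reports 1 on Windows boards running TCC):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Query whether each GPU is running under the TCC driver model (Windows only;
// WDDM boards such as GeForce cards, and all Linux systems, report 0 here).
int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, dev);
        printf("GPU %d: %s - driver model: %s\n",
               dev, prop.name, prop.tccDriver ? "TCC" : "WDDM (or non-Windows)");
    }
    return 0;
}
```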
 