Thursday, May 18th 2023

NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

Press Release by

May 18th, 2023 10:08

NVIDIA receives a lot of questions about graphics memory, also known as the frame buffer, video memory, or "VRAM". With the unveiling of our new GeForce RTX 4060 Family of graphics cards, we wanted to share some insights so gamers can make the best buying decisions for their gaming needs.

What Is VRAM?
VRAM is high-speed memory located on your graphics card.

It's one component of a larger memory subsystem that helps make sure your GPU has access to the data it needs to smoothly process and display images. In this article, we'll describe memory subsystem innovations in our latest-generation Ada Lovelace GPU architecture, as well as how the speed and size of GPU cache and VRAM impact performance and the gameplay experience.

GeForce RTX 40 Series Graphics Cards Memory Subsystem: Improving Performance & Efficiency
Modern games are graphical showcases, and their install sizes can now exceed 100 GB. Accessing this massive amount of data happens at different speeds, determined by the specifications of the GPU, and to some extent your system's other components. On GeForce RTX 40 Series graphics cards, new innovations accelerate the process for smooth gaming and faster frame rates, helping you avoid texture stream-in or other hiccups.

The Importance Of Cache
GPUs include high-speed memory caches that are close to the GPU's processing cores, which store data that is likely to be needed. If the GPU can recall the data from the caches, rather than requesting it from the VRAM (further away) or system RAM (even further away), the data will be accessed and processed faster, increasing performance and gameplay fluidity, and reducing power consumption.

GeForce GPUs feature a Level 1 (L1) cache (the closest and fastest cache) in each Streaming Multiprocessor (SM), up to twelve of which can be found in each GeForce RTX 40 Series Graphics Processing Cluster (GPC). This is followed by a fast, larger, shared Level 2 (L2) cache that can be accessed quickly with minimal latency.

Each successive cache level adds latency in exchange for greater capacity. When designing our GeForce RTX 40 Series GPUs, we found a single, large L2 cache to be faster and more efficient than the alternatives, such as a design with a small L2 cache and a large, slower-to-access L3 cache.

Prior-generation GeForce GPUs had much smaller L2 caches, resulting in lower performance and efficiency than today's GeForce RTX 40 Series GPUs.

During use, the GPU first searches for data in the L1 data cache within the SM, and if the data is found in L1 there's no need to access the L2 data cache. If data is not found in L1, it's called a "cache miss", and the search continues into the L2 cache. If data is found in L2, that's called an L2 "cache hit" (see the "H" indicators in the above diagram), and data is provided to the L1 and then to the processing cores.

If data is not found in the L2 cache (an L2 "cache miss"), the GPU tries to obtain the data from the VRAM. The above diagram of our prior-architecture memory subsystem shows a number of L2 cache misses, which cause a number of VRAM accesses.

If the data is missing from the VRAM, the GPU requests it from your system's memory. If the data is not in system memory, it can typically be loaded into system memory from a storage device like an SSD or hard drive. The data is then copied into VRAM, L2, and L1, and ultimately fed to the processing cores. Note that different hardware- and software-based strategies exist to keep the most useful and most reused data present in caches.
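The lookup order described above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not NVIDIA's design: `LRUCache` and `read` are hypothetical names, each cache holds a handful of whole items rather than cache lines, eviction is plain least-recently-used, and writes, prefetching and hardware/driver residency policies are ignored.

```python
from collections import OrderedDict


class LRUCache:
    """Toy stand-in for one cache level (hypothetical): fixed capacity, LRU eviction."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.items = OrderedDict()

    def lookup(self, addr):
        if addr in self.items:
            self.items.move_to_end(addr)  # mark as most recently used
            return True
        return False

    def fill(self, addr):
        self.items[addr] = True
        self.items.move_to_end(addr)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item


def read(addr, caches, vram, system_ram, stats):
    """Search L1 -> L2 -> VRAM -> system RAM -> storage; record where the data was found."""
    missed = []
    for cache in caches:  # closest level first
        if cache.lookup(addr):
            stats[cache.name] += 1
            break
        missed.append(cache)
    else:  # missed in every cache level
        if addr in vram:
            stats["VRAM"] += 1
        elif addr in system_ram:
            stats["system RAM"] += 1
            vram.add(addr)  # copy into VRAM
        else:
            stats["storage"] += 1
            system_ram.add(addr)  # load from SSD/HDD into system memory...
            vram.add(addr)  # ...then copy into VRAM
    for cache in reversed(missed):  # fill L2 first, then L1, on the way to the cores
        cache.fill(addr)


stats = {"L1": 0, "L2": 0, "VRAM": 0, "system RAM": 0, "storage": 0}
l1, l2 = LRUCache("L1", 2), LRUCache("L2", 4)
vram, system_ram = {10, 11}, {10, 11, 20}
for addr in [10, 10, 11, 20, 30, 10]:
    read(addr, [l1, l2], vram, system_ram, stats)
print(stats)  # {'L1': 1, 'L2': 1, 'VRAM': 2, 'system RAM': 1, 'storage': 1}
```

The final read of item 10 misses the tiny L1 but hits in L2, which is the kind of access a larger L2 turns from a VRAM trip into a cache hit.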

Each additional data read or write operation through the memory hierarchy slows performance and uses more power, so by increasing our cache hit rate we increase frame rates and efficiency.
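The point about hit rate can be made concrete with the textbook average-memory-access-time (AMAT) model. This is a generic model, not NVIDIA's analysis; `average_access_time` is a hypothetical helper, and every latency and hit rate below is a made-up illustrative number in arbitrary cycles.

```python
def average_access_time(levels, memory_latency):
    """Textbook AMAT for a chain of caches.

    levels: (hit_latency, hit_rate) pairs ordered from the closest cache outward.
    memory_latency: cost of going past the last cache (here, to VRAM).
    """
    time = memory_latency
    for hit_latency, hit_rate in reversed(levels):  # fold from the farthest level inward
        time = hit_latency + (1.0 - hit_rate) * time
    return time


# Hypothetical numbers: L1 access 30, L2 access 200, VRAM 400 (arbitrary cycles).
small_l2 = average_access_time([(30, 0.75), (200, 0.50)], memory_latency=400)
large_l2 = average_access_time([(30, 0.75), (200, 0.75)], memory_latency=400)
print(small_l2, large_l2)  # 130.0 105.0
```

Raising only the L2 hit rate from 50% to 75% cuts the average access cost in this toy model. The model captures latency alone; the power and bandwidth savings described in the text come from the same reduction in trips past L2.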

Compared to prior generation GPUs with a 128-bit memory interface, the memory subsystem of the new NVIDIA Ada Lovelace architecture increases the size of the L2 cache by 16X, greatly increasing the cache hit rate. In the examples above, representing 128-bit GPUs from Ada and prior generation architectures, the hit rate is much higher with Ada. In addition, the L2 cache bandwidth in Ada GPUs has been significantly increased versus prior GPUs. This allows more data to be transferred between the cores and the L2 cache as quickly as possible.

As shown in the diagram below, NVIDIA engineers tested the RTX 4060 Ti with its 32 MB L2 cache against a special test version of the RTX 4060 Ti using only a 2 MB L2, which represents the L2 cache size of previous-generation 128-bit GPUs (where 512 KB of L2 cache was tied to each 32-bit memory controller).
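The 2 MB and 16X figures follow from simple arithmetic: a 128-bit bus has four 32-bit memory controllers, each paired with 512 KB of L2 on the prior generation. A quick check, where `prior_gen_l2_bytes` is a hypothetical helper and 1 MB is taken as 1024 KB:

```python
KB = 1024
MB = 1024 * KB


def prior_gen_l2_bytes(bus_width_bits, l2_per_controller_kb=512):
    """Prior-generation L2 size: a fixed slice of L2 per 32-bit memory controller."""
    controllers = bus_width_bits // 32
    return controllers * l2_per_controller_kb * KB


prior_l2 = prior_gen_l2_bytes(128)  # 4 controllers x 512 KB
ada_l2 = 32 * MB                    # RTX 4060 Ti L2, per the text
print(prior_l2 // MB, "MB")         # 2 MB
print(ada_l2 // prior_l2, "x")      # 16 x
```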

In testing with a variety of games and synthetic benchmarks, the 32 MB L2 cache reduced memory bus traffic by just over 50% on average compared to the 2 MB L2 cache. See the reduced VRAM accesses in the Ada Memory Subsystem diagram above.

This 50% traffic reduction allows the GPU to use its memory bandwidth 2X more efficiently. As a result, in this scenario, isolating for memory performance, an Ada GPU with 288 GB/sec of peak memory bandwidth would perform similarly to an Ampere GPU with 554 GB/sec of peak memory bandwidth. Across an array of games and synthetic tests, the greatly increased hit rates improve frame rates by up to 34%.
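The bandwidth comparison can be checked with two formulas. Peak bandwidth is the per-pin data rate times the bus width, divided by 8 bits per byte; the 18 Gbps GDDR6 data rate used for the RTX 4060 Ti here is an assumption, not something stated in the text. The "equivalent bandwidth" step assumes, very simply, that memory-bound performance scales with bytes moved, so cutting traffic by a fraction r makes a bus behave like one 1/(1 - r) times as fast. Under that model an exact 50% reduction gives 576 GB/sec, while the quoted 554 GB/sec corresponds to roughly a 48% reduction, so NVIDIA's figure presumably reflects its measured data rather than this idealized arithmetic. Both function names are hypothetical.

```python
def peak_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak bandwidth (GB/s) = per-pin data rate (Gb/s) x bus width (bits) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


def equivalent_bandwidth_gbs(peak_gbs, traffic_reduction):
    """Bandwidth a GPU without the traffic reduction would need for the same memory-bound work,
    assuming performance scales linearly with bytes moved (a simplification)."""
    return peak_gbs / (1.0 - traffic_reduction)


ada_peak = peak_bandwidth_gbs(18, 128)  # assumed 18 Gb/s GDDR6 on a 128-bit bus
print(ada_peak)                                         # 288.0
print(round(equivalent_bandwidth_gbs(ada_peak, 0.50)))  # 576
print(round(equivalent_bandwidth_gbs(ada_peak, 0.48)))  # 554
```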

Memory Bus Width Is One Aspect Of A Memory Subsystem
Historically, memory bus width has been used as an important metric for determining the speed and performance class of a new GPU. However, the bus width by itself is not a sufficient indicator of memory subsystem performance. Instead, it's helpful to understand the broader memory subsystem design and its overall impact on gaming performance.

Due to the advances in the Ada architecture, including new RT and Tensor Cores, higher clock speeds, the new OFA Engine, and Ada's DLSS 3 capabilities, the GeForce RTX 4060 Ti is faster than the previous-generation 256-bit GeForce RTX 3060 Ti and RTX 2060 SUPER graphics cards, all while using less power.

Altogether, the tech specs deliver a great 60-class GPU with high performance for 1080p gamers, who account for the majority of Steam users.

The Amount of VRAM Is Dependent On GPU Architecture
Gamers often wonder why a graphics card has a certain amount of VRAM. Current-generation GDDR6X and GDDR6 memory is supplied in densities of 8Gb (1 GB of data) and 16Gb (2 GB of data) per chip. Each chip uses two separate 16-bit channels to connect to a single 32-bit Ada memory controller. So a 128-bit GPU can support 4 memory chips, and a 384-bit GPU can support 12 chips (calculated as bus width divided by 32). Higher-capacity chips cost more to make, so a balance is required to optimize prices.
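The chip-count rule above (bus width divided by 32) and the resulting capacities can be tabulated as follows. This sketch assumes one chip per 32-bit controller and a single density across all chips; `chip_count` and `vram_gb` are hypothetical names.

```python
def chip_count(bus_width_bits):
    """One memory chip (two 16-bit channels) per 32-bit memory controller."""
    if bus_width_bits % 32:
        raise ValueError("bus width must be a multiple of 32 bits")
    return bus_width_bits // 32


def vram_gb(bus_width_bits, density_gbit):
    """Total VRAM in GB when every chip has the same density (8 Gb = 1 GB, 16 Gb = 2 GB)."""
    return chip_count(bus_width_bits) * density_gbit // 8


for bus in (128, 192, 256, 384):
    print(f"{bus}-bit: {chip_count(bus)} chips, "
          f"{vram_gb(bus, 8)} GB with 8Gb chips, {vram_gb(bus, 16)} GB with 16Gb chips")
```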

On our new 128-bit memory bus GeForce RTX 4060 Ti GPUs, the 8 GB model uses four 16Gb GDDR6 memory chips, and the 16 GB model uses eight 16Gb chips. Mixing densities isn't possible, preventing the creation of a 12 GB model, for example. That's also why the GeForce RTX 4060 Ti has an option with more memory (16 GB) than the GeForce RTX 4070 Ti and 4070, which have 192-bit memory interfaces and therefore 12 GB of VRAM.

Our 60-class GPUs have been carefully crafted to deliver the optimum combination of performance, price, and power efficiency, which is why we chose a 128-bit memory interface. In short, for a given bus width, the higher-capacity configuration always has double the memory.
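Following these rules (no mixed densities, and the higher-capacity version of a given bus width doubles the memory), the possible capacities for a bus width can be enumerated. The doubled configuration is modeled here as two chips sharing each 32-bit controller, commonly called "clamshell" mode; treating that as how the 16 GB model fits eight chips on a 128-bit bus is our assumption, since the text does not say. The sketch shows why 12 GB is reachable on 192-bit cards but not on the 128-bit RTX 4060 Ti. `capacity_options_gb` is a hypothetical helper.

```python
def capacity_options_gb(bus_width_bits, densities_gbit=(8, 16)):
    """Possible VRAM sizes (GB) for a bus width, using one density per card.

    For each density: one chip per 32-bit controller, or two per controller
    (assumed clamshell layout), which doubles the capacity.
    """
    controllers = bus_width_bits // 32
    options = set()
    for density in densities_gbit:
        base = controllers * density // 8  # GB with one chip per controller
        options.update({base, 2 * base})   # same bus width, double the memory
    return sorted(options)


print(capacity_options_gb(128))  # [4, 8, 16]
print(capacity_options_gb(192))  # [6, 12, 24]
```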

Do On Screen Display (OSD) Tools Report VRAM Usage Accurately?
Gamers often cite the "VRAM usage" metric in On Screen Display performance measurement tools. But this number isn't entirely accurate, as all games and game engines work differently. In the majority of cases, a game will allocate VRAM for itself, saying to your system, 'I want it in case I need it'. But just because it's holding the VRAM, doesn't mean it actually needs all of it. In fact, games will often request more memory if it's available.

Due to the way memory works, it's impossible to know precisely what's being actively used unless you're the game's developer with access to development tools. Some games offer a guide in the options menu, but even that isn't always accurate. The amount of VRAM that is actually needed will vary in real time depending on the scene and what the player is seeing.

Furthermore, the behavior of games can vary when VRAM is genuinely used to its max. In some, memory is purged, causing a noticeable performance hitch while the current scene is reloaded into memory. In others, only select data will be loaded and unloaded, with no visible impact. And in some cases, new assets may load in more slowly, as they're now being brought in from system RAM.

For gamers, playing is the only way to truly ascertain a game's behavior. In addition, gamers can look at "1% low" framerate measurements, which can help analyze the actual gaming experience. The 1% Low metric - found in the performance overlay and logs of the free NVIDIA FrameView app, as well as other popular measurement tools - measures the average of the slowest 1% of frames over a certain time period.
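One common way to compute a "1% low" from a frame-time log is sketched below. Tools differ in the details (some report the 99th-percentile frame time instead of averaging the slowest 1%), so treat this as an illustration of the idea rather than FrameView's exact method; the function names and the sample data are hypothetical.

```python
import math


def average_fps(frame_times_ms):
    """Frames rendered divided by total elapsed time."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def one_percent_low_fps(frame_times_ms):
    """Average FPS over the slowest 1% of frames (always at least one frame)."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    slowest_first = sorted(frame_times_ms, reverse=True)
    n = max(1, math.ceil(len(slowest_first) / 100))
    avg_slow_ms = sum(slowest_first[:n]) / n
    return 1000.0 / avg_slow_ms


# Hypothetical capture: 198 smooth 10 ms frames plus two 50 ms hitches.
frames = [10.0] * 198 + [50.0, 50.0]
print(round(average_fps(frames), 1))          # 96.2
print(round(one_percent_low_fps(frames), 1))  # 20.0
```

The average barely moves, while the 1% low exposes the two hitches, which is why it better reflects stutter such as assets being brought in from system RAM.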

Automate Setting Selection With GeForce Experience & Download The Latest Patches
Recently, some new games have released patches to better manage memory usage, without hampering the visual quality. Make sure to get the latest patches for new launches, as they commonly fix bugs and optimize performance shortly after launch.

Additionally, GeForce Experience supports most new games, offering optimized settings for each supported GeForce GPU and VRAM configuration, giving gamers the best possible experience by balancing performance and image quality. If you're unfamiliar with game option lingo and just want to enjoy your games from the second you load them, GeForce Experience can automatically tune game settings for a great experience each time.

NVIDIA Technologies Can Help Developers Reduce VRAM Usage
Games are richer and more detailed than ever before, necessitating those 100 GB+ installs. To help developers optimize memory usage, NVIDIA has several free developer tools and SDKs, including:

NVIDIA RTX Memory Utility (RTXMU): Ray tracing requires additional VRAM. RTXMU can reduce this usage by up to 50%
NVIDIA Micro-Mesh SDK: Reduces the memory usage of complex geometry while also increasing performance
NVIDIA Texture Tools Exporter: Creates highly compressed texture files to reduce memory usage and the file size of games

These are just a few of the tools and technologies that NVIDIA freely provides to help developers optimize their games for all GPUs, platforms, and memory configurations.

Some Applications Can Use More VRAM
Beyond gaming, GeForce RTX graphics cards are used around the world for 3D animation, video editing, motion graphics, photography, graphic design, architectural visualization, STEM, broadcasting, and AI. Some of the applications used in these industries may benefit from additional VRAM, for example when editing 4K or 8K timelines in Premiere, or crafting a massive architectural scene in D5 Render.

On the gaming side, higher resolutions also generally require more VRAM. Occasionally, a game may launch with an optional extra-large texture pack that allocates more VRAM. And there are a handful of games that run best at the "High" preset on the 4060 Ti (8 GB), and at maxed-out "Ultra" settings on the 4060 Ti (16 GB). In most games, both versions of the GeForce RTX 4060 Ti (8 GB and 16 GB) can play at max settings and will deliver the same performance.
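A rough illustration of why resolution matters: per-pixel buffers such as render targets grow with pixel count. The sketch assumes a 4-byte-per-pixel format; real games keep many such buffers in different formats, and textures, which usually dominate VRAM, do not scale directly with resolution, so this shows only the trend, not a game's total. `buffer_mib` is a hypothetical helper.

```python
def buffer_mib(width, height, bytes_per_pixel=4):
    """Size of one full-screen buffer in MiB (assumed 4 bytes per pixel, e.g. RGBA8)."""
    return width * height * bytes_per_pixel / (1024 * 1024)


resolutions = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}
for name, (w, h) in resolutions.items():
    print(f"{name}: {buffer_mib(w, h):.1f} MiB per buffer")
```

A 4K buffer is exactly four times the size of a 1080p one, and that factor applies to every per-pixel buffer a game keeps resident.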

The benefit of the PC platform is its openness, configurability and upgradability, which is why we're offering the two memory configurations for the GeForce RTX 4060 Ti; if you want that extra VRAM, it will be available in July.

A GPU For Every Gamer
Following the launch of the GeForce RTX 4060 Family, there'll be optimized graphics cards for each of the three major game resolutions. However you play, all GeForce RTX 40 Series GPUs will deliver a best-in-class experience, with leading power efficiency, supported by a massive range of game-enhancing technologies, including NVIDIA DLSS 3, NVIDIA Reflex, NVIDIA G-SYNC, NVIDIA Broadcast, and RTX Remix.

For the latest news about all the new games and apps that leverage the full capabilities of GeForce RTX graphics cards, stay tuned to GeForce.com.

Source: NVIDIA Blog


139 Comments on NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

#26

Bomby569

catulitechup wrote: And Resident Evil 4 and many others

:)

to be fair those are very shitty ports, unfinished games, that run like shit

#27

Sunlight91

Hogwarts Legacy is missing in their chart because they used up all their magic to put DLSS3 FPS in the same chart as real FPS. Cache can replace memory bandwidth, as RDNA2 has proven, but it can't replace capacity.

As far as I remember, Pascal was the generation which enabled decent 4K gaming, and now three generations later they want to sell 1080p again?

#28

thegnome

Heh, remember the 4090 got a 60%+ performance increase for $100 more.

The 4060 Ti got a 15% increase for the same (or higher) price.

Them using larger caches to offset a smaller bus is cool and all, but what's the point if your GPUs are much more underpowered than could be possible at the same price point?

#29

5 o'clock Charlie

Space Lynx wrote: don't forget his wall of leather jackets and marble-countered kitchen.

Why not a closet full of leather jackets? I joked in another news post that it reminds me of Mr. Rogers and his collection of sweaters. The number of silicone spatulas in his kitchen I thought was rather amusing.

Space Lynx wrote: he wants that yacht club membership next boys, so you need less vram for max profits! :roll:

Since he is worth billions, he can buy many yacht clubs! I don't want to know the cost of the membership fee for that. :laugh:

#30

forman313

pantspooper wrote: imagine being an nvidia fanboy and nvidia marketing telling you why you should be ok with less video memory lol

Pft... Cognitive dissonance has never been proven to have negative effects on the human mind of fan-boys and bigots. In fact, the ability to dismiss other beliefs, thoughts and even common sense is seen as a strength.

"You could use facts to prove anything that's even remotely true!"
Homer

#31

KellyNyanbinary

Chaitanya wrote: Hardware Unboxed and others have been heavily pushing nGreedia hardware as it has a fake frame generator.

I feel like they have been bashing NVIDIA recently with not one, not two, but at least three videos criticizing the 8 GB RTX 3070. I think they have been advocating for more VRAM instead of being satisfied with the status quo and frame smearing.

#32

Nostras

Bomby569 wrote: to be fair those are very shitty ports, unfinished games, that run like shit

Nvidia wished they could say that their cards have enough VRAM and just use this sorry excuse whenever it's not.
Thank god the reviewers know better.

#33

Vayra86

Space Lynx wrote: I agree with this.

My 5800X3D caps out around a 7900 XT... so if my 6800 XT ever sells, my plan is to go the 7900 XT route. If the 4070 Ti had 16 GB of VRAM I would have gone with that instead. VRAM is important these days; that's been demonstrated clearly by several people.

Can vouch for the 7900XT. It's been flawless.

Nostras wrote: Nvidia wished they could say that their cards have enough VRAM and just use this sorry excuse whenever it's not.
Thank god the reviewers know better.

Well I have to give Nvidia some credit for their honesty. I mean this is like a coming out, even if they don't realize it, they are confirming more suspicion than they've removed. They do that especially in the lines where they say even a 4060ti will benefit from 16GB for higher IQ settings. They know they can't fight facts.

Imagine buying an 8GB card with all this information. You'd be pretty damn stupid. Especially if you know RT also demands added VRAM - the very feature Nvidia itself pushes.

#34

docnorth

oxrufiioxo wrote: The fact that Nvidia thinks a 400 USD GPU should be used for 1080p med/high settings is pretty sad.

I'm waiting for @W1zzard to reveal the (sad?) truth.

#35

W1zzard

docnorth wrote: I'm waiting for @W1zzard to reveal the (sad?) truth.

I will test all cards at the same settings for the foreseeable future, which are maximum

#36

Vayra86

Bomby569 wrote: at this point the VRAM discussion is more like a shouting match, lots of unreasonable claims, everyone has their own opinion, and me, I'm still waiting for reasonable tests in reasonable scenarios. This should be a 1440p card to be used with medium settings, and that's what I want to see. Not 1080p (even if anyone can use it for that for sure), not ultra, not RT (no one is actually taking you seriously RT), not 4K, not tested with a 4090 pretending to be a 4060.

Check here from time to time, I think English gets you far enough, the text is irrelevant anyway.

tweakers.net/reviews/11022/5/nvidia-geforce-rtx-4070-lekker-zuinig-voor-een-moderne-videokaart-call-of-duty-modern-warfare-ii.html

Beyond that, medium settings are easy for most graphics cards in this segment. If you want to truly see the performance delta you need to challenge cards. You will see this if you click through that review up here. It's a wild orgy of CPU and pipeline / game engine bottlenecking; you're seeing the test bed, not the GPU.

#37

oxrufiioxo

W1zzard wrote: I will test all cards at the same settings for the foreseeable future, which are maximum

I'm hoping with the increased cache this card performs better than expected but I'm not holding my breath.

#38

Vayra86

W1zzard wrote: I will test all cards at the same settings for the foreseeable future, which are maximum

And rightly so. If people want the kiddie version they can go to YT to spell it out for them.

#39

DemonicRyzen666

That's definitely marketing speak. This whole thing skips the copy to main system RAM that always happens whenever it has to go all the way out to the HDD/SSD.
This quote right here:

If the data's missing from the VRAM, the GPU requests it from your system's memory. If the data is not in system memory, it can typically be loaded into system memory from a storage device like an SSD or hard drive. The data is then copied into VRAM, L2, L1, and ultimately fed to the processing cores. Note that different hardware -and software- based strategies exist to keep the most useful, and most reused data present in caches.

It's basic misinformation.

oxrufiioxo wrote: The fact that Nvidia thinks a 400 USD GPU should be used for 1080p med/high settings is pretty sad.

Sadder that people are still willing to pay for nvidia crap, just for so called features.
Nvidia is the "apple" of gpus.

Memory Bus Width Is One Aspect Of A Memory Subsystem
Historically, memory bus width has been used as an important metric for determining the speed and performance class of a new GPU. However, the bus width by itself is not a sufficient indicator of memory subsystem performance. Instead, it's helpful to understand the broader memory subsystem design and its overall impact on gaming performance.

Marketing BS; high-resolution gameplay shows a different story.

Due to the advances in the Ada architecture, including new RT and Tensor Cores, higher clock speeds, the new OFA Engine, and Ada's DLSS 3 capabilities, the GeForce RTX 4060 Ti is faster than the previous-generations, 256-bit GeForce RTX 3060 Ti and RTX 2060 SUPER graphics cards, all while using less power.

Let's completely disregard the fact that each new generation is on a smaller node than the previous one, which is where most of that energy efficiency comes from, with almost none from the architecture.
The only real so-called improvement that Ada has over the last two generations is a 3% increase in ray-tracing efficiency, and more accurate "optical flow accelerators", which were present even three generations back in Turing. That means they're just wasting silicon die space on Turing and Ampere, doing nothing.

#40

Marcus L

Bomby569 wrote: to be fair those are very shitty ports, unfinished games, that run like shit

Nothing shitty about Resident Evil; it runs and looks great. Hogwarts and the likes of TLOU are a different kettle of fish, however. I actually bought RE4 RM at almost full price (slightly discounted as I used CDKeys), which I rarely ever do for a new game, and it was money well spent.

#41

OneMoar

There is Always Moar

The other side of the VRAM 'issue' is lazy console developers.
Since consoles have unified memory, the fast/cheap thing to do is just cram all the assets into memory, because it's all one very fast pool of 16 GB of GDDR6 on a 256-bit/320-bit bus (no separate 'RAM' and 'VRAM'; it's all one unsegregated pool).

So come PC port time, they don't bother to properly manage memory / I/O pressure, and everything falls apart because the way they are handling assets is frankly inefficient.

Now, Nvidia knows this, and they should have made the effort to ensure that 10 GB was the minimum.

#42

Recus

AMD created Infinity Cache, it's a game changer. Nvidia created this, it's a gimmick, they should add more memory chips. Tech illiterates these days. :shadedshu:

#43

3x0

Recus wrote: AMD created Infinity Cache, it's a game changer.

I don't think anyone claimed that for AMD's L3 Infinity Cache.

#44

Nostras

Recus wrote: AMD created Infinity Cache, it's a game changer. Nvidia created this, it's a gimmick, they should add more memory chips. Tech illiterates these days. :shadedshu:

You jest right?
There's a difference between adding extra cache in an attempt to give your cards an edge versus adding extra cache in an effort to save some money by skimping on the VRAM/bus width.
We've already seen that the 4070 still falls apart if VRAM fills up regardless of the extra cache vs last gen.

~~And even then, we've seen with AMD 6000 that the extra cache primarily improved performance at 1440p (a bit), it was never intended to compensate for VRAM.~~
I checked reviews again and honestly you can't tell if it's the cache or the architecture.

#45

Testsubject01

Chaitanya wrote: Hardware Unboxed and others have been heavily pushing nGreedia hardware as it has a fake frame generator.

Shills! All of them! :rolleyes:

#46

remekra

Well then I guess it's best to get a 7900XT/XTX since they have both large amount of VRAM and large amount of cache. Thanks Nvidia!
Props to PR team.

#47

Chrispy_

I mean this article could basically be paraphrased "Nvidia fails to divert attention away from inadequate VRAM by mansplaining how they copied AMD's InfinityCache"

Nobody here is falling for it; 8GB was $200 mainstream 7 years ago. We need at least 12GB at $399.

oxrufiioxo wrote: I'm hoping with the increased cache this card performs better than expected but I'm not holding my breath.

What's the point of better-than-expected performance if you're forced to run low settings due to VRAM limitations?

Nostras wrote: ~~And even then, we've seen with AMD 6000 that the extra cache primarily improved performance at 1440p (a bit), it was never intended to compensate for VRAM.~~
I checked reviews again and honestly you can't tell if it's the cache or the architecture.

Cache helps offset less bandwidth, not less VRAM capacity. 2023's hot topic is games needing more capacity.

#48

catulitechup

Chrispy_ wrote: I mean this article could basically be paraphrased "Nvidia fails to divert attention away from inadequate VRAM by mansplaining how they copied AMD's InfinityCache"

Nobody here is falling for it; 8GB was $200 mainstream 7 years ago. We need at least 12GB at $399.

Personally, for 16 GB don't pay more than US$300, and for 10 GB to 12 GB no more than US$250.

:)

#49

Chrispy_

catulitechup wrote: Personally, for 16 GB don't pay more than US$300, and for 10 GB to 12 GB no more than US$250.

:)

A pipe dream in the current market for new GPUs, that's barely achievable buying used, last-gen AMD cards on eBay.

That is, however, what I did. I bought a used 6800XT and 6700XT at the prices you suggested. The 3070 was replaced by a 6800XT, and whilst the 3060 at least had enough VRAM it was hot and slow (Thanks, Samsung 8nm!)

#50

catulitechup

Chrispy_ wrote: A pipe dream in the current market for new GPUs, that's barely achievable buying used, last-gen AMD cards on eBay.

That is, however, what I did. I bought a used 6800XT and 6700XT at the prices you suggested. The 3070 was replaced by a 6800XT, and whilst the 3060 at least had enough VRAM it was hot and slow (Thanks, Samsung 8nm!)

Right now you can get close, especially with the RX 6700 (non-XT) 10 GB at around US$270.

These greedy companies need to lower prices, because stock will be huge.

And with most countries in the world in recession, buying a video card is more of a luxury than something essential.

Nvidia will have to cut prices, because shitty excuses won't get people to give more money for a feature-lacking product like the RTX 4060.

:)
