Thursday, May 18th 2023

NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

Press Release by

May 18th, 2023 10:08

NVIDIA receives a lot of questions about graphics memory, also known as the frame buffer, video memory, or "VRAM", and so with the unveiling of our new GeForce RTX 4060 Family of graphics cards we wanted to share some insights, so gamers can make the best buying decisions for their gaming needs.

What Is VRAM?
VRAM is high-speed memory located on your graphics card. It's one component of a larger memory subsystem that helps make sure your GPU has access to the data it needs to smoothly process and display images. In this article, we'll describe memory subsystem innovations in our latest generation Ada Lovelace GPU architecture, as well as how the speed and size of GPU cache and VRAM impact performance and the gameplay experience.

GeForce RTX 40 Series Graphics Cards Memory Subsystem: Improving Performance & Efficiency
Modern games are graphical showcases, and their install sizes can now exceed 100 GB. Accessing this massive amount of data happens at different speeds, determined by the specifications of the GPU, and to some extent your system's other components. On GeForce RTX 40 Series graphics cards, new innovations accelerate the process for smooth gaming and faster frame rates, helping you avoid texture stream-in or other hiccups.

The Importance Of Cache
GPUs include high-speed memory caches that are close to the GPU's processing cores, which store data that is likely to be needed. If the GPU can recall the data from the caches, rather than requesting it from the VRAM (further away) or system RAM (even further away), the data will be accessed and processed faster, increasing performance and gameplay fluidity, and reducing power consumption.

GeForce GPUs feature a Level 1 (L1) cache (the closest and fastest cache) in each Streaming Multiprocessor (SM), up to twelve of which can be found in each GeForce RTX 40 Series Graphics Processing Cluster (GPC). This is followed by a fast, larger, shared Level 2 (L2) cache that can be accessed quickly with minimal latency.

Each successive cache level takes longer to access, with the tradeoff being greater capacity. When designing our GeForce RTX 40 Series GPUs, we found a single, large L2 cache to be faster and more efficient than alternatives, such as designs featuring a small L2 cache and a large, slower-to-access L3 cache.

Prior generation GeForce GPUs had much smaller L2 Caches, resulting in lower performance and efficiency compared to today's GeForce RTX 40 Series GPUs.

During use, the GPU first searches for data in the L1 data cache within the SM, and if the data is found in L1 there's no need to access the L2 data cache. If data is not found in L1, it's called a "cache miss", and the search continues into the L2 cache. If data is found in L2, that's called an L2 "cache hit" (see the "H" indicators in the above diagram), and data is provided to the L1 and then to the processing cores.

If data is not found in the L2 cache, an L2 "cache miss", the GPU now tries to obtain the data from the VRAM. You can see a number of L2 cache misses in the above diagram that depicts our prior architecture memory subsystem, which causes a number of VRAM accesses.

If the data's missing from the VRAM, the GPU requests it from your system's memory. If the data is not in system memory, it can typically be loaded into system memory from a storage device like an SSD or hard drive. The data is then copied into VRAM, L2, L1, and ultimately fed to the processing cores. Note that different hardware- and software-based strategies exist to keep the most useful and most reused data present in caches.

Each additional data read or write operation through the memory hierarchy slows performance and uses more power, so by increasing our cache hit rate we increase frame rates and efficiency.
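To make the lookup order above concrete, here is a minimal Python sketch of a two-level cache sitting in front of VRAM. It is an illustration only: the `LRUCache` class, the `count_vram_accesses` function, the least-recently-used eviction policy, and the tiny cache sizes are all assumptions chosen for the example, not a description of how Ada hardware actually manages its caches.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache holding a fixed number of lines (hypothetical, LRU eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        """Return True on a hit and mark the line as most recently used."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        """Insert a line, evicting the least recently used one if full."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


def count_vram_accesses(trace, l1_lines, l2_lines):
    """Walk an address trace through L1, then L2, then VRAM; count VRAM reads."""
    l1, l2 = LRUCache(l1_lines), LRUCache(l2_lines)
    vram_accesses = 0
    for addr in trace:
        if l1.lookup(addr):        # L1 hit: no need to go further
            continue
        if not l2.lookup(addr):    # L2 miss: fetch from VRAM and fill L2
            vram_accesses += 1
            l2.fill(addr)
        l1.fill(addr)              # data is handed to L1 on its way to the cores
    return vram_accesses


# Eight addresses read in a loop, four times over.
trace = list(range(8)) * 4
small_l2 = count_vram_accesses(trace, l1_lines=1, l2_lines=2)
large_l2 = count_vram_accesses(trace, l1_lines=1, l2_lines=32)
print(small_l2, large_l2)  # prints: 32 8
```

With this toy access pattern the small L2 misses on every access, while the larger L2 only misses on the first pass. That is the same qualitative effect the hit-rate discussion describes, although real workloads and real replacement policies are far more complex.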

Compared to prior generation GPUs with a 128-bit memory interface, the memory subsystem of the new NVIDIA Ada Lovelace architecture increases the size of the L2 cache by 16X, greatly increasing the cache hit rate. In the examples above, representing 128-bit GPUs from Ada and prior generation architectures, the hit rate is much higher with Ada. In addition, the L2 cache bandwidth in Ada GPUs has been significantly increased versus prior GPUs. This allows more data to be transferred between the cores and the L2 cache as quickly as possible.

Shown in the diagram below, NVIDIA engineers tested the RTX 4060 Ti with its 32 MB L2 cache against a special test version of RTX 4060 Ti using only a 2 MB L2, which represents the L2 cache size of previous generation 128-bit GPUs (where 512 KB of L2 cache was tied to each 32-bit memory controller).
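The 2 MB figure follows directly from the per-controller number quoted above. A short Python sketch of that arithmetic, where the function name `prior_gen_l2_kb` is made up for this example:

```python
def prior_gen_l2_kb(bus_width_bits, kb_per_controller=512, controller_bits=32):
    """L2 size when a fixed slice of L2 is tied to each 32-bit memory controller."""
    controllers = bus_width_bits // controller_bits
    return controllers * kb_per_controller


prior_kb = prior_gen_l2_kb(128)   # 4 controllers x 512 KB
ada_kb = 32 * 1024                # the 32 MB L2 of the RTX 4060 Ti
print(prior_kb, ada_kb // prior_kb)  # prints: 2048 16
```

The ratio of 16 matches the 16X L2 increase quoted for 128-bit GPUs.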

In testing with a variety of games and synthetic benchmarks, the 32 MB L2 cache reduced memory bus traffic by just over 50% on average compared to the performance of a 2 MB L2 cache. See the reduced VRAM accesses in the Ada Memory Subsystem diagram above.

This 50% traffic reduction allows the GPU to use its memory bandwidth 2X more efficiently. As a result, in this scenario, isolating for memory performance, an Ada GPU with 288 GB/sec of peak memory bandwidth would perform similarly to an Ampere GPU with 554 GB/sec of peak memory bandwidth. Across an array of games and synthetic tests, the greatly increased hit rates improve frame rates by up to 34%.
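The bandwidth comparison can be approximated with a simple model: if a larger cache removes a fraction of memory bus traffic, the same bus behaves like a proportionally wider one. The Python sketch below assumes exactly that model and nothing more. The function names are hypothetical, and real performance depends on much more than this single ratio.

```python
def effective_bandwidth(peak_gb_s, traffic_reduction):
    """Bandwidth a design without the traffic reduction would need to keep up."""
    if not 0 <= traffic_reduction < 1:
        raise ValueError("traffic_reduction must be in [0, 1)")
    return peak_gb_s / (1.0 - traffic_reduction)


def implied_traffic_reduction(peak_gb_s, equivalent_gb_s):
    """Traffic reduction implied by a pair of 'equivalent' bandwidth figures."""
    return 1.0 - peak_gb_s / equivalent_gb_s


print(effective_bandwidth(288, 0.5))                   # prints: 576.0
print(round(implied_traffic_reduction(288, 554), 2))   # prints: 0.48
```

Under this model an exact 50% reduction would make 288 GB/sec behave like 576 GB/sec. The 288 and 554 GB/sec pair quoted above corresponds to a reduction of roughly 48% for that scenario, which is in the same range as the roughly 50% average.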

Memory Bus Width Is One Aspect Of A Memory Subsystem
Historically, memory bus width has been used as an important metric for determining the speed and performance class of a new GPU. However, the bus width by itself is not a sufficient indicator of memory subsystem performance. Instead, it's helpful to understand the broader memory subsystem design and its overall impact on gaming performance.

Due to the advances in the Ada architecture, including new RT and Tensor Cores, higher clock speeds, the new OFA Engine, and Ada's DLSS 3 capabilities, the GeForce RTX 4060 Ti is faster than the previous-generation, 256-bit GeForce RTX 3060 Ti and RTX 2060 SUPER graphics cards, all while using less power.

Altogether, the tech specs deliver a great 60-class GPU with high performance for 1080p gamers, who account for the majority of Steam users.

The Amount of VRAM Is Dependent On GPU Architecture
Gamers often wonder why a graphics card has a certain amount of VRAM. Current-generation GDDR6X and GDDR6 memory is supplied in densities of 8Gb (1 GB of data) and 16Gb (2 GB of data) per chip. Each chip uses two separate 16-bit channels to connect to a single 32-bit Ada memory controller. So a 128-bit GPU can support 4 memory chips, and a 384-bit GPU can support 12 chips (calculated as bus width divided by 32). Higher capacity chips cost more to make, so a balance is required to optimize prices.

On our new 128-bit memory bus GeForce RTX 4060 Ti GPUs, the 8 GB model uses four 16Gb GDDR6 memory chips, and the 16 GB model uses eight 16Gb chips. Mixing densities isn't possible, preventing the creation of a 12 GB model, for example. That's also why the GeForce RTX 4060 Ti has an option with more memory (16 GB) than the GeForce RTX 4070 Ti and 4070, which have 192-bit memory interfaces and therefore 12 GB of VRAM.
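The capacity rules above reduce to a small calculation. The Python sketch below is one way to express it. The function name `vram_config` and the `chips_per_controller` parameter are assumptions made for this example: the text states that the 16 GB model uses eight chips on a 128-bit bus, and the sketch models that as two chips sharing each 32-bit controller.

```python
CONTROLLER_WIDTH_BITS = 32  # one memory controller per 32 bits of bus width


def vram_config(bus_width_bits, chip_density_gbit, chips_per_controller=1):
    """Return (chip_count, capacity_gb) for a given bus width and chip density."""
    if bus_width_bits % CONTROLLER_WIDTH_BITS != 0:
        raise ValueError("bus width must be a multiple of 32 bits")
    controllers = bus_width_bits // CONTROLLER_WIDTH_BITS
    chips = controllers * chips_per_controller
    capacity_gb = chips * chip_density_gbit // 8  # 8 gigabits = 1 gigabyte
    return chips, capacity_gb


print(vram_config(128, 16))                          # prints: (4, 8)
print(vram_config(128, 16, chips_per_controller=2))  # prints: (8, 16)
print(vram_config(192, 16))                          # prints: (6, 12)
print(vram_config(384, 16))                          # prints: (12, 24)
```

Because every controller must carry the same chips, capacity on a fixed bus width moves in steps: 8 GB or 16 GB on 128-bit with 16Gb chips, with nothing in between.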

Our 60-class GPUs have been carefully crafted to deliver the optimum combination of performance, price, and power efficiency, which is why we chose a 128-bit memory interface. In short, higher capacity GPUs of the same bus width always have double the memory.

Do On Screen Display (OSD) Tools Report VRAM Usage Accurately?
Gamers often cite the "VRAM usage" metric in On Screen Display performance measurement tools. But this number isn't entirely accurate, as all games and game engines work differently. In the majority of cases, a game will allocate VRAM for itself, saying to your system, 'I want it in case I need it'. But just because it's holding the VRAM, doesn't mean it actually needs all of it. In fact, games will often request more memory if it's available.

Due to the way memory works, it's impossible to know precisely what's being actively used unless you're the game's developer with access to development tools. Some games offer a guide in the options menu, but even that isn't always accurate. The amount of VRAM that is actually needed will vary in real time depending on the scene and what the player is seeing.

Furthermore, the behavior of games can vary when VRAM is genuinely used to its max. In some, memory is purged causing a noticeable performance hitch while the current scene is reloaded into memory. In others, only select data will be loaded and unloaded, with no visible impact. And in some cases, new assets may load in slower as they're now being brought in from system RAM.

For gamers, playing is the only way to truly ascertain a game's behavior. In addition, gamers can look at "1% low" framerate measurements, which can help analyze the actual gaming experience. The 1% Low metric - found in the performance overlay and logs of the free NVIDIA FrameView app, as well as other popular measurement tools - measures the average of the slowest 1% of frames over a certain time period.
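As a rough illustration of the definition above, the following Python sketch computes a 1% low figure from a list of frame times. It is an assumption-laden example rather than the method used by FrameView or any other tool: the function names are made up, and measurement tools differ in exactly how they select and average the slowest frames.

```python
def one_percent_low_fps(frame_times_ms, fraction=0.01):
    """Average the slowest `fraction` of frames and express the result in FPS."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    slowest_first = sorted(frame_times_ms, reverse=True)
    count = max(1, int(len(slowest_first) * fraction))  # always keep one frame
    worst = slowest_first[:count]
    avg_ms = sum(worst) / len(worst)
    return 1000.0 / avg_ms


def average_fps(frame_times_ms):
    """Overall average frame rate over the same capture."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


# 99 smooth frames at 10 ms plus a single 50 ms hitch.
frames = [10.0] * 99 + [50.0]
print(round(average_fps(frames), 1))   # prints: 96.2
print(one_percent_low_fps(frames))     # prints: 20.0
```

The average barely registers the hitch, while the 1% low exposes it, which is why the metric is useful for spotting stutter of the kind that can occur when VRAM runs out.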

Automate Setting Selection With GeForce Experience & Download The Latest Patches
Recently, some new games have released patches to better manage memory usage, without hampering the visual quality. Make sure to get the latest patches for new launches, as they commonly fix bugs and optimize performance shortly after launch.

Additionally, GeForce Experience supports most new games, offering optimized settings for each supported GeForce GPU and VRAM configuration, giving gamers the best possible experience by balancing performance and image quality. If you're unfamiliar with game option lingo and just want to enjoy your games from the second you load them, GeForce Experience can automatically tune game settings for a great experience each time.

NVIDIA Technologies Can Help Developers Reduce VRAM Usage
Games are richer and more detailed than ever before, necessitating those 100 GB+ installs. To help developers optimize memory usage, NVIDIA has several free developer tools and SDKs, including:

NVIDIA RTX Memory Utility (RTXMU): Ray tracing requires additional VRAM. RTXMU can reduce this usage by up to 50%
NVIDIA Micro-Mesh SDK: Reduces the memory usage of complex geometry while also increasing performance
NVIDIA Texture Tools Exporter: Creates highly compressed texture files to reduce memory usage and the file size of games

These are just a few of the tools and technologies that NVIDIA freely provides to help developers optimize their games for all GPUs, platforms, and memory configurations.

Some Applications Can Use More VRAM
Beyond gaming, GeForce RTX graphics cards are used around the world for 3D animation, video editing, motion graphics, photography, graphic design, architectural visualization, STEM, broadcasting, and AI. Some of the applications used in these industries may benefit from additional VRAM, for example when editing 4K or 8K timelines in Premiere, or crafting a massive architectural scene in D5 Render.

On the gaming side, high resolutions also generally require an increase in VRAM. Occasionally, a game may launch with an optional extra large texture pack and allocate more VRAM. And there are a handful of games which run best at the "High" preset on the 4060 Ti (8 GB), and maxed-out "Ultra" settings on the 4060 Ti (16 GB). In most games, both versions of the GeForce RTX 4060 Ti (8 GB and 16 GB) can play at max settings and will deliver the same performance.

The benefit of the PC platform is its openness, configurability and upgradability, which is why we're offering the two memory configurations for the GeForce RTX 4060 Ti; if you want that extra VRAM, it will be available in July.

A GPU For Every Gamer
Following the launch of the GeForce RTX 4060 Family, there'll be optimized graphics cards for each of the three major game resolutions. However you play, all GeForce RTX 40 Series GPUs will deliver a best-in-class experience, with leading power efficiency, supported by a massive range of game-enhancing technologies, including NVIDIA DLSS 3, NVIDIA Reflex, NVIDIA G-SYNC, NVIDIA Broadcast, and RTX Remix.

For the latest news about all the new games and apps that leverage the full capabilities of GeForce RTX graphics cards, stay tuned to GeForce.com.

Source: NVIDIA Blog

139 Comments on NVIDIA Explains GeForce RTX 40 Series VRAM Functionality

#101

kawice

A cache with a good hit rate might compensate for lower bus bandwidth, but it won't compensate for not having enough VRAM.
8 GB is barely enough for 1440p now, not to mention future or badly optimized games. And you don't buy a new GPU to use it for a year or two.
In the past Nvidia released cards with odd (non-mainstream) VRAM sizes like 6 GB, 11 GB or 10 GB. It's purely a design decision, and they clearly know what they're doing.
Is there any reason the number of GPCs needs to be a multiple of two on any SKU? Can we have an odd number of GPCs, like 7 or 9, with 7 or 9 32-bit memory controllers?

That would allow the memory buses below:

160-bit bus (5 memory controllers) * 2GB VRAM = 10 GB VRAM

224-bit bus (7 memory controllers) * 2GB VRAM = 14 GB VRAM

288-bit bus (9 memory controllers) * 2GB VRAM = 18 GB VRAM

Don't tell me adding 1 GPC with 1 additional memory controller and an extra 2 GB of VRAM would make those cards much more expensive.
And seeing how far the 4080 is behind the 4090 in performance clearly shows the bad placement of all the other cards. Not to mention the 12 GB 4080 fiasco.
They should make a 40xx refresh with SUPER branding asap and fix this nonsense.

edit:
It looks like the RTX 2080 Ti had an odd number of GPC clusters, and that card was the top model of the 20xx series. It was basically a cut-down version of the RTX Titan back then.
The 2080 Ti had 11 memory controllers with 1 GB memory chips, while the RTX Titan had 12 memory controllers with 2 GB memory chips.

Nvidia, no one will believe your lies anymore.

RTX 2080 Ti: 4352 CUDA cores, 650 watt, GDDR6, 352-bit, 616 GB/s, 1350 MHz, 1545 MHz, standard with 11 GB of memory
Titan RTX: 4608 CUDA cores, 650 watt, GDDR6, 384-bit, 672 GB/s, 1350 MHz, 1770 MHz, standard with 24 GB of memory

#102

Vayra86

remekra: Well then I guess it's best to get a 7900XT/XTX since they have both large amount of VRAM and large amount of cache. Thanks Nvidia!
Props to PR team.

Duh :)

Chrispy_: and CPU starts to matter as framerates increase. A Ryzen5 or i5 can handle 100fps without dying.

This. Anything over 100 FPS is in the realm of high end systems, not just GPU. But yeah that target is very sensible for an x60 now.

#103

chrcoluk

So Nvidia have noticed the negative talk and gone PR on it (regarding VRAM).

It doesn't change anything though: either release a tech that reduces VRAM usage significantly to stop your cards being VRAM-bottlenecked, or add more VRAM to your cards.

#104

JustBenching

8GB are not enough. When developers are lazy :D :roll:

#105

chrcoluk

oxrufiioxo: Nvidia themselves has said the 4060ti 8GB is targeting 1080p gaming.

Perhaps they also need to add to their marketing that it targets only 1024x1024 texture resolution as well, as the two are independent of each other.

So e.g. 3080 1440p/4k combined with 1k texture resolution.
3090 4k combined with 4k texture resolution.
4070ti 1440p combined with 1k texture resolution.

This is being generous: in FF7 Remake, my 10 GB 3080 struggles with 512x512 textures.

Plus a disclaimer that although the cards have dedicated RT chips, the VRAM capacity may not allow you to enable RT in games. As well as textures going *poof*, blurry and so forth are a normal experience and not to be reported as a fault. :)

fevgatos: 8GB are not enough. When developers are lazy :D :roll:

Remember, devs rule the roost; they make the software we use.

The modern way of making software for multiple platforms isn't to redesign it for each platform, but to port it over, which is why we have consistent UI methodology across devices, and typically the biggest platform wins when it comes to optimisation.

In short, if I am making hardware that maybe sells 1 million units a year, and someone else is making hardware that sells 10 million a year, I had better make mine like theirs, as the devs aren't going to write their code for hardware that sells 10% of what the other platform does.

W1zzard: I will test all cards at the same settings for the foreseeable future, which are maximum

Please test for dynamic quality drops, texture pop-ins etc. as well, as VRAM issues won't necessarily slow down frame rates.

#106

Vayra86

chrcoluk: Perhaps they also need to add in their marketing it also targets only 1024x1024 texture resolution as well, as the two are independent of each other.

So e.g. 3080 1440p/4k combined with 1k texture resolution.
3090 4k combined with 4k texture resolution.
4070ti 1400p combined with 1k texture resolution.

This is been generous, ff7 remake, my 10 gig 3080 struggles with 512x512 textures.

Plus a disclaimer that although the cards have dedicated RT chips, the VRAM capacity may not allow you to enable RT in games. As well as textures going *poof*, blurry and so forth are a normal experience and not to be reported as a fault. :)

Remember dev's rule the roost, they make the software we use.

The modern way of making software for multiple platforms isnt to redesign it for each platform, but to port it over, which is why we have consistent UI methodology across devices, and typically the biggest platform wins when it comes to optimisation for the platform.

In short if I am making hardware that maybe sells 1 million units a year, and someone else is making hardware that sells 10 million a year, I better make mine like their's as the devs arent going to write their code for my hardware that sells 10% of the other platform.

Please test for dynamic quality drops, texture pop ins etc. as well, as VRAM issues wont necessarily slow down frame rates.

Can't wait to see all those Nvidia pilgrims continue their arduous journey into a future of high-VRAM releases where it works exactly like you say. Already the excuses posited are getting pretty hilarious. Especially when a game still eats over 10GB post-optimization. It's quite interesting, if you run a higher-VRAM GPU, to see the real potential allocation. 11GB is commonplace, even in older titles.

I know, I'm a sadist like that, I know.

The rage vs devs is misplaced though. Rage on Nvidia, or yourself, for being an idiot buying yet another 8GB card and therefore preserving a status quo you dislike with a passion - I mean that is the actual rationale here if you get an 8GB x60(ti) in 2023. And... I know it's a nasty truth, but it was also the rationale when Ampere launched. 8-10GB x70-x80 was similarly ridiculous bullshit - and it shows today. Back then I was the bad guy for hammering on that fact :)

See, the thing is, we've had >8GB for 6 years now; it should damn well be entry level at this point. Fun fact: it was Nvidia that first released an 8GB midrange card in 2016, while AMD got stuck at 4! Just because Nvidia has been stalling ever since doesn't mean the world caters to that. Just like the world isn't catering to their ridiculous RT push, which they've now managed to damage with their own product stack. RT in the midrange just got pushed back for Nvidia by a full gen. It's like... help me understand the madness here!

#107

JustBenching

Vayra86: Can't wait to see all those Nvidia pilgrims continue their arduous journey in the future of high VRAM releases where it works exactly like you say. Already the excuses posited are getting pretty hilarious. Especially when a game post optimization still eats over 10GB. Its quite interesting if you run a higher VRAM GPU to see the real potential allocation. 11GB is commonplace, even in older titles.

I know, I'm a sadist like that, I know.

The rage vs devs is misplaced though. Rage on Nvidia, or yourself, for being an idiot buying yet another 8GB card and therefore preserving a status quo you dislike with a passion - I mean that is the actual rationale here if you get an 8GB x60(ti) in 2023. And... I know its a nasty truth, but it was also the rationale when Ampere launched. 8-10GB x70-x80 was similarly ridiculous bullshit - and it shows today, back then I was the bad guy for hammering on that fact :)

See the thing is, we've had >8GB for 6 years now, it should damn well be entry level at this point. Fun fact, it was Nvidia first releasing 8GB midrange in 2016, while AMD got stuck at 4! Just because Nvidia is stalling ever since, doesn't mean the world caters to that. Just like the world isn't catering to their ridiculous RT push, that they've now managed to damage by their own product stack. RT in the midrange just got pushed back for Nvidia by a full gen. Its like... help me understand the madness here!

If a patch is enough to drop vram usage by a pretty huge chunk while also making the textures look much - much - much better, I feel like the vram of the card is kinda irrelevant. Current Medium textures on TLOU look like the old High while consuming less than half the amount of vram. Just stop for a second and think about it... That's a COLOSSAL difference. Launch day it required twice the vram for the same quality!! That's insane. I hope you remember that back then I was saying Plague Tale looks much better than TLOU while using 1/3 of the vram. Well, here we are...

I'm not suggesting anyone buy an 8gb card in 2023 for 4060 money, but damn, even if the card had 16 gb it wouldn't matter, seeing how devs just release games that hog up twice or three times the amount that is actually needed.

#108

Vayra86

fevgatos: If a patch is enough to drop vram usage by a pretty huge chunk while also making the textures look much - much - much better I feel like the vram of the card is kinda irrelevant. Current Medium textures on TLOU look like the old High while consuming less than half the amount of vram. Just stop for a second and think about it.. That's a COLOSSAL difference. Launch day it required twice the vram for the same quality!! That's insane. I hope you remember that back then I was saying Plague Tale looks much better than TLOU while using 1/3 of the vram. Well, here we are..

Im not suggesting anyone to buy an 8gb card in 2023 for 4060 money, but damn even if the card had 16 gb it wouldn't matter seeing how devs just release games that actually hog up twice or three times the amount that is actually needed.

Both things are true. Yes, optimization can do a lot. Yes, games will exceed 8GB required to get the intended experience. I mean let's consider that screenshot with Medium tex. The wall is still a 720p-upscaled-looking mess. There is a lot to be gained here from higher quality texturing - and that's just textures, which is far from the only thing residing in VRAM.

But they won't hog more than 16GB. The consoles carry 16. You downplay this way too much and your feeling of VRAM kinda irrelevant is nonsense.

#109

tfdsaf

$400 to be able to barely play games at medium settings at 1080p in 2023, what amazing progress, huh?

#110

chrcoluk

Vayra86: Both things are true. Yes, optimization can do a lot. Yes, games will exceed 8GB required to get the intended experience. I mean let's consider that screenshot with Medium tex. The wall is still a 720p-upscaled-looking mess. There is a lot to be gained here from higher quality texturing - and that's just textures, which is far from the only thing residing in VRAM.

But they won't hog more than 16GB. The consoles carry 16. You downplay this way too much and your feeling of VRAM kinda irrelevant is nonsense.

It is interesting that the same argument doesn't come out for games needing DLSS to be playable from people defending low VRAM, as the same applies there: devs could make a game much less demanding if they wanted, but they won't. What they can do isn't relevant; what they are doing is relevant. People have been plugging their ears for too long.

#111

Gica

playerlorenzo: TL;DR:

NVIDIA tries to justify VRAM stagnation in their overpriced cards by using caching as a cop-out and trying to doubt the accuracy of OSD readings on VRAM usage.

Correct. Only AMD fans are allowed to challenge software measurements when they are unfavorable.
I have a 3070Ti with 8GB VRAM and I see that it performs excellently in new games as well. Far from the disaster offered by HU, as, 5 years ago, the AMD hordes were using the Assassin's series because in the other 999,999,999,999,999 games Intel was doing better.

#112

N3utro

fevgatos: Well there is no other option, you either don't play the game or activate FG. I tried overclocking my 12900k to 5.6ghz all core at 1.64 volts, it was melting at 114c but hogwarts was not budging, certain areas dropped me below 60. That was on a fully tuned 12900k with manual ram. It's just one of those games...

Hogwarts runs on Unreal Engine 4, and this engine is notorious for fps drops and stuttering. Look for a custom engine.ini file, it does wonders. Then you find yourself wondering why custom-made config files created by strangers on the net end up being better than the ones created by the game designers :p

#113

gffermari

fevgatos: 8GB are not enough. When developers are lazy :D :roll:

That's a day and night difference in both how it looks and how much vram is needed now.

But still for a x60 class gpu, the segment that should play everything with tweaked settings, 8GB is not enough.
It may be for now but not for long when the consoles have way more VRAM to use.
They should go the odd way of 160bit 5x2GB GDDR6 and call it a day. No 2 versions of Tis etc. A normal 4060Ti 10GB and a cut down 4060 10GB and that's all.

The 16GB 4060Ti may be useful for some content creation software but with 128bit, I don't think it will be up to the task for gaming. Yes the capacity is there but the bandwidth is not.
(we'll see in the reviews though)

#114

yannus1

Dr. Dro: ...which, they do. NVIDIA offers wonderful, concise, well-supported features, and AMD often does not, or they are not good or popular enough to set the industry standard every time. There's no grand conspiracy here. In my opinion, Hardware Unboxed is a trustworthy source and they are generally unbiased, willing to point out strengths and weaknesses regardless of brand or product they are reviewing. Like they said on their 8 vs. 16 GB comparison video, AMD adding more VRAM to their cards isn't done out of kindness of their hearts, but because they had to pitch something to offer gamers.

It is true that their workstation segment is practically moribund (Radeon Pro is and has always been a bit of a mess, their support for most production applications is poor to non-existent especially if an app is designed with CUDA in mind - OpenCL sucks) and their high VRAM models offer 32 GB to those who need to work with extra large data sets, so giving an RX 6800/XT 16 GB isn't as big of deal to them as it is to Nvidia, who wants to ensure that their overpriced enterprise RTX A-series sell. This ensures that "hobbyist-level" creative professionals purchase at a minimum RTX 3090/3090 Ti or 4090 hardware, or supported professional models such as the RTX A4000 instead of a 3070/4070 and calling it a day.

If they were trustworthy, they wouldn't have started a censorship campaign on their forum like they just did, keeping about 20 % of the posts. Most people were complaining about the lack of VRAM and high prices and all their posts magically disappeared. Nvidia paid for RTX boxes on reviewers stages and continues paying.

#115

Chomiq

Vayra86: Duh :)

This. Anything over 100 FPS is in the realm of high end systems, not just GPU. But yeah that target is very sensible for an x60 now.

Unless you're running Siege:

#116

Vayra86

Chomiq: Unless you're running Siege:

Yeah... or Warframe, or Minesweeper, Unreal Tournament '99... etc etc etc etc etc :p

But you didn't buy your new GPU for that shit

#117

Dr. Dro

yannus1: If they were trustworthy, they wouldn't have started a censorship campaign on their forum like they just did, keeping about 20 % of the posts. Most people were complaining about the lack of VRAM and high prices and all their posts magically disappeared. Nvidia paid for RTX boxes on reviewers stages and continues paying.

I love capitalism... What do you expect? It's business, and the larger the share a business holds, the louder greed talks. I recall reading that Nvidia's average margins actually surpass Apple's significantly.

Every company, when in a market leader position, will use underhanded marketing tactics. See AMD when they launched Zen 3; the fabrication and widespread lies about BIOS ROM sizes/refusal to update AGESA to upsell motherboards, etc.

#118

Nhonho

I wanted to know why only now they decided to release GPUs with larger amounts of cache.
Why didn't they do something so obvious sooner?
(I haven't read all the comments yet)

#119

Dr. Dro

Nhonho: I wanted to know why only now decided to release GPUs with larger amounts of cache?
Why didn't they do something so obvious sooner?
(I haven't read all the comments yet)

Memory cache is extremely costly in regards of die area and amount of transistors required for it to function, which is why previously only data center grade processors had a large cache.

Approaching 5/4 nm class lithography, and developing advanced 3D stacking techniques, it is now feasible to spare this valuable die area without having the GPU be over 1000 mm² and costing thousands of dollars.

Expect future generations to further increase cache sizes as fabrication process node advances are achieved.

#120

londiste

Dr. Dro: Memory cache is extremely costly in regards of die area and amount of transistors required for it to function, which is why previously only data center grade processors had a large cache.
Approaching 5/4 nm class lithography, and developing advanced 3D stacking techniques, it is now feasible to spare this valuable die area without having the GPU be over 1000 mm² and costing thousands of dollars.
Expect future generations to further increase cache sizes as fabrication process node advances are achieved.

Adding to that - memory speed used to scale up very quickly. This is no longer the case and dedicating relatively large amounts of cache to augment bandwidth is the next best thing.
Of course, when looking for optimum points both bigger manufacturers ended up cutting down the memory bus widths quite heavily. For now.

Not sure about the further increases in cache size, at least when talking about helping with the memory/bandwidth issues. It is costly, as you said, and looking at cache size going down from RDNA2 to RDNA3 (and sizes on Ada GPUs), it looks like there is an optimum size both AMD and Nvidia have found (and it is similar to boot).

#121

Wirko

londiste: Adding to that - memory speed used to scale up very quickly. This is no longer the case and dedicating relatively large amounts of cache to augment bandwidth is the next best thing.
Of course, when looking for optimum points both bigger manufacturers ended up cutting down the memory bus widths quite heavily. For now.

Not sure about the further increases in cache size. At least when talking about helping with the memory/bandwidth issues. It is costly as you said and looking at cache size going down from RDNA2 to RDNA3 (and sizes on Ada GPUs) it looks like there is an optimum size both AMD and Nvidia have found (and it is similar to boot).

Adding more to that - with each new manufacturing node, I mean a full node like 7nm-5nm-3nm, the logic density increases by a factor of ~1.7x, but static RAM density only increases by ~1.2x. I don't understand why, maybe someone can enlighten me, but that's the way it is. So cache is getting relatively more expensive over time compared to logic, although it's still denser.

#122

Nhonho

I'm not convinced yet.

I think AMD and Nvidia just recently discovered (the obvious) that more cache memory makes the GPU faster, due to the high latency of GDDR memory..

While an RTX 3090 Ti only has 6MB of L2, an RTX 4060 Ti will have 32MB!

I know that, in a new lithography, the cache memory area decreases very little. Sometimes, in a new lithography, the cache area decrease doesn't even happen:

"TSMC's N3 features an SRAM bitcell size of 0.0199 µm², which is only ~5% smaller compared to N5's 0.021 µm² SRAM bitcell. It gets worse with the revamped N3E as it comes with a 0.021 µm² SRAM bitcell (which roughly translates to 31.8 Mib/mm²), which means no scaling compared to N5 at all."
www.tomshardware.com/news/no-sram-scaling-implies-on-more-expensive-cpus-and-gpus

#123

Pooch

ixi: Not beyond fast, but beyond overpriced.

We increased DLSS performance yeeeeey. Which we have already made few other advertisments while we hype the dlss 3
3.0 against 2.0 and we limit dlss 3.0 so that it can be ran on 4xxx. Hype the fake resolution baby.

I'm gonna quote the first guy because really he was enough for this whole thread, and add that they did skip Hogwarts, they are scared, and they want to convince you that you should still buy the 8GB model just so you can find out what you thought in the first place: you need the 16 GB model. Thank you. Less damage control and more a last-ditch attempt to milk the public for a product that is handicapped.

Pooch: Im gonna quote the first guy because really he was enough for this whole thread, and add that they did skip the hogwarts, they are scared , and they want to convince you that you should still buy the 8GB model just so you can find out what you thought in the first place; you need the 16 GB model. Thankyou. Less damage control and more last ditch attempt to milk the public for a product that is handicapped.

Yea, that fake resolution crap is really just another crutch, so they don't have to give you the benchmarks you deserve. Those screens were a joke: the first one literally says DLSS and the second does not. I know it says both on the bottom, but clearly it was only on for one test so the results would be closer together to make the story look better. Like how they mentioned the performance for certain games would be almost identical, seems like the screens are a little too close, nah mean?

#124

OneMoar

There is Always Moar

I am not defending Nvidia,
but game developers need to do their part as well.
A game using 10+ GB of VRAM due to shoddy programming is unacceptable.

#125

TeamMe

Why don't they offer a 24GB RTX4070/TI for the home user, good for games and homebrew AI...
