Monday, January 26th 2015
NVIDIA Responds to GTX 970 Memory Allocation 'Bug' Controversy
The GeForce GTX 970 memory allocation bug discovery, made last Friday, wrecked some NVIDIA engineers' weekends; they composed a response to what they say is a non-issue. A bug was discovered in the way the GeForce GTX 970 allocates its 4 GB of video memory, giving some power users the impression that the GPU isn't addressing the last 500-700 MB of its memory. NVIDIA, in its response, explained that the GPU is fully capable of addressing its 4 GB, but does so in an unusual way. Without further ado, the statement.
Source:
The TechReport
The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However, the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.
We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.

Here's an example of some performance data:

GTX 970 vs. GTX 980 Memory-Intensive Performance Data

Shadow of Mordor
  <3.5GB setting = 2688x1512 Very High:                      GTX 980: 72 fps          GTX 970: 60 fps
  >3.5GB setting = 3456x1944:                                GTX 980: 55 fps (-24%)   GTX 970: 45 fps (-25%)
Battlefield 4
  <3.5GB setting = 3840x2160 2xMSAA:                         GTX 980: 36 fps          GTX 970: 30 fps
  >3.5GB setting = 3840x2160 135% res:                       GTX 980: 19 fps (-47%)   GTX 970: 15 fps (-50%)
Call of Duty: Advanced Warfare
  <3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off:   GTX 980: 82 fps          GTX 970: 71 fps
  >3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on:    GTX 980: 48 fps (-41%)   GTX 970: 40 fps (-44%)
On Shadow of Mordor, performance drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to the GTX 980 on these games when it is using the 0.5GB segment.
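As an aside on how the "3rd party applications that measure memory usage" mentioned in the statement arrive at such numbers: they ask the driver for total and free VRAM and report the difference. A minimal sketch of that kind of query, using the public CUDA runtime API rather than whatever NVAPI path the actual tools use, might look like this (illustrative only, single-GPU assumption):

// Minimal sketch: query reported VRAM totals the way a monitoring tool might.
// Illustrative only; assumes a single CUDA-capable device.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0, totalBytes = 0;

    // Ask the driver how much device memory exists and how much is unallocated.
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }

    // "In use" as reported by most overlays is simply total minus free.
    double usedMiB  = (totalBytes - freeBytes) / (1024.0 * 1024.0);
    double totalMiB = totalBytes / (1024.0 * 1024.0);
    printf("VRAM reported in use: %.0f MiB of %.0f MiB\n", usedMiB, totalMiB);
    return 0;
}

Per the statement above, a counter like this tops out around 3.5GB on the 970 under typical loads, only climbing higher once a game actually spills into the second segment.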
92 Comments on NVIDIA Responds to GTX 970 Memory Allocation 'Bug' Controversy
Looks to be one of the original sources, and like many others have noted, when modding games such as Skyrim you run out of memory long before you run out of GPU power. Wonder how well GTA5 will do, with NVIDIA users once again short on vRAM?
These threads also describe many users' issues with stuttering as the required textures are pulled out of the slower memory: forums.evga.com/Games-stuttering-with-GTX-970-and-Vsync-m2222444.aspx and forums.geforce.com/default/topic/777475/geforce-900-series/gtx-970-frame-hitching-/
Current user fix? Run at 30 FPS, just like a console!!!!
How exactly is the GTX 970 a failure in real world performance in games?
The wrong ROP and L2 amounts do suck; that is something NVIDIA should address. (Give me a free game and I'll be happy. :))
If you don't understand that, it's OK, but there are some who mod games and buy hardware to support it.
Heh. I'd been meaning to replace an aging 480 with a 270/270X, then decided to wait for the 960, which disappointed. Now I'll wait to see what the 370 has to offer. I don't want to spend too much on it because I'm still up in the air about either keeping the PC as an HTPC or giving it away.
And think about it: people playing these games that go over 3.5GB haven't been complaining about stuttering. The only reason we even noticed this problem is that some people saw monitoring programs reporting only 3.5GB in use when they knew more should be in use. It wasn't because they were experiencing stuttering.
I think what we need to know is whether that L2 "cut" is something that was done only because the three SMs were disabled, or whether one L2 slice more often came out defective and they could only make the chip run as fast as it does by disabling it.
If only we knew the truth either way... If one of the L2 slices is "unutilized" due to the lower SM count, and they burn one off so that all 970s provide the same level of performance, then fine, and PR/marketing might have just goofed. But if the SM count has nothing to do with the L2 being defective, then I would think someone thought they could pull the wool over people's eyes. Not admitting they had to burn off the L2 is messing with the published specs.
NVIDIA GeForce GTX 970 3.5 GB memory issue
The GM204 diagram below was made by NVIDIA's Jonah Alben (SVP of GPU engineering) specifically to explain the differences between the GTX 970 and GTX 980 GPUs. What was not known until today, and was falsely advertised by NVIDIA, is that the GTX 970 only has 56 ROPs and a smaller L2 cache than the GTX 980. Updated specs clarify that the 970 has one of its eight L2 modules disabled, and as a result the total L2 cache is not 2048 KB but 1792 KB. That probably wouldn't change anything on its own, except that this particular L2 module is directly connected to a 0.5 GB DRAM module.
To put this as simply as possible: the GeForce GTX 970 has two memory pools: 3.5 GB running at full speed, and 0.5 GB that is only used once the 3.5 GB pool is exhausted. However, the second pool runs at 1/7th the speed of the main pool.
So technically, until you deplete the memory available in the first pool, you will be using a 3.5 GB buffer with a 224-bit interface.
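To put rough numbers on that 1/7th figure, here is a back-of-the-envelope sketch (my own arithmetic, assuming 7 Gbps effective GDDR5 on 32-bit memory channels, which matches the 970's published memory spec; the split itself is as described above):

// Back-of-the-envelope bandwidth math for the two GTX 970 memory pools.
// Assumes 7 Gbps effective GDDR5 and 32-bit channels (published 970 specs).
#include <cstdio>

int main() {
    const double gbps_per_pin   = 7.0;        // effective GDDR5 data rate
    const double channel_bits   = 32.0;       // one memory controller partition
    const double gb_per_channel = gbps_per_pin * channel_bits / 8.0;  // 28 GB/s

    double fast_pool = 7 * gb_per_channel;    // 224-bit path -> 196 GB/s
    double slow_pool = 1 * gb_per_channel;    // lone 32-bit path -> 28 GB/s

    printf("3.5 GB pool: %.0f GB/s\n", fast_pool);
    printf("0.5 GB pool: %.0f GB/s (%.2fx of the fast pool)\n",
           slow_pool, slow_pool / fast_pool); // prints 0.14x, i.e. 1/7th
    return 0;
}

Seven of the eight 32-bit channels serve the 3.5 GB pool, while the last channel alone serves the 0.5 GB pool, which is one plausible reading of where the 1/7th figure comes from.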
Ryan Shrout explains:
In a GTX 980, each block of L2 / ROPs directly communicates through a 32-bit portion of the GM204 memory interface and then to a 512MB section of on-board memory. When designing the GTX 970, NVIDIA used a new capability of Maxwell to implement the system in an improved fashion that would not have been possible with Kepler or previous architectures. Maxwell's configurability allowed NVIDIA to disable a portion of the L2 cache and ROP units while using a "buddy interface" to continue to light up and use all of the memory controller segments. Now, the SMMs use a single L2 interface to communicate with both banks of DRAM (on the far right) which does create a new concern. (…)
And since the vast majority of gaming situations occur well under the 3.5GB memory size this determination makes perfect sense. It is those instances where memory above 3.5GB needs to be accessed where things get more interesting.
Let’s be blunt here: access to the 0.5GB of memory, on its own and in a vacuum, would occur at 1/7th of the speed of the 3.5GB pool of memory. If you look at the Nai benchmarks (EDIT: picture here) floating around, this is what you are seeing.
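For anyone curious what those Nai benchmarks actually do, here is a much-simplified sketch of the same idea (my own reconstruction, not Nai's code): allocate VRAM in fixed-size blocks until allocation fails, then time a device-side memory operation on each block; blocks that land in the slow 0.5 GB segment show markedly lower throughput.

// Simplified Nai-style VRAM probe (illustrative reconstruction, not Nai's code).
// Allocates 128 MiB blocks until the device is full, then times cudaMemsetAsync
// on each block as a crude per-block bandwidth measurement.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t blockSize = 128u * 1024u * 1024u;   // 128 MiB per block
    std::vector<void*> blocks;

    // Grab VRAM block by block until allocation fails.
    for (;;) {
        void* p = nullptr;
        if (cudaMalloc(&p, blockSize) != cudaSuccess) break;
        blocks.push_back(p);
    }
    printf("Allocated %zu blocks (%zu MiB)\n", blocks.size(), blocks.size() * 128u);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a full write of each block; slow-segment blocks stand out.
    for (size_t i = 0; i < blocks.size(); ++i) {
        cudaEventRecord(start);
        cudaMemsetAsync(blocks[i], 0, blockSize);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gbps = (blockSize / (1024.0 * 1024.0 * 1024.0)) / (ms / 1000.0);
        printf("Block %2zu: %6.1f GB/s\n", i, gbps);
    }

    for (void* p : blocks) cudaFree(p);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}

Keep in mind that on a live desktop some of the VRAM is already in use, so the probe never sees the full 4 GB, and driver heuristics affect which blocks end up in the slow segment.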
NVIDIA GeForce GTX 970 Corrected Specifications
                 GeForce GTX 970        GeForce GTX 970 'Corrected'
GPU              28nm GM204-200         28nm GM204-200
CUDA Cores       1664                   1664
TMUs             104                    104
ROPs             64                     56
L2 Cache         2048 KB                1792 KB
Memory Bus       256-bit                256-bit
Memory Size      4GB                    4GB (3.5GB + 0.5GB)
TDP              145W                   145W
Check this video from PCPerspective:
Source: PCPerspective
-----------
NOT GOING TO BE FIXED =[ TO ALL 970 OWNERS UPDATE UR BOX WITH PEN OR SOMETHING....
It comes down to how individuals are affected by it. In this current debate we have people saying it's no big deal and others saying it's horrible. I'd rather have the information out there, whether it affects me or not. The more information we have, the better-informed decisions we can make.
vRAM capacity in the lower segments has always been more about marketing than real-world gain - you still need the GPU power to fully utilize the framebuffer; otherwise (technically) you could release a 256-bit card with 16GB of vRAM (16 chips @ 4Gbit with dual 16-bit I/O - the same reduced I/O that allows a FirePro W9100 to carry 16GB). That might be marketable, but it sure wouldn't be a balanced design.

Well, both vendors are constrained by the process node, transistor density, die size, and power budget. Any gains made on GPUs using the same 28nm process aren't going to be significant compared to moving to a new process. Technically, both vendors could go for broke and churn out 650mm^2 GPUs, but the pricing to recoup costs, lower yields, and limited market would be a killer - and of course a quantum leap in single-GPU performance basically starts killing the market for dual cards and multi-card SLI/CFX, unless the software evolves at a similar (or faster) rate. It also doesn't address the far larger and more lucrative markets - the low-power mobile sector, and shoehorning the latest "must have" features into the mainstream products.
Holy shit, the cat's out of the bag: the hardware spec sheet given to review sites was wrong, and the 970 does in fact feature fewer ROPs and less cache than the 980, besides the divided VRAM partition mentioned before.
The card uses the first 3.5GB of VRAM at full bandwidth, with the memory crossbar accessing 7 memory modules, but the remaining 512MB has to be accessed at a much lower bandwidth due to the single-channel nature of its separate crossbar port - faster than regular PCIe bandwidth, but many times slower than the high-performance 3.5GB of VRAM in the first partition. So technically speaking the card has 4GB of VRAM, but the uppermost segment of it is almost an order of magnitude slower than the first chunk of memory.
The card is still a solid performer, and probably the best bang for your buck for gaming at 1440p and below, but NVIDIA made a big no-no here, and they must be in full damage-control mode :shadedshu:
If you really want a chuckle, look at this page.
Seriously though, for most its still a good deal.
There might still be a few on the market, but I doubt it.