Monday, January 26th 2015

NVIDIA Responds to GTX 970 Memory Allocation 'Bug' Controversy

The GeForce GTX 970 memory allocation 'bug,' discovered late last Friday, wrecked the weekends of some NVIDIA engineers, who composed a response to what they say is a non-issue. A bug was reported in the way the GeForce GTX 970 allocates its 4 GB of video memory, giving some power users the impression that the GPU isn't addressing the last 500-700 MB of it. NVIDIA, in its response, explained that the GPU is fully capable of addressing all 4 GB, but does so in an unusual way. Without further ado, the statement:
The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.

We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.

Here's an example of some performance data:
<div class="table-wrapper"><table class="tputbl hilight" cellspacing="0" cellpadding="3"><caption>GTX 970 vs. GTX 980 Memory-Intensive Performance Data </caption><tr><th scope="col">&nbsp;</th><th scope="col">GeForce <br /> GTX 980</th><th scope="col">GeForce <br /> GTX 970</th></tr><tr><th scope="row">Shadow of Mordor</th><td align="right"></td><td align="right"></td></tr><tr class="alt"><th scope="row"><3.5GB setting = 2688x1512 Very High</th><td align="right">72 fps</td><td align="right">60 fps</td></tr><tr><th scope="row">>3.5GB setting = 3456x1944</th><td align="right">55fps (-24%)</td><td align="right">45fps (-25%)</td></tr><tr class="alt"><th scope="row">Battlefield 4</th><td align="right"></td><td align="right"></td></tr><tr><th scope="row"><3.5GB setting = 3840x2160 2xMSAA</th><td align="right">36 fps</td><td align="right">30 fps</td></tr><tr class="alt"><th scope="row">>3.5GB setting = 3840x2160 135% res</th><td align="right">19fps (-47%)</td><td align="right">15fps (-50%)</td></tr><tr><th scope="row">Call of Duty: Advanced Warfare</th><td align="right"></td><td align="right"></td></tr><tr class="alt"><th scope="row"><3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off</th><td align="right">82 fps</td><td align="right">71 fps</td></tr><tr class="alt"><th scope="row"><3.5GB setting = >3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on</th><td align="right">48fps (-41%)</td><td align="right">40fps (-44%)</td></tr></table></div>
On Shadow of Mordor, performance drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.
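The arithmetic is easy to check: each drop is (FPS at the lighter setting - FPS at the heavier setting) divided by the FPS at the lighter setting, and each "difference" is the gap in percentage points between the two cards. A quick sketch using the figures from NVIDIA's table (the unrounded gaps come out slightly larger than the quoted 1%/3%/3%, which use NVIDIA's rounded values):

```python
# FPS at the <3.5GB setting vs the >3.5GB setting, from NVIDIA's table above
data = {
    "Shadow of Mordor":      {"GTX 980": (72, 55), "GTX 970": (60, 45)},
    "Battlefield 4":         {"GTX 980": (36, 19), "GTX 970": (30, 15)},
    "CoD: Advanced Warfare": {"GTX 980": (82, 48), "GTX 970": (71, 40)},
}

for game, cards in data.items():
    drop = {c: (light - heavy) / light * 100 for c, (light, heavy) in cards.items()}
    print(f"{game}: 980 -{drop['GTX 980']:.1f}%, 970 -{drop['GTX 970']:.1f}%, "
          f"gap {drop['GTX 970'] - drop['GTX 980']:.1f} points")
```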
Source: The Tech Report
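Read as an allocation policy, NVIDIA's statement boils down to: serve allocations from the high-priority 3.5GB segment first, and spill into the 0.5GB segment only once the first is exhausted. A minimal sketch of that policy (a toy model for illustration; the segment sizes come from the statement, everything else is hypothetical, not NVIDIA's driver code):

```python
FAST_SEGMENT = int(3.5 * 1024**3)  # high-priority segment, full crossbar bandwidth
SLOW_SEGMENT = int(0.5 * 1024**3)  # low-priority segment, single memory controller

class SegmentedVRAM:
    """Toy model of the two-tier allocation policy NVIDIA describes."""
    def __init__(self):
        self.fast_used = 0
        self.slow_used = 0

    def allocate(self, size):
        # Prefer the 3.5GB segment; spill into the 0.5GB segment only when full.
        if self.fast_used + size <= FAST_SEGMENT:
            self.fast_used += size
            return "fast"
        if self.slow_used + size <= SLOW_SEGMENT:
            self.slow_used += size
            return "slow"
        raise MemoryError("out of VRAM; the driver would now page to system RAM")

vram = SegmentedVRAM()
for _ in range(8):                       # eight 512MB allocations = 4GB total
    print(vram.allocate(512 * 1024**2))  # prints "fast" seven times, then "slow"
```

This also explains the symptom that started the controversy: on a game that fits in 3.5GB, a monitoring tool will never see the card report more than 3.5GB in use.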

92 Comments on NVIDIA Responds to GTX 970 Memory Allocation 'Bug' Controversy

#26
newtekie1
Semi-Retired Folder
IkarugaThey really should have gone the other way around with this whole story. The 970 is still a hell of a card. If they had said that it only has 3.5GB, it would all be fine now.
They could have just „leaked” how to enable the last 0.5GB, and enthusiasts would all try it, then argue and debate on every tech forum over whether it's worth it or not because of the performance hit (which is about 2-3 fps?)

There would be no drama now.
But why say a card with 4GB, with access to all 4GB, is a 3.5GB card? I don't believe this is the first time we've seen something like this, either. All those asymmetrical cards out there, with memory amounts that don't fit the memory bus, have the same problem once you cross a certain memory amount: the final bit of memory has much lower bandwidth. But the latency is still super low compared to offloading that data to system RAM and reading it from there.
PashaFPS is OK, you get stutter when using all memory.
Stuttering when you hit the extra 500MB partition shouldn't be that bad; I'd venture to say it is probably going to be unnoticeable. That extra partition still has far better bandwidth and latency than when the GPU has to start swapping things out to system RAM. That is when the real stuttering starts to happen.
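To put rough numbers on that tiering argument, here is the pecking order using commonly cited peak rates (assumed round figures for illustration, not measurements, and the latency gap between VRAM and system RAM is larger still):

```python
# Commonly cited peak bandwidth for the three tiers being argued about here
tiers = {
    "GTX 970 3.5GB segment (7 memory controllers)": 196,  # GB/s
    "GTX 970 0.5GB segment (1 memory controller)":   28,  # GB/s
    "System RAM over PCIe 3.0 x16 (theoretical)":    16,  # GB/s
}
for name, bw in tiers.items():
    print(f"{name}: ~{bw} GB/s")
```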
HTCIf this is indeed the result of this card's problem, then no amount of price cuts or rebates can make up for it and a recall SHOULD be made for cards showing this.
It isn't. The video is either fake or he has something else going on causing the issue. Just look at his memory usage in the second part. His system memory usage is under 6GB in the first half and over 13GB in the second...
Posted on Reply
#27
xorbe
I would rather have a straight-up 3.5GB card than a 4GB card that gets wonky on the last 0.5GB.
Posted on Reply
#28
buggalugs
xfiaok so they are saying the 980 doesn't have the partition, right? and they both have the same performance loss in game testing.. hell, if that is the case there is seriously nothing wrong.. I mean, what happens to any other GPU if you throw more at it than it can handle, or a CPU for that matter.
nvidia is probably laughing about it because it took them an hour to figure it out and forward a statement to tech sites, one so simple it's hard for a 20-year professional to quickly figure out when they were expecting something much more complex.
It isn't just FPS though. The difference is the 980 still renders fluid gameplay above 3.5GB, the 970 turns into a stuttering mess above 3.5GB and some users get weird artifacts and flashes of purple and stuff across the screen.
Posted on Reply
#29
xfia
if you read between the lines, they basically said: look, nubs, there is nothing wrong with the memory, and this is how you test it.

nvidia would probably say the 980 can deliver smoother gameplay because it is more powerful, harder to push to its limit, and as a result costs more.

I think people with really bad issues overclocked too high and messed them up.
calling out the experts at :lovetpu: to end the nonsense. let's see the test benches fire up!
Posted on Reply
#30
XL-R8R
buggalugsIt isn't just FPS though. The difference is the 980 still renders fluid gameplay above 3.5GB, the 970 turns into a stuttering mess above 3.5GB and some users get weird artifacts and flashes of purple and stuff across the screen.
Got any of your own evidence, or are you, like the majority of the people here, just repeating FUD from around the interwebz? :fear::fear:


Having used my 970 extensively, and even having broken the mythical 3.5GB barrier on a few AAA titles, it doesn't bring performance down to the ground like the naysayers and trolls would LOVE people to believe... or, at least, I've yet to see any discernible difference. :wtf:


I somehow think nVidia's recent graphs on performance characteristics are very close to, if not exactly on par with, the real numbers after going over 3.5GB.

More testing is needed, that much is obvious, but it isn't nearly as bad as it appears... about the 'worst' thing is that (once again) a company played a sneaky PR move lol :nutkick::pimp:
Posted on Reply
#31
Sasqui
GhostRyderThe issue is only a big deal because it was hidden and can cause issues. The solution from the beginning should have been to advertise the card as a 3.5GB card, with the last 0.5GB that can cause the issues either locked out or prioritized for basics; it would have just been a nice bonus for people.
So, doing some math 3.5 vs. 4.0... they should quickly offer a 13% refund to folks who purchased the card prior to the news :p
Posted on Reply
#32
64K
SasquiSo, doing some math 3.5 vs. 4.0... they should quickly offer a 13% refund to folks who purchased the card prior to the news :p
I will take the 13% ($45.50), or Jen-Hsun Huang can come to my house and detail my Camry. :D
Posted on Reply
#33
the54thvoid
Super Intoxicated Moderator
From Hexus...

hexus.net/tech/news/graphics/79925-nvidia-explains-geforce-gtx-970s-memory-problems/


This Nvidia-provided slide gives brief insight into how the GTX 970 is constructed. The three disabled SMs are shown at the top, and the 256KB L2s and pairs of 32-bit memory controllers at the bottom. Notice the greyed-out right-hand L2 for this GPU? Tied into the ROPs as they are, this is a direct consequence of reducing the overall ROP count. GTX 970 has 1,792KB of L2 cache, not 2,048KB, but, as Alben points out, still has a greater cache-to-SMM ratio than GTX 980.

Historically, including up to the Kepler generation, cutting off the L2/ROP portion would require the entire right-hand quad section to be deactivated too. Now, with Maxwell, Nvidia is able to use some smarts and still tap into the 64-bit memory controllers and associated DRAM even though the final L2 is missing/disabled. In other words, compared to previous generations, it can preserve more of the performance architecture even though a key part of a quad is purposely left out. This is good engineering.

But while it's still accurate to say the GeForce GTX 970 has a 256-bit bus through to a 4GB framebuffer - the memory controllers are all active, remember - cutting out some of the L2 but keeping all the MCs intact causes other problems; there is no usual eighth L2 to access, meaning that the seventh L2 would be hit twice. The way in which the L2s work makes this a very undesirable exercise, Alben explains, because it forces all the other L2s to operate at half their normal speed.

Smoke and mirrors
Finally, coming back to the point: Nvidia gets around this L2 problem by splitting the 4GB memory into a regular 3.5GB section, constituted by seven MCs and associated DRAM, and a 0.5GB section for the last memory controller. The company could have marketed the GeForce GTX 970 as a 3.5GB card, or even deactivated the entire right-hand quad and used a 192-bit memory interface allied to 3GB of memory, but chose not to do so. How does this play out with the huge memory-bandwidth drop-off in the Lazygamer Nai test versus Nvidia's statement that games barely suffer from this smart engineering? The Lazygamer test at the >3.5GB metric simply probes bandwidth on a single DRAM, which is admittedly low, or 1/8th of the total speed, while in-game code, according to Nvidia, doesn't pinpoint memory in this way. There's certainly a memory-bandwidth drop-off when the 0.5GB section is called into action, Alben states, but it's nowhere near as severe if nonrecurring code is shunted into the last MC.

In a high-level nutshell, Nvidia is using smart engineering to get the most out of the GTX 970's architecture. The lack of total ROPs is relatively unimportant because this GPU cannot make use of them - the 13 SM units, running at four pixels per clock (so 52 in total), limit the GPU more than the 56 pixels the ROPs can process. The GeForce GTX 970's performance hasn't changed, obviously, but Nvidia wasn't clear on how the back-end works... and it has taken investigation by enthusiasts to uncover the real reason why this 256-bit architecture isn't as good as the GTX 980's.
This seems pertinent:
The Lazygamer test at the >3.5GB metric simply probes bandwidth on a single DRAM, which is admittedly low, or 1/8th of the total speed, while in-game code, according to Nvidia, doesn't pinpoint memory in this way
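The "1/8th of the total speed" figure falls straight out of the card's published memory spec: eight 32-bit controllers running GDDR5 at 7 Gbps effective. A quick back-of-the-envelope check of both the bandwidth and the L2 numbers (spec arithmetic only, not a benchmark):

```python
# GTX 970 published memory spec: 256-bit bus, 7 Gbps effective GDDR5
bus_width_bits = 256
data_rate_gbps = 7                                # per pin, effective

total_bw  = bus_width_bits * data_rate_gbps / 8   # 224.0 GB/s across all 8 MCs
per_mc_bw = total_bw / 8                          # 28.0 GB/s for one 32-bit MC
fast_bw   = per_mc_bw * 7                         # 196.0 GB/s for the 3.5GB segment

print(total_bw, per_mc_bw, fast_bw)               # 224.0 28.0 196.0

# The L2 arithmetic from the Hexus piece works out the same way:
print(7 * 256, "KB of L2 on GTX 970 vs", 8 * 256, "KB on GTX 980")  # 1792 vs 2048
```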
Posted on Reply
#34
HumanSmoke
SasquiSo, doing some math 3.5 vs. 4.0... they should quickly offer a 13% refund to folks who purchased the card prior to the news :p
Like AMD did with Bulldozer, when people found out that 4 modules isn't necessarily the same as 8 cores? :laugh:


EDIT: Anandtech's Ryan Smith has a decent write-up of the issue for anyone interested - it certainly beats the flailing around in the dark that some people are indulging in.
Posted on Reply
#35
trenter
It's not a "bug", it's a total slap in the face of 970 owners. Nvidia just realized that this whole time they were selling 970's the specifications they were quoting were actually wrong. They admitted the 970 only really has 3.5gb of useable memory because the other .5gb is accessed at 28 gbps....WTF. Now let's hear the nvidia fangirls find a way to defend this joke for a company.....starting with W11zzardd.
Posted on Reply
#36
64K
trenterIt's not a "bug", it's a total slap in the face of 970 owners. Nvidia just realized that this whole time they were selling 970s, the specifications they were quoting were actually wrong. They admitted the 970 only really has 3.5GB of fully usable memory, because the other 0.5GB is accessed at 28 GB/s... WTF. Now let's hear the nvidia fangirls find a way to defend this joke of a company... starting with W11zzardd.
Flame much?
Posted on Reply
#37
HumanSmoke
trenterIt's not a "bug", it's a total slap in the face of 970 owners. Nvidia just realized that this whole time they were selling 970s, the specifications they were quoting were actually wrong. They admitted the 970 only really has 3.5GB of fully usable memory, because the other 0.5GB is accessed at 28 GB/s... WTF. Now let's hear the nvidia fangirls find a way to defend this joke of a company... starting with W11zzardd.
How auspicious. 2 posts in and you've already called out two staff members.
:shadedshu::shadedshu::shadedshu:
Posted on Reply
#38
Xzibit
64KI will take the 13% ($45.50), or Jen-Hsun Huang can come to my house and detail my Camry. :D
Don't forget the difference in ROP and L2 cache.

GTX 970 now has 52 ROPs instead of 64 and 1792KB of L2 Cache instead of 2048KB
NVIDIA’s Senior VP of GPU EngineeringTo those wondering how peak bandwidth would remain at 224 GB/s despite the division of memory controllers on the GTX 970, Alben stated that it can reach that speed only when memory is being accessed in both pools.
This is entertaining to see develop :toast:
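Alben's remark has a simple reading: the headline 224 GB/s is only reachable when both pools are streaming at once, which is also why the serialization concern raised a few posts below matters. A toy model under that reading (an interpretation of the quote, not an official formula):

```python
# Toy model: effective bandwidth over the full 4GB, concurrent vs serialized pools
FAST_BW, SLOW_BW = 196.0, 28.0   # GB/s: seven MCs vs the lone eighth MC
FAST_GB, SLOW_GB = 3.5, 0.5      # segment sizes

t_fast = FAST_GB / FAST_BW       # time to stream the fast segment
t_slow = SLOW_GB / SLOW_BW       # time to stream the slow segment

concurrent = (FAST_GB + SLOW_GB) / max(t_fast, t_slow)  # pools stream in parallel
serialized = (FAST_GB + SLOW_GB) / (t_fast + t_slow)    # one pool served at a time

print(f"concurrent: {concurrent:.0f} GB/s")  # 224 GB/s, the headline figure
print(f"serialized: {serialized:.0f} GB/s")  # 112 GB/s, if accesses cannot overlap
```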
Posted on Reply
#39
rtwjunkie
PC Gaming Enthusiast
HumanSmokeHow auspicious. 2 posts in and you've already called out two staff members.
:shadedshu::shadedshu::shadedshu:
Of course, his bravery is questionable, since neither post actually tagged them. In fact, it seems he deliberately misspelled our site owner's name. LOL!
Posted on Reply
#40
Rahmat Sofyan
For Fun, W1zz should change the Value and Conclusion for GTX 970 right now :D
Oh, and nVidia seems fucked :P
Posted on Reply
#41
HumanSmoke
XzibitGTX 970 now has 52 ROPs instead of 64 and 1792KB of L2 Cache instead of 2048KB
The revised total of ROPs is 56, not 52.
Rahmat SofyanFor Fun, W1zz should change the Value and Conclusion for GTX 970 right now :D
Oh, and nVidia seems fucked :p
Why? What changed? Is the performance of the card any less? Does it use more power than it did when it launched?
Posted on Reply
#42
Steevo
XzibitDon't forget the difference in ROP and L2 cache.

GTX 970 now has 52 ROPs instead of 64 and 1792KB of L2 Cache instead of 2048KB



This is entertaining to see develop :toast:
Considering it cannot physically access both pools at once, as the cache crossbar is used to access the last 0.5GB and that cannot happen while it is busy reading its own memory in the current configuration...

Looks like Nvidia made up specs and hoped no one would notice, and now that people have, they are going to "correct" them to what they knew them to be from the start.
Posted on Reply
#43
Sasqui
HumanSmokeLike AMD did with Bulldozer, when people found out that 4 modules isn't necessarily the same as 8 cores? :laugh:
50% refund right there! ;)
Posted on Reply
#44
Xzibit
Here is PCPer's take on it.

Posted on Reply
#45
the54thvoid
Super Intoxicated Moderator
XzibitHere is PCPer's take on it.

Kinda what Anandtech are saying too.

Nvidia has egg on its face for, 'ahem', lying about its card, no doubt, but the performance of it isn't an issue. Each reviewer looking at it in turn (PCPer, Anandtech, Hexus) has reached the same conclusion, which is threefold:
1) Nvidia have slipped up, and undoubtedly their PR and engineering sections have 'misled' the public somewhat. (IMO, I don't believe it was innocent, but hey.)
2) The real performance impact isn't there. The card, according to all sites so far, is still great.
3) People are trying, and so far failing, to find a real-world gaming example that kills the card's performance, outside of a load that would do that anyway based on its SMM units etc.

FWIW, IMO, Nvidia knew fine well what they were releasing and probably expected no fallout from it, given that it has no impact on real scenarios. But techy people like to dig, and they found an anomaly. Now NV have to explain it, and it's hard to make this one sound like a genuine 'miss'. Even if it was a genuine lapse, it's very hard to sell to us, the public.

But hey, this ugly truth (bad move NV, but still a great card) won't stop people throwing those ignorance stones.
Posted on Reply
#47
Ja.KooLit
Phew, this issue is all over the net now.
Posted on Reply
#48
Rahmat Sofyan
HumanSmokeThe revised total of ROPs is 56, not 52.

Why? What changed? Is the performance of the card any less? Does it use more power than it did when it launched?
Just for fun dude :)...
Posted on Reply
#49
GhostRyder
XzibitDon't forget the difference in ROP and L2 cache.

GTX 970 now has 52 ROPs instead of 64 and 1792KB of L2 Cache instead of 2048KB



This is entertaining to see develop :toast:
Yea, I wonder how many more changes to cards we will get down the line, LOL. It's a little annoying to hide certain little details about cards and then expect to just sweep it under the rug. It would still have been easier to be much more up front about it; the number of buyers changing their minds would probably have been a small margin (heck, some might have ended up going to the GTX 980 instead). It just makes you look bad not saying something, and now doubt lingers on the mind no matter how you want to think about it.
XzibitHere is PCPer's take on it.

Good vid.
Posted on Reply
#50
Xzibit
the54thvoidKinda what Anandtech are saying too.

Nvidia has egg on its face for, 'ahem', lying about its card, no doubt, but the performance of it isn't an issue. Each reviewer looking at it in turn (PCPer, Anandtech, Hexus) has reached the same conclusion, which is threefold:
1) Nvidia have slipped up, and undoubtedly their PR and engineering sections have 'misled' the public somewhat. (IMO, I don't believe it was innocent, but hey.)
2) The real performance impact isn't there. The card, according to all sites so far, is still great.
3) People are trying, and so far failing, to find a real-world gaming example that kills the card's performance, outside of a load that would do that anyway based on its SMM units etc.

FWIW, IMO, Nvidia knew fine well what they were releasing and probably expected no fallout from it, given that it has no impact on real scenarios. But techy people like to dig, and they found an anomaly. Now NV have to explain it, and it's hard to make this one sound like a genuine 'miss'. Even if it was a genuine lapse, it's very hard to sell to us, the public.

But hey, this ugly truth (bad move NV, but still a great card) won't stop people throwing those ignorance stones.
Right now the "miscommunication" is more interesting, along with the debate.

Performance-wise it's not visually apparent in the majority of games, but if the so-called "next-gen PS4/XB1 port" games ever get here with "DX12", the issues will be more apparent to the majority. At least that's how I see it. Most games are only now catching up to DX10+, and the new PS4/XB1 games are being ported with texture packs that come in at 3GB at VHQ @ 1080p. Who knows, by then Nvidia might also have a 1070 that doesn't have these issues.

I go back to my displeasure at both camps minimizing their offerings; the 970 looks like it was more of a just-good-enough replacement for the 780s. 280->285, 760->960. As consumers we are going to keep getting screwed, and it seems more and more of the majority are willing to spread cheeks and take it, and then brag about what a wonderful experience it was.

It will eventually play itself out or continue being a thorn.
Posted on Reply