Tuesday, January 27th 2015

GTX 970 Memory Drama: Plot Thickens, NVIDIA has to Revise Specs

It looks like NVIDIA's first response to the GeForce GTX 970 memory allocation controversy came from engineers pulled out of their weekend plans, and hence was too ambiguously technical (even for us). It was only on Monday that NVIDIA PR swung into action, offering a more user-friendly explanation of what the GTX 970 issue is, and how exactly the company carved up the GM204 when creating the card.

According to an Anandtech report, which cites that easier explanation from NVIDIA, the company was not truthful about the specs of the GTX 970 at launch. For example, the non-public document NVIDIA gave out to reviewers (which provides them detailed tech-specs) clearly listed the ROP count of the GTX 970 as 64. Reviewers used that count in their reviews. TechPowerUp GPU-Z shows the ROP count as reported by the driver, but it has no way of telling how many of those "enabled" ROPs are actually "active." The media reviewing the card were hence led to believe that the GTX 970 was carved out by simply disabling three of the sixteen streaming multiprocessors (SMMs), the basic indivisible subunits of the GM204 chip, with no mention of other components, such as the ROP count and L2 cache amount, being changed from the GTX 980 (a full-fledged implementation of this silicon).
NVIDIA explained to Anandtech that there was a communication gap between the engineers (the people who designed the GTX 970 ASIC) and the technical marketing team (the people who write the Reviewer's Guide document and draw the block-diagram). The latter team was unaware that with "Maxwell," you could segment components previously thought indivisible, or "partially disable" components.

It turns out that in addition to three SMM units being disabled (resulting in 1,664 CUDA cores), NVIDIA reduced the L2 cache (last-level cache) on this chip to 1.75 MB, down from 2 MB, and also disabled a few ROPs. The ROP count is effectively 56, and not 64. The last 8 ROPs aren't "disabled." They're active, but not used, because their connection to the crossbar is too slow (we'll get to that in a bit). The L2 cache is a key component of the "crossbar." Think of the crossbar as a town-square for the GPU, where the various components of the GPU talk to each other by leaving and picking up data labeled with "from" and "to" addresses. The crossbar routes data between the four Graphics Processing Clusters (GPCs) and the eight memory controllers of 32-bit bus width each (which together make up the 256-bit wide memory interface), and is cushioned by the L2 cache.

The L2 cache itself is segmented, and isn't a monolithic slab of SRAM. Each of the eight memory controllers on the GM204 is ideally tied to its own segment of the L2 cache. Also tied to these segments are segments of ROPs. NVIDIA reduced the L2 cache amount by disabling one such segment; the memory controller attached to it is instead rerouted to the cache segment of a neighbouring memory controller. That controller's access to the crossbar is hence slower. To keep the interleaving of the memory controllers intact (interleaving is what adds their capacities up to the total memory amount the driver can address), NVIDIA partitioned the 4 GB of memory into two segments. The first is 3.5 GB large, and is made up of memory controllers with access to their own segments of the L2; the second segment is 512 MB in size, and is tied to the memory controller that is rerouted.
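That mapping can be sketched in a few lines of Python. This is purely an illustrative model, not NVIDIA's actual topology: the channel indices and the choice of which neighbour absorbs the orphaned controller are assumptions.

```python
# Hypothetical sketch of the GM204's memory channel to L2-segment mapping.
# Eight 32-bit channels sit behind the 256-bit bus; one L2/ROP segment is
# fused off, and its channel is rerouted to a neighbour's segment.

controllers = list(range(8))     # eight 32-bit memory channels, 512 MB each
disabled_l2_segment = 7          # the segment NVIDIA fused off (assumed index)

# Each controller normally uses its own L2 segment; the orphaned controller
# piggybacks on a neighbour's segment (index 6 here), sharing its crossbar port.
l2_segment_for = {c: (c if c != disabled_l2_segment else 6) for c in controllers}

# Memory is interleaved only across channels with equal-speed paths to the
# crossbar, so the driver exposes two partitions:
fast_partition_gb = 0.5 * sum(1 for c in controllers if l2_segment_for[c] == c)
slow_partition_gb = 0.5 * sum(1 for c in controllers if l2_segment_for[c] != c)

print(fast_partition_gb)  # 3.5
print(slow_partition_gb)  # 0.5
```

The 3.5 GB + 0.5 GB split falls straight out of the rerouting: seven channels keep full-speed crossbar access and interleave together, while the eighth, with its borrowed port, has to be walled off into its own partition.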

The way this partitioning works, the 3.5 GB partition can't be read while the 512 MB one is being read: the GPU is either addressing the 3.5 GB segment or the 512 MB one, never both at once. Only an app that's actively using the entire 4 GB of memory will see a drop in performance, because the two segments can't be read at the same time.

While it's technically correct that the GTX 970 has a 256-bit wide memory interface, and that its 7.00 GHz (GDDR5-effective) memory clock translates to 224 GB/s of bandwidth on paper, not all of that memory is uniformly fast. You have 3.5 GB of it with normal access to the crossbar (the town-square of the GPU), and 512 MB of it with slower access. The only figure that can be stated with certainty is that the 3.5 GB segment has 196 GB/s of memory bandwidth (7.00 GHz x 7 chips x 32-bit width per chip). We can't tell how fast the 512 MB second segment really is, nor how it affects the performance of the memory controller whose crossbar port it shares when the card is using its full 4 GB. But it's impossible for the second segment to make up the remaining 28 GB/s (of the 224 GB/s), since NVIDIA itself admits this segment runs slower. Therefore, NVIDIA's claim of 224 GB/s of memory bandwidth for the GTX 970 at reference clocks is inaccurate.
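The arithmetic behind those figures is straightforward (a quick sketch: the 7.00 GHz number is the GDDR5-effective transfer rate, and each GDDR5 chip has a 32-bit interface):

```python
# Per-chip bandwidth: effective transfer rate x bus width, converted to bytes
effective_rate_hz = 7.00e9   # 7.00 GHz GDDR5-effective memory clock
chip_width_bits = 32         # 32-bit interface per GDDR5 chip

per_chip_gbs = effective_rate_hz * chip_width_bits / 8 / 1e9   # 28.0 GB/s per chip

print(7 * per_chip_gbs)  # 196.0 GB/s -- the 3.5 GB fast partition (7 chips)
print(8 * per_chip_gbs)  # 224.0 GB/s -- the headline full-interface figure
```

This is why the 28 GB/s gap between 196 GB/s and 224 GB/s corresponds to exactly one chip's worth of bandwidth, and why the headline figure can only hold if the eighth channel ran at full speed, which NVIDIA concedes it does not.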

Why NVIDIA chose to reduce the cache size and ROP count will remain a mystery. We can't imagine that the people designing the chip failed to sufficiently communicate this to the driver and technical marketing teams. The claim that technical marketing didn't get this the first time around seems like a hard sell. We're pretty sure NVIDIA engineers read reviews, and if they saw "64 ROPs" on a first-page table, they would have reported it up the food-chain at NVIDIA. An explanation of this hardware change should have taken up an entire page in the technical documents the first time around, and NVIDIA could have saved itself a lot of explaining, much of it through the press.
Source: Anandtech

138 Comments on GTX 970 Memory Drama: Plot Thickens, NVIDIA has to Revise Specs

#26
Sasqui
v12dockClass action lawsuit?
No doubt there are a ton of lawyers working on this one, looking at the EULA to see if there's a clause that states you can't sue NV if you install the drivers (half joking).

Refund calcs (from previous thread)... 3.5 vs 4.0, I say a 13% refund, lol
Posted on Reply
#27
RejZoR
Ferrum MasterNada, too much latency for high FPS rate and frame time costs... we are arguing about stutter when the same GPU accesses the other memory partition of itself via the crossbar; it would be a mess if a second GPU wants to access unified pool data via PCIe and back to the second card.
That's why I said "in theory". But we'd have to replace sluggish PCIe with something like fiber optics or something to achieve that. And even then it's questionable.
Posted on Reply
#28
Ikaruga
RCoonBut NVidia has basically lied about hardware specifications. I don't believe for a second this was all one big mistake of somebody not saying to marketing that the card did not in fact have 64 ROPs and 224GB/s bandwidth.
I have to disagree. I can't imagine a single reason why an engineer would lie about 64 ROPs when there are only 56. These monster companies always have dedicated teams for communications with the outside world (press, developers, retailers, etc). The only thing I can imagine is that somebody in that department failed big time (regardless of whether it was deliberate or just a stupid mistake by that person). I can't see why the company management would lie about the 970; if they needed to lie about something to get more sales, they would lie about the flagship product imo.
The engineers probably had a lot of 980s with bad yields at production and they just laser-cut them into 970s. It has been the practice for many generations and they never lied about it before, so why would they start now? Seriously, why?
Posted on Reply
#29
btarunr
Editor & Senior Moderator
SasquiNo doubt there are a ton of lawyers working on this one, looking at the EULA to see if there's a clause that states you can't sue NV if you install the drivers (half joking).

Refund calcs (from previous thread)... 3.5 vs 4.0, I say a 13% refund, lol
Check out our Facebook page.
Posted on Reply
#30
Ferrum Master
RejZoRThat's why I said "in theory". But we'd have to replace sluggish PCIe with something like fiber optics or something to achieve that. And even then it's questionable.
Actually no... They just need a proper old school northbridge. It could be done on dual-GPU single-PCB cards. They have a PLX chip now; it just needs a memory controller, with the PLX wired to the RAM chips on the same board. Imho it won't even need a special driver, and if the bus width is wide enough it could actually work without rewriting drivers.

On a classic motherboard... without some revolution ie proprietary connector to the motherboard nope...
Posted on Reply
#31
NightOfChrist
RCoonAll the benchmarks in all the reviews are still accurate of course, so everything about how it performs in games at various resolutions is still true.

But NVidia has basically lied about hardware specifications. I don't believe for a second this was all one big mistake of somebody not saying to marketing that the card did not in fact have 64 ROPs and 224GB/s bandwidth. By all accounts it's pretty crappy business practice, and they should be punished accordingly.

That being said. I still like my 3.5GB 970 for the price I got it at.
I agree. It is still a great card. The segmented vRAM and how it performs surprised me, but it does not change the fact it is a great card for ultra 1080p gaming.
Ferrum MasterThe problem is with reasonable buyers, who bought the card to be future proof, and thus took the vram amount into their reasoning. And games tend to eat more VRAM lately... if you play old ones, except Skyrim, then it is OK, but those who bought a 970 won't just play CS:GO. It would be a shame if after 6 months Witcher 3 and GTA 5 bring this card to its knees and a new card is needed again... but hey... that was the plan :nutkick:
NVIDIA made a mistake with the design, intentionally or otherwise, so it is fair if people blame them for it. But many customers can be blamed too. From many western forums I have read so far, when they bought the card they expected a GTX 980 at the price of a GTX 970, something I never believed to exist in the first place. I always thought there was more to it than just being cheaper and a little slower, but a lot of people believed they were "future-proofing" with this card. Even I was surprised when I read several owners claiming they bought the card ~ a single card ~ for a 4K monitor gaming setup.
Posted on Reply
#32
THE_EGG
It is a little grey but I guess this would fall under fraudulent misrepresentation which as far as I know is illegal in Australia (found in section 18 of the Australian Consumer Law). I'm sure it is also illegal in most - if not all - other countries too. Nvidia should take the appropriate actions (e.g. some kind of compensation) to resolve this issue fairly.
Posted on Reply
#33
FreedomEclipse
~Technological Technocrat~
RCoonJust to let you guys know, retailers and AIB partners (Gigabyte, Asus, MSI) are not accepting returns for this problem at this time. I presume they will be in avid communications with NVidia first before we get a response on where to go from here.
Pity.... I was hoping to get a slight refund as I do have 2 970s....

Kinda stumped on what to do next tbh. Might try to return these 2 970s and pick up two 780 Ti's off ebay or something on the cheap.
Posted on Reply
#34
Ferrum Master
THE_EGGI'm sure it is also illegal in most - if not all - other countries too
Even a cookie must contain description of each component used, who knows, maybe someone is quite allergic to cut down ROP count. :laugh:
Posted on Reply
#35
Sasqui
btarunrCheck out our Facebook page.
LOL, cute. It's all good man.

I saw this in downtown Providence and snapped a photo, he should start a video card ad campaign.

Posted on Reply
#36
Ferrum Master
NightOfChristBut many customers can be blamed too.
Nada, never blame the customers, they are fools maybe, but they didn't commit a crime in believing the spec sheet.

And 4K on 970? Where is the problem for older games? All UT3 games? Grid2? Civ? Source based games? And millions of users still for WoW?

Posted on Reply
#37
rtwjunkie
PC Gaming Enthusiast
looniam"Why NVIDIA chose to reduce cache size and ROP count will remain a mystery."

idk, it seemed the TR, PCper and esp. anand tech articles made it quite clear. though i do seem to have a talent at solving murder mysteries within the first chapter of the book.

" We can't imagine that the people designing the chip will not have sufficiently communicated this to the driver and technical marketing teams."

do you think they go out and have after work drinks? i'd be surprised if they're in the same building let alone on the same floor. in a perfect world all departments communicate well w/each other. however in the real world it is lacking.

"To claim that technical marketing didn't get this the first time around, seems like a hard-sell. We're pretty sure that NVIDIA engineers read reviews, and if they saw "64 ROPs" on a first-page table, they would have reported it up the food-chain at NVIDIA."

word on the street is the engineers were too busy watching kitty cat videos while eating cheetos.

"An explanation about this hardware change should have taken up an entire page in the technical documents the first time around, and NVIDIA could have saved itself a lot of explanation, much of it through the press."

yeah and i am surprised that technology journalists who have reported for years didn't see the asymmetrical design also. hopefully they will learn from nvidia's mistake as well.


edit: oh yeah HI, i am new :)
Welcome to TPU! Great first post.
Posted on Reply
#38
Parn
While the story of misunderstanding between NV's engineering and PR departments is hard to believe and NV should be taught a lesson for their questionable business practice, I doubt they will get into any big trouble for this other than a few consumers returning their GTX970.

If Intel could get away with the TSX errata found on Haswell CPUs (feature removed through a microcode update after the bug was discovered by the community), I can't see why NV won't, considering the performance of the GTX 970 hasn't changed a bit before or after this marketing fiasco.
Posted on Reply
#39
rruff
NightOfChristNVIDIA made a mistake with the design, intentionally or otherwise, so it is fair if people blame them for it.
Not a mistake in design... that was surely intentional, and the card performs well enough. It *needs* to be significantly slower than the 980. A mistake in marketing and presentation? Maybe.

I've worked in a few large corporations and marketing tends to live in their own little world of schmoozing and BS. But they wouldn't be allowed to mod the specs without the consent of the top brass. The tech guys would likely have had no say in the matter at all.

The big question in my mind is... did everyone involved expect this scenario to play out like it is, or were they really dumb enough to think no one would notice? The ROPs and cache thing is really weird, because nobody would give a damn about those specs on their own. It's the gimped vram that is bothersome. Maybe they were thinking that would tip someone off that the architecture was funny and prompt them to investigate further? And they wanted the 970 to have a few months of "honeymoon" where it sold like crazy and forced AMD to slash prices on their 290 and 290x. Then they'd do the inevitable dance and damage control later.

I don't know... what would have happened if Nvidia had told everyone it was 3.5GB+ at the start? I can see marketing's point of view... you don't want to present your new hot product and put "gimped architecture weirdness" in everyone's mind. It definitely gives it one issue that makes it inferior to the competing AMD products, which have 4GB of vram. I'm certain it would have hurt sales and tainted reviews. But it's also hard to imagine that they'd prefer what is happening now. The 970 is still early in its life cycle and if Nvidia compensates customers, that will certainly cost them more $ than if they were honest at the start.

Doesn't anybody know someone who works there and can tell us what's really going on?
Posted on Reply
#40
Stephen.
With all this info out in the open, it leads me to say that the GTX 980 is an overpriced card that does not deserve its price tag; the GTX 980 should have been $399 at most.

Nvidia overpriced the GTX 980 by publishing false GTX 970 specs, so that the 970 appears close to the GTX 980's specs and justifies the price they put on the GTX 980.

If the real specs for the GTX 970 had been known from the start, I bet the GTX 980 would not be selling at those high prices; it would be more around $399-ish to justify the small performance gain it has over the GTX 970.
Posted on Reply
#41
Batou1986
I don't understand all the people here yelling WELL ALL THE BENCHMARKS ARE STILL TRUE ETC ETC ETC.
In the future, if someone running this card runs into a game that uses 4gb of memory and it runs slower because it's using all 4gb of memory, that's an issue, an issue that should have been clearly explained by Nvidia.

The fact is people bought a card that advertised 4gb of available vram, not a card with 3.5gb of vram and the possibility to use 4gb at reduced performance.
Posted on Reply
#42
Uplink10
Cheapness is going to be their doom. Disabling part of a GPU just to sell it cheaper, and then selling the fully enabled GPU at a higher price. Intel does the same, but at least they get the specifications right. Refunds are going to cost them greatly.

Quote from Transformers (2007):

00:18:53,432 --> 00:18:56,458
Wow. You are so cheap.
Posted on Reply
#43
FordGT90Concept
"I go fast!1!11!1!"
It doesn't look like there's a class action out yet but I'm positive it is coming. NVIDIA misrepresented their product. NVDA shares took a 3% dive today.
Posted on Reply
#44
NightOfChrist
Ferrum MasterNada, never blame the customers, they are fools maybe, but they didn't commit a crime in believing the spec sheet.

And 4K on 970? Where is the problem for older games? All UT3 games? Grid2? Civ? Source based games? And millions of users still for WoW?

Many customers who bought GTX 970 and expected no more than GTX 970 are blameless, it is true, but from their comments or rather complaints several customers expected the card's vRAM to perform like that of GTX 980 and they argued it is the same vRAM configuration utilised by GTX 980 so at least it should perform at the same level. There are obvious reasons why one is named 970 and the other 980. Although I did not expect there would be segmented 3.5+0.5 vRAM with the smaller segment slower than the rest, I did actually expect overall a 970 would be slower than a 980, despite the stats written on the sheet. I would blame them for being naive. Not all customers, of course. Just the naive ones.

As for older games on 4K, I do not think it is going to be a problem. Quite the contrary, it will be a great experience, and a single GTX 970 should suffice, if not more than suffice. But from what I have read some people bought a single GTX 970 and expected to run games like Assassin's Creed: Unity and even Dragon Age: Inquisition on 4K resolution. SLI setup, perhaps. But a single card?

And I apologise if my English is not fluent and understandable. I tried not to use a machine translator. Hopefully nobody would get confused by my poor choice of words.
Posted on Reply
#45
newtekie1
Semi-Retired Folder
So in the end, it all comes down to an incorrect amount of L2 cache given on a non-public spec sheet. And it matters so little, most reviews don't even mention the L2 cache size in the 970 reviews. It isn't mentioned in any of the 970 reviews here on TPU. That is how little L2 cache matters to the public.

They didn't lie about ROP count. The card has 64 Active ROPs. That is not a lie. It only uses 56 of them because using the others would actually slow the card down. But there are 64 active ROPs.

They didn't lie about memory amount, it has 4GB and all 4GB can be accessed if needed. The last 0.5GB is slower than the first 3.5GB, but so what? It is still faster than accessing system RAM. If they had designed the card as a strict 224-Bit 3.5GB card, it would have been slower than the 970 we got. There is no getting around that. They made the decisions they did with the extra 0.5GB because it improves the performance of the card.

Yeah, there is some marketing sleight of hand going on here. But the fact is the card performs great, even at 4k.

Personally, I think they could have left all the specs they listed the same (except the L2 size, but again, I wouldn't even have listed that). But they should have given this explanation to the reviewers from the beginning, so they could include in their reviews how the memory subsystem works from the beginning.
Posted on Reply
#46
rruff
newtekie1But they should have given this explanation to the reviewers from the beginning, so they could include in their reviews how the memory subsystem works from the beginning.
Yes, but that isn't good for marketing, because the reviewer will be focused on the weird architecture. The press would have surely been less favorable if that had happened.

I'm guessing they really thought no one would notice, or at least not until later this year.
Posted on Reply
#47
Batou1986
newtekie1But they should have given this explanation to the reviewers from the beginning, so they could include in their reviews how the memory subsystem works from the beginning.
This is where the real issue lies: nvidia goes out of their way to explain all their technical features and stuff but somehow skims over this part "accidentally". IMO Nvidia was intentionally misleading because they knew it would affect sales.
Posted on Reply
#48
FordGT90Concept
"I go fast!1!11!1!"
newtekie1So in the end, it all comes down to an incorrect amount of L2 cache given on a non-public spec sheet. And it matters so little, most reviews don't even mention the L2 cache size in the 970 reviews. It isn't mentioned in any of the 970 reviews here on TPU. That is how little L2 cache matters to the public.

They didn't lie about ROP count. The card has 64 Active ROPs. That is not a lie. It only uses 56 of them because using the others would actually slow the card down. But there are 64 active ROPs.

They didn't lie about memory amount, it has 4GB and all 4GB can be accessed if needed. The last 0.5GB is slower than the first 3.5GB, but so what? It is still faster than accessing system RAM. If they had designed the card as a strict 224-Bit 3.5GB card, it would have been slower than the 970 we got. There is no getting around that. They made the decisions they did with the extra 0.5GB because it improves the performance of the card.

Yeah, there is some marketing sleight of hand going on here. But the fact is the card performs great, even at 4k.

Personally, I think they could have left all the specs they listed the same (except the L2 size, but again, I wouldn't even have listed that). But they should have given this explanation to the reviewers from the beginning, so they could include in their reviews how the memory subsystem works from the beginning.
Those are misrepresentations. If a car manufacturer sold cars advertising 4 wheels and after you buy one, you discover one of those is the spare, you'd be a little pissed too. Technically the manufacturer didn't lie, but they still misrepresented what they were selling.

"Active" matters little here. That's like super gluing a turbo charger on to the hood of a car and selling it as "turbo charged" when it is not. If the hardware is there but deliberately designed to not be used, it shouldn't be advertised as being there.
Posted on Reply
#49
v12dock
Block Caption of Rainey Street
FordGT90ConceptIt doesn't look like there's a class action out yet but I'm positive it is coming. NVIDIA misrepresented their product. NVDA shares took a 3% dive today.
On the flip side AMD is up 4% :laugh: One of the few companies that is not taking a massive hit today.
Posted on Reply
#50
TRWOV
ironwolfAny word from the AMD camp over this? I'd be curious if they might try to pull some PR stuff using this. Or if they will just keep their traps shut for the time being. :laugh:
Next headline on TPU:

Choose R9 290 Series for its uncompromised 4GB memory: AMD

:laugh::laugh::laugh::laugh:
Posted on Reply