Wednesday, October 31st 2018

NVIDIA Confirms Issues Cropping Up With Turing-based Cards, "It's Not a Broad Issue"

Oct 31st, 2018 07:06

It has been making the rounds on various forum sites (including our own TPU forums) that problems have been cropping up for users of NVIDIA's Turing-based graphics cards. The reports, which are increasing in number as awareness of the issue increases, vary in their manifestation, but have the same result: "crashes, black screens, blue screen of death issues, artifacts and cards that fail to work entirely," as reported by the original Digital Trends piece.

Of course, at the time, problems with the source for the information were too great to properly discern whether or not this issue went beyond the usual launch issues and failures that can (and will) happen to any kind of hardware. The fact that people with negative experiences will always be more vocal than those without any problem; the fact that some accounts on the reported forums were of doubtful intent; and the fact that the same user could be posting across multiple forums would always put a stop to any serious measurement of the issue. Now, though, NVIDIA has come out with a statement regarding the issue, which at least recognizes its existence.

Problems have been cropping up with both NVIDIA-made and AIB cards from various manufacturers, which seemingly rules out manufacturer-specific issues and leaves on the table either an architectural or a manufacturing batch issue (no confirmations yet). Let's hope this really is confined to a batch issue, though there have been multiple reports of users who got their cards RMA'd and then got one or two replacements that met the same fate. The issue seems to be affecting owners of the flagship RTX 2080 Ti the most, though there are reports of 2080 models being affected as well.

In response, NVIDIA acknowledges the issue, but limits its relevance: as reported by Tom's Hardware, the company said that "it's not an increasing number of users" affected by this problem, saying "it's not broad." It then added that "we are working with each user individually like we do always." We're here to wait and see, but this definitely doesn't do any favors in grabbing more sales for the RTX 20-series, when the flagship graphics card costing over $1,000 fails on users.

Sources: GeeksULTD, via Tom's Hardware, GeForce Forums, GeForce Forums, Forbes, TechPowerupForums

127 Comments on NVIDIA Confirms Issues Cropping Up With Turing-based Cards, "It's Not a Broad Issue"

#101

@man_daddio

RH92: Issues can and will happen no matter the price point (of course not ideal, but hey, we don't live in a perfect world); the most important thing is how the company having those issues handles them. As long as NVIDIA provides a decent service for those having issues there is no point in making a fuss about it.

I agree. There is always going to be a failure rate with any electronic item. Nvidia always stands by their products, so they're not going to give a hard time to anyone who has to RMA a product because they're not happy with it or it's not working properly. If Nvidia was like Apple, denying they have problems at all, it would be a different story. Third-party vendors are also usually good when there are any problems. I went with an Asus 2080 Ti card this time around but probably should have gone with EVGA, since I know that they never gave me a problem and always have good customer service.

#102

metalfiber

More components, more chance of failure. RTX 2080 Ti using some 2600 components compared to the GTX 1080 Ti at 1600.

www.techpowerup.com/248382/msi-talks-about-nvidia-supply-issues-us-trade-war-and-rtx-2080-ti-lightning

#103

Vayra86

John Naylor: I remember what came next ... a lot of fear mongering and alarmism, but reputable web sites were never able to reproduce the issue w/o doing some really weird things. Sure you could create an issue, but it's one of those "well if I do this list of things in a certain sequence, I can cause an issue" cases, especially at resolutions and settings which are inappropriate for the card(s) in question. And if the 3.5 GB was the problem, why did the 980, which has the full 4 GB, exhibit the same behavior when exposed to the same sequences? NVidia screwed the proverbial pooch from a PR PoV but that's where the issue ended.

www.guru3d.com/news-story/middle-earth-shadow-of-mordor-geforce-gtx-970-vram-stress-test,12.html

As far as 2xxx series problems, don't know if it's a fixable issue or not ... too early to tell. But here's the deal ... if you want to be a "beta tester" and pay a premium to be the 1st one on your block with the new shiny thing, then take the punches which will come when you make that choice. It happens with early steppings of CPUs, MoBos, GFX cards, SSDs ... you name it. The folks we built for rarely if ever got hammered by any of these because we have always recommended not investing in hardware that isn't a few steppings into production.

a) pre B3 stepping P67 boards ... Intel chipset fail affected all brands ... industry wide recall
b) pre C1 stepping Asus RoG boards (external devices don't wake up from sleep). Asus said tough noogies
c) EVGA 970 early SC boards ... 1/3 of the heat sink missed the GPU ... EVGA said, yeah we designed it that way.
d) EVGA early SC / FTW 1060 - 1080 boards ... were going up in smoke because of missing thermal pads. Did the right thing, gave owners thermal kits that required 90 minutes of their time and effort to install
e) MSI tape adhesive on 900 series ... users sometimes damaged fans taking it off 'cause the adhesive was too strong. Replaced the cards but owners still had the RMA hassle.

There's a reason it's called the bleeding edge ... early adopters have to expect to take a few punches and will bleed a bit. Most manufacturers address them religiously ... some (i.e., Asus RoG line) often abandon their users (system time freeze bug, audio bugs, sleep bug for example) where they promise an upcoming fix that never arrives. I understand these folks' frustration and wouldn't want to be there. They paid good money ... better said, they overpaid because they were anxious, and they deserved to get a functioning product. No doubt the vendors will make good with RMA replacements if a fix is not forthcoming. But they could have avoided this by making wiser choices. Show patience, lose the need to be the 1st on the block to impress ya friends and wait a bit ...

a) You'll pay less
b) You will get a product in which bugs in early steppings will be history
c) Your likelihood of having to make repeated TS calls and deal with RMA is significantly less
d) You will likely see performance improvements as more mature production lines have better yields

I am a bit concerned as many users are sitting and waiting, putting off new builds till the 9xxx series CPUs, Z390 boards and RTX cards weed out their bugaboos. Normally, they'd be cutting loose right after the holidays. But on Jan 1, us US folks will see tariffs triple to 30% on electronics, so there's gonna be a huge crush on vendors to keep up with supply after the holidays.

In fact the rabbit hole was a little bit deeper than that.

Far Cry had visible stutter on 970 and several driver updates were needed to fix that. The stutter was not appearing on any other Maxwell card. Nvidia had to mitigate the effects of the memory setup, obviously, but that needed some tweaking. In SLI, the 970 is also more prone to stuttering than other 'full fat' solutions like the 980 or the 980ti. Something's gotta give, and we are now in a period of time where 4GB is the norm rather than the high end. These GPUs get obsolete faster. This is why everyone today will be seen recommending a 980 but not a 970 - the latter simply won't cut it anymore and the large price gap between the two has all but vanished.

Regardless, the point was about trust and business ethics and how that relates to this Nvidia statement. Not the end performance of the specific part. And in that aspect Nvidia took a fall with the 970, and rightly so. It was misleading advertising, we thought we got a full fat 256 bit 4GB, and we did not. Countering that with 'but performance was OK' is the weirdest kind of argumentation ever. If we lose a few GB/s on a new to be released 1060, we dó complain and worry about its impact on performance (check the recent announcement topic on the GDDR5X version of it). And there is an impact, simple enough. Numbers don't lie. Whether or not a driver can mitigate or 'hide' that impact is another discussion entirely, you're still not magically getting those GB/s back.

#104

Assimilator

hat: Not even gonna question why the card was running so hot while idle, just call the guy stupid? Cool. I think it's stupid the card was allowed to get so hot while idle in the first place, "feature" or not. Should be some kind of failsafe there.

And whose stupidity is the overheating: NVIDIA's, or Asus's? I'mma give you a clue: who manufactured the card? And is this thread about an issue facing that company (in which case the post belongs here), or not (in which case it's just meaningless FUD obfuscating the actual issue)?

#105

EarthDog

pcmasterrace/comments/9tfach

#106

Xzibit

Nvidia is turning Steve into a Necrophiliac

#107

hat

Enthusiast

Assimilator: And whose stupidity is the overheating: NVIDIA's, or Asus's? I'mma give you a clue: who manufactured the card? And is this thread about an issue facing that company (in which case the post belongs here), or not (in which case it's just meaningless FUD obfuscating the actual issue)?

Whoever loaded it up with that "fans off when idle" feature that evidently doesn't care if the card is cooking... cause idle! I'm guessing that's an oversight over at ASUS. I don't think that's a standard feature.

#108

Vlada011

People with deep pockets should listen to and follow people on lower budgets more.
Because they look more carefully at what they buy: they read more, estimate, search, wait for a product to show its negative sides, etc.
But some people hurry like a fly on sheet, hypnotized by advertising and by the idea that someone will say WOW if they buy everything new.
Sometimes WOW can become a laugh, for example now, when even a perfect product can't justify such prices.
Now I'm more jealous of someone who bought a GTX 1080 Ti at a lower price than of people who spent $1500 on a premium RTX 2080 Ti such as the Galaxy, K|NGP|N, etc.

#109

RealNeil

It's a weird situation, but I don't think people will be screwed over the long term. NVIDIA has a good record of fixing their screw-ups, if indeed this is even a screw-up at all.
A wait and see is in order (and being glad that I decided that these new cards were too expensive for me) and they're probably well on the way to a resolution.

#110

Assimilator

www.gamersnexus.net/industry/3387-hw-news-dying-2080ti-investigation-1080ti-stock-almost-gone

Talking to all of our board partner contacts off-record, none of them have reported higher RMA requests than normally. We trust our contacts on these and spoke with nearly everyone in the market. The most common RMA reasons haven’t changed from previous generations, and actual RMA rate is exceptionally low right now. Some board partners are at under 0.01%, which is just because these devices are so new that no one has even had a chance to encounter serious problems yet.

The problem seems to be somewhat concentrated to FE RTX 2080 Ti models, but there have been users with Gigabyte and ASUS cards who have come forward with similar issues.

Speaking with two of our SI contacts, we heard similar responses: Neither company has seen abnormal RMAs for these devices.

...

Thus far, our reddit thread has garnered about 5 dead 2080 Ti samples, at time of this video going up.

As I expected: FUD, pure and simple.

#111

Xzibit

Assimilator: www.gamersnexus.net/industry/3387-hw-news-dying-2080ti-investigation-1080ti-stock-almost-gone

As I expected: FUD, pure and simple.

Compare it to what TechSpot/HWU said: the S.I. they talked to had 3 out of 9 go bad. In order to meet that 0.01%, their supplier would have to have ordered 30,000 cards with none of the others going bad.
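The arithmetic behind that comparison can be checked with a rough sketch like the one below. It is only a back-of-the-envelope calculation: the function names are made up for illustration, and it assumes the 3 failed cards are the only failures in the whole population and that 0.01% is taken as an exact rate.

```python
# Back-of-the-envelope check of the figures quoted in the post above.
# Assumptions (illustrative only, not vendor data):
#   - the 3 failed cards are the only failures in the whole population
#   - 0.01% is treated as an exact failure rate


def failure_rate_percent(failures: int, total: int) -> float:
    """Failure rate as a percentage of all cards."""
    return 100.0 * failures / total


def cards_needed(failures: int, rate_percent: float) -> int:
    """Population size at which `failures` bad cards equal `rate_percent`."""
    return round(failures / (rate_percent / 100.0))


if __name__ == "__main__":
    # Rate seen by the one system integrator (3 bad cards out of 9).
    print(f"SI sample rate: {failure_rate_percent(3, 9):.1f}%")
    # Cards that would have to exist for 3 failures to be a 0.01% rate.
    print(f"Cards needed for 0.01%: {cards_needed(3, 0.01)}")
```

A sample of 9 cards is far too small to say anything about the overall rate, so the two figures do not necessarily contradict each other; the sketch only shows how far apart they are.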

#112

John Naylor

Vayra86: In fact the rabbit hole was a little bit deeper than that.

Far Cry had visible stutter on 970 and several driver updates were needed to fix that. The stutter was not appearing on any other Maxwell card. Nvidia had to mitigate the effects of the memory setup, obviously, but that needed some tweaking. In SLI, the 970 is also more prone to stuttering than other 'full fat' solutions like the 980 or the 980ti. Something's gotta give, and we are now in a period of time where 4GB is the norm rather than the high end. These GPUs get obsolete faster. This is why everyone today will be seen recommending a 980 but not a 970 - the latter simply won't cut it anymore and the large price gap between the two has all but vanished.

Regardless, the point was about trust and business ethics and how that relates to this Nvidia statement. Not the end performance of the specific part. And in that aspect Nvidia took a fall with the 970, and rightly so. It was misleading advertising, we thought we got a full fat 256 bit 4GB, and we did not. Countering that with 'but performance was OK' is the weirdest kind of argumentation ever. If we lose a few GB/s on a new to be released 1060, we dó complain and worry about its impact on performance (check the recent announcement topic on the GDDR5X version of it). And there is an impact, simple enough. Numbers don't lie. Whether or not a driver can mitigate or 'hide' that impact is another discussion entirely, you're still not magically getting those GB/s back.

I had no issues with any 970s on any games including Far Cry ... have two SLI 970s boxes here and many more builds, single and SLI with no reported problems. I do agree that nVidia screwed the pooch with the PR and the way they handled it, the problem was the performance was too darn close to the 980 and they needed some way to nerf it. Doesn't appear they thought that one thru. It was misleading but the fact was, any problem you could make on the 970, you could also make with the 980.

Too many folks just don't understand what their utilities are capable of ... they download a utility recommended by someone on the internet and use it without understanding what it does. And yes, the fact remains we still have folks screaming that more VRAM is needed solely because their utility is misinforming them. As Mr. Inigo Montoya said so well in Princess Bride "I don't think that word means what you think it means".

And no, there is no utility in existence that measures VRAM usage ....

www.extremetech.com/gaming/213069-is-4gb-of-vram-enough-amds-fury-x-faces-off-with-nvidias-gtx-980-ti-titan-x

I was once offered as proof the TPU results for the 1060 3GB and 6GB models ... "see it's 6% faster". But the reason the 6 GB is faster is because it has 11% more shaders. So it is 6% faster at 1080p... and if the reason was VRAM, then we should have seen the gap widen at 1440p but it doesn't ... same 6%.

Again, I am not excusing nVidia's PR response to the ~~problem~~ ... er issue ... but the fact remains, any problem you can create on the 970, you could create on the 980. But we have seen this kind of response many times ... EVGA on the 970 SC, where their excuse for 1/3 of the heat sink missing the GPU was that it was intentional. Asrock for bulging caps and broken boards, Intel for the giant P67 pre-B3 recall. Today it's become a modus operandi and no one does it better than the "alternative facts" crowd. But one side shoveling the stuff does not excuse the other side from the same behavior. The excuses were BS but so was the imaginary problem ... had there been a real problem, like P67 B3, there would have been a recall.

#113

HuLkY

Der8auer comments on the ongoing event.

#114

Assimilator

HuLkY: Der8auer comments on the ongoing event.

tl;dr pretty much what the link I posted said. Real RMA numbers from real distributors show zero evidence of excessive RMA rates for RTX SKUs. As der8auer says, journalists - including TPU - should be careful of parroting rumours for the sake of clicks; it's basic journalistic responsibility to check facts before you decide to publish.

#115

R0H1T

Assimilator: tl;dr pretty much what the link I posted said. Real RMA numbers from real distributors show zero evidence of excessive RMA rates for RTX SKUs. As der8auer says, journalists - including TPU - should be careful of parroting rumours for the sake of clicks; it's basic journalistic responsibility to check facts before you decide to publish.

I wouldn't trust rumors, just yet, but there's an obvious conflict of interest here.

Right let's ask Nvidia, oh wait :wtf:

#116

John Naylor

Saw this on nVidia Forums ... no clue as to where it came from as poster didn't list

"I have info which seems to confirm for now, defective batches starts on - 0323xxx, those that have long and healthy working so far - 0333xxx. "

Of course it was followed by I have an 0344 and :) ... Does seem that the issue is primarily with FE cards. I expect it will take a week or two to nail down the problem and address it ... assuming of course that it is not batch related.

#117

Xzibit

Oh great.

TechSpot - Researchers show Nvidia GPUs can be vulnerable to side channel attacks

UCR: We extend this attack to track user activities as they interact with a website or type characters on a keyboard. We can accurately track re-rendering events on GPU and measure the timing of keystrokes as they type characters in a textbox (e.g., a password box), making it possible to carry out keystroke timing analysis to infer the characters being typed by the user. A second attack uses a CUDA spy to infer the internal structure of a neural network application from the Rodinia benchmark, demonstrating that these attacks are also dangerous on the cloud. We believe that this class of attacks represents a substantial new threat targeting sensitive GPU-accelerated computational (e.g. deep neural networks) and graphics (e.g. web browsers) workloads.

Looks like GamersNexus made progress

GN: Testing the first defective 2080 Ti right now. We have been able to replicate some issues reported thus far, like flickering and "random" crashes to desktop. We have a lot more work to do. Have not replicated the artifacts yet.

GN: As of today, we have successfully reproduced 2 modes of failure on the RTX 20-series cards that were sent in.

#118

FreedomEclipse

~Technological Technocrat~

Video touches on a few issues that are more or less mainly driver and monitor compatibility issues, especially with G-Sync or high refresh rate monitors that cause BSODs. BSODs have also been easily reproducible on multi monitor setups...

There are hardware issues present but this is how far Steve has got during their tests so far. This video is just part one. I'm guessing part two is taking a closer look at the dead/defective cards with the help of buildzoid, and buildzoid is a God when it comes to teardowns right to the component level.

#119

Xzibit

Looks like HardOCP got one of the bad ones.

HardOCP - GeForce RTX 2080 Ti FAILS After Gaming for 2 Hours

#120

Assimilator

Xzibit: Looks like HardOCP got one of the bad ones.

HardOCP - GeForce RTX 2080 Ti FAILS After Gaming for 2 Hours

Seems like it's the FEs and/or cards using the reference PCB. Extremely embarrassing, and expensive, for NVIDIA if that's the case.

#121

EarthDog

Lol, FUD perpetuation.

His one card died... and it's now an issue to them. LOL great journalism. Another thrown into the pile of click bait tech sites. :(

Still waiting for my luck to run out.... 2 high hz gsync monitors and an 2080Ti FE....pray for me TPU. :)

#122

LiveOrDie

Assimilator: Someone who has enough dollars to afford a RTX 2080, yet not enough sense to know about the "fan off in idle" feature that has existed on graphics cards for years. Amazing.

Huh? I know the fans turn off on idle, it's a feature of all the dual Asus cards. Are you retarded? My point was it cooked itself on idle, which it might not have if the fans were running or the temps were normal, fk me some people....
