Wednesday, October 31st 2018

NVIDIA Confirms Issues Cropping Up With Turing-based Cards, "It's Not a Broad Issue"

Reports have been making the rounds on various forums (including our own TPU) that users of NVIDIA's Turing-based graphics cards are running into problems. The reports, which are growing in number as awareness of the issue spreads, vary in how the problem manifests, but describe the same result: "crashes, black screens, blue screen of death issues, artifacts and cards that fail to work entirely," as reported by the original Digital Trends piece.

At the time, however, the sources of this information were too problematic to properly discern whether the issue went beyond the usual launch problems and failures that can (and will) happen to any kind of hardware. People with negative experiences are always more vocal than those without any problems; some accounts on the forums in question were of doubtful intent; and the same user could be posting across multiple forums. Together, these factors ruled out any serious measurement of the issue. Now, though, NVIDIA has issued a statement on the matter, which at least acknowledges that the issue exists.
Problems have been cropping up with both NVIDIA-made and AIB cards from various manufacturers. This seemingly rules out a problem specific to any one board partner, and leaves either an architectural issue or a bad manufacturing batch on the table (neither has been confirmed yet). Let's hope this really is confined to a bad batch, though there have been multiple reports of users who had their cards RMA'd, only for one or two replacements to meet the same fate. The issue seems to affect owners of the flagship RTX 2080 Ti the most, though there are reports of RTX 2080 cards being affected as well.
In response, NVIDIA acknowledges the issue but downplays its scope: as reported by Tom's Hardware, the company said that "it's not an increasing number of users" affected by this problem, and that "it's not broad." It added that "we are working with each user individually like we do always." We'll have to wait and see, but this certainly won't help RTX 20-series sales when the flagship graphics card, costing over $1,000, fails on its users.
Sources: GeeksULTD, via Tom's Hardware, GeForce Forums, GeForce Forums, Forbes, TechPowerupForums

127 Comments on NVIDIA Confirms Issues Cropping Up With Turing-based Cards, "It's Not a Broad Issue"

#76
jaggerwild
A $1200 fuck-up, period!!! Shouldn't be happening, but when you're an early adopter........
#77
unikin
hat said: "It's never a huge issue... until it is"

Fact: users with issues are often very vocal, unlike users who just bought their shit, had no issues and lived happily ever after
Fact: we don't know how many bad cards are out there compared to good cards
Fact: we don't know how many good cards are out there that will suffer an untimely failure

So, it's too early to tell whether it's a massive issue or not. All we can do is speculate. I speculate that nVidia wouldn't push a large number of bad cards out the door. That would lead to a lot of unhappy customers, bad PR, additional costs involved with all those RMAs... it wouldn't be very wise.
NVIDIA never meant to produce a lot of RTX 2080 Ti cards to begin with. I work in the car industry, and in my experience, when a supplier is given an order to produce millions of a particular part, the number of defective units will be small and fall even lower over time (to 2-3 parts per million). On the other hand, if you task a supplier with producing only a few hundred thousand parts, you'll be in trouble, as quality deteriorates substantially. Given the complexity of NVIDIA's large dies and the low volumes, it doesn't surprise me that they're having a hard time achieving high-quality production lines with sustainable production costs.
#78
mcraygsx
EarthDog said: TPU members, I present to you a voice of reason.

Thank you, thank you Hat!!!!!
My point being: larger GPUs require a more complex manufacturing process, so the probability of a defect increases. Yield does not scale proportionally with die size.
That is why new GPU generations usually start off with smaller chips, e.g. the 2080/2070. In this case, they rushed the 2080 Ti out on release day, unlike in previous generations.
I am sorry if my Fermi joke triggered you this badly :laugh:. But staying on topic, it seems like quality assurance was really lacking this time around.
#79
EarthDog
Triggered? Naa, holmes, just correcting information (which you twice tried to assert was correct) is all.
#80
EarthDog
Not sure why you keep posting, man... it's all good here! Wrongs were righted and all is well in the World. :pimp:

LOL, you deleted your posts trying to defend your point......... LOLOLOL
#81
R0H1T
No, you don't have to ask; RTX is just a card. Even if it is bad, it's bad for those who've bought it; btw, everyone else doesn't need to have an opinion on it.
#82
95Viper
Get back on topic, please.
No more back-and-forth retaliatory comments.
Thank You.
#83
Xzibit
unikin said: Hardware Unboxed just reported that one of their RTX 2080 Tis just died. This is not a normal defect rate, given that only a few thousand RTX 2080 Tis have been sold so far.

NVIDIA will of course deny everything. That's what their corporate PR department is for. Never admit a problem until proven wrong, and even then, minimize its importance. That's how corporations play the game, and NVIDIA is no different. Their responsibility lies with shareholders, not consumers.
3 out of 9 are bad for one system integrator. Ouch.
#84
RealNeil
jaggerwild said: A $1200 **ck-up, period!!! Shouldn't be happening, but when you're an early adopter........
Is this why they call it the bleeding edge and not the leading edge?
INSTG8R said: This shouldn't be happening at all, certainly not at its price point.
This, most definitely this.
For the money outlay, they should be perfect.
I was choking on the prices of these things anyway.
Maybe it's bad joss, or maybe (just maybe) it's karma slapping them around for their over-the-top greed. :nutkick:
#85
hat
Enthusiast
jaggerwild said: A $1200 fuck-up, period!!! Shouldn't be happening, but when you're an early adopter........
That's why there are terms like "early adopter tax". Not only are you paying big for the product, but you're also getting it before any improvements are made... that means missing out on neat bonuses like the G0-stepping Q6600s or D0 i7 920s, for example, or running into seemingly glaring issues like this. I'm still not sure it's as bad as it's made out to be, though. The truth exists somewhere among sensationalism, fanboys, potentially bad journalism and small (or just bad) sample sizes.
unikin said: NVIDIA never meant to produce a lot of RTX 2080 Ti cards to begin with. I work in the car industry, and in my experience, when a supplier is given an order to produce millions of a particular part, the number of defective units will be small and fall even lower over time (to 2-3 parts per million). On the other hand, if you task a supplier with producing only a few hundred thousand parts, you'll be in trouble, as quality deteriorates substantially. Given the complexity of NVIDIA's large dies and the low volumes, it doesn't surprise me that they're having a hard time achieving high-quality production lines with sustainable production costs.
I work in a plastics plant (also manufacturing) and I could tell you different. However, you can't apply the same line of thinking to laundry baskets, car parts and video cards...

These chips are some of the biggest ever, which is why yields are poor. I'm sure the process can improve over time, but that doesn't excuse an unreasonably large number of bad parts being shipped out (if that is indeed the case).
RealNeil said: This, most definitely this.
For the money outlay, they should be perfect.
I was choking on the prices of these things anyway.
Maybe it's bad joss, or maybe (just maybe) it's karma slapping them around for their over-the-top greed. :nutkick:
2080 Tis are more susceptible to failure than cheaper cards because they're more complex. As I said above, though, that still doesn't excuse poor QC.
#86
Assimilator
This thread is giving me cancer; someone even used the term "nvidiot" unironically. I'm more concerned, however, by the elaborate-but-nonsensical conspiracy theories being slung around, especially the "NVIDIA knew this would happen" ones. In what universe does it ever benefit NVIDIA to release known defective products? Especially when they have no competition at that level and therefore no need to rush said products' release? Honestly.

Unless you enjoy showing your irrational hate of "the enemy" NVIDIA as well as your deficient IQs, how's about y'all quiet down and wait for more concrete facts to be released before making fools of yourselves.
#87
HuLkY
That's sad news. So if someone wants to buy now, is the best choice to wait for this storm to pass? Maybe nVidia will release a new batch with a permanent fix for this?
#88
medi01
Slizzo said: Here's the thing, though: knowing about the memory configuration did nothing to change how the card performed. And the card DID have 4 GB of memory on it. Generally, the only way people were able to uncover the issue was by artificially loading up the memory to force it to switch over to slow mode.
There is a reason they were forced to pay 970 customers after losing a class-action lawsuit.
#89
Totally
RH92 said: Issues can and will happen no matter the price point (of course not ideal, but hey, we don't live in a perfect world); the most important thing is how the company handles them. As long as NVIDIA provides decent service for those having issues, there is no point in making a fuss about it.
May I ask what previous instances there have been of new cards dying shortly after moderate use?
#90
Aquinus
Resident Wat-man
Sounds like a bad batch of GDDR6 that should have been caught by QA.
#91
zenlaserman
Aquinus said: Sounds like a bad batch of GDDR6 that should have been caught by QA.
...possibly exacerbated by the fact that a lot of people are overclocking it. The first generation of a new graphics memory type tends not to have much OC headroom, especially long-term.

It appears that the GDDR6 on most of these cards is specified at 1750MHz and that's what they are already clocked at out of the box.
#93
Prima.Vera
Sorry, but when you ask more than $1000 for an RTX 2080 card and way more for the "Ti", there is absolutely NO EXCUSE for releasing a faulty product. Considering how much they cost, the testing of those cards should have been exhaustive, to say the least.
#94
LiveOrDie
"crashes, black screens, blue screen of death issues, artifacts" Yep, got all of these, plus lockups with driver crashes. This was on an Asus 2080 OC. The card cooked itself at idle; the fans never turned on, only when the card was under load. I could have cooked eggs on the backplate. I returned it, and I won't be getting another one.
#95
RealNeil
hat said: 2080 Tis are more susceptible to failure than cheaper cards because they're more complex. As I said above, though, that still doesn't excuse poor QC.
I don't think that it's the complexity of the platform that's at issue here. GPUs have been complex for years.
QC may not be entirely to blame either. Quality Control needs time to properly test in all possible scenarios, tossing variables into the mix and observing the results. This takes a lot of time.

With this happening with the 2080s and 2080Ti cards, I'm wondering if they tested as thoroughly as they ~could~ have.
Companies are getting into a rush-to-market mentality, akin to throwing products at the wall to see what sticks. They're more than happy to fix the screw-ups, but this comes at the cost of customer inconvenience (and possible downtime).

People are getting pissed off that for such a huge outlay of money, there are any issues at all. There shouldn't be. Lots of folks sell off their existing premium hardware to partially fund buying the shiny, new stuff that we (think we) want. Then, we have nothing to fall back to when unnecessary crap like this crops up.

I'm glad that I didn't buy into 20 series cards yet.
#96
hat
Enthusiast
I still call that poor QC, even if it isn't the quality department's fault because the bigwigs are rushing shit out the door. A $120 part or a $1200 part, it doesn't matter; bad is bad... that is, if the issue really is as widespread as some make it out to be.
#97
Assimilator
Live OR Die said: "crashes, black screens, blue screen of death issues, artifacts" Yep, got all of these, plus lockups with driver crashes. This was on an Asus 2080 OC. The card cooked itself at idle; the fans never turned on, only when the card was under load. I could have cooked eggs on the backplate. I returned it, and I won't be getting another one.
Someone who has enough dollars to afford an RTX 2080, yet not enough sense to know about the "fan off at idle" feature that has existed on graphics cards for years. Amazing.
#98
hat
Enthusiast
Not even gonna question why the card was running so hot while idle, just call the guy stupid? Cool. I think it's stupid the card was allowed to get so hot while idle in the first place, "feature" or not. Should be some kind of failsafe there.
#99
RealNeil
"It's not a broad Issue"

But this is,.....
#100
John Naylor
Vayra86 said: I remember a similar response when they 'forgot' to tell their marketing team that the 970 had only 3.5 GB of useful VRAM.

'Its not a big issue, we forgot some insignificant detail, move on pls'

We all know what came next. That said, this is not the same kind of problem of course, but it underlines why you can use a bag of salt when reading this article.
I remember what came next... a lot of fearmongering and alarmism, but reputable websites were never able to reproduce the issue without doing some really weird things. Sure, you could create an issue, but it's one of those "well, if I do this list of things in a certain sequence, I can cause an issue" situations, especially at resolutions and settings that are inappropriate for the card(s) in question. And if the 3.5 GB was the problem, why did the 980, which has the full 4 GB, exhibit the same behavior when exposed to the same sequences? NVIDIA screwed the proverbial pooch from a PR point of view, but that's where the issue ended.

www.guru3d.com/news-story/middle-earth-shadow-of-mordor-geforce-gtx-970-vram-stress-test,12.html
Thing is, the quantifying fact is that nobody really has massive issues; dozens and dozens of media outlets have tested the card with in-depth reviews like the ones here on my site. Replicating the stutters and stuff you see in some of the videos, well, to date I have not been able to reproduce them unless you do crazy stuff, and I've been on this all weekend.

Let me clearly state this: the GTX 970 is not an Ultra HD card, it has never been marketed as such, and we never recommended even a GTX 980 for Ultra HD gaming either. So if you start looking at that resolution and zoom in, then of course you are bound to run into performance issues, but so does the GTX 980. These cards are still too weak for such a resolution combined with proper image quality settings. Remember, Ultra HD = 4x 1080P. Let me quote myself from my GTX 970 conclusions: "it is a little beast for Full HD and WQHD gaming combined with the best image quality settings", and within that context I really think it is valid to stick to a maximum of 2560x1440, as 1080P and 1440P are the real domain for these cards. Face it, if you planned to game at Ultra HD, you would not buy a GeForce GTX 970.
As far as the 2xxx-series problems go, I don't know if it's a fixable issue or not... too early to tell. But here's the deal... if you want to be a "beta tester" and pay a premium to be the first one on your block with the new shiny thing, then take the punches that come with that choice. It happens with early steppings of CPUs, mobos, GFX cards, SSDs... you name it. The folks we built for rarely, if ever, got hammered by any of these, because we have always recommended not investing in hardware that isn't a few steppings into production:

a) pre-B3-stepping P67 boards... Intel chipset flaw affected all brands... industry-wide recall
b) pre-C1-stepping Asus RoG boards (external devices don't wake up from sleep)... Asus said tough noogies
c) early EVGA 970 SC boards... one of the three heat pipes missed the GPU... EVGA said, yeah, we designed it that way.
d) early EVGA SC / FTW 1060 - 1080 boards... were going up in smoke because of missing thermal pads. EVGA did the right thing and gave owners thermal kits, which took 90 minutes of their time and effort to install
e) MSI tape adhesive on 900-series cards... users sometimes damaged the fans taking the tape off because the adhesive was too strong. MSI replaced the cards, but owners still had the RMA hassle.

There's a reason it's called the bleeding edge... early adopters have to expect to take a few punches and will bleed a bit. Most manufacturers address these issues religiously... some (e.g., the Asus RoG line) often abandon their users (system time freeze bug, audio bugs, sleep bug, for example), promising an upcoming fix that never arrives. I understand these folks' frustration and wouldn't want to be there. They paid good money... or rather, they overpaid because they were anxious, and they deserved to get a functioning product. No doubt the vendors will make good with RMA replacements if a fix is not forthcoming. But they could have avoided this by making wiser choices. Show patience, lose the need to be the first on the block to impress ya friends, and wait a bit...

a) You'll pay less
b) You will get a product in which bugs in early steppings will be history
c) Your likelihood of having to make repeated TS calls and deal with RMA is significantly less
d) You will likely see performance improvements as more mature production lines have better yields

I am a bit concerned, as many users are sitting and waiting, putting off new builds till the 9xxx-series CPUs, Z390 boards and RTX cards weed out their bugaboos. Normally, they'd be cutting loose right after the holidays. But on Jan 1, we US folks will see tariffs triple to 30% on electronics, so there's gonna be a huge crush on vendors to keep up with supply after the holidays.