Friday, September 25th 2020
RTX 3080 Crash to Desktop Problems Likely Connected to AIB-Designed Capacitor Choice
Igor's Lab has posted an interesting investigative article advancing a possible reason for the recent crash to desktop problems reported by RTX 3080 owners. For one, Igor mentions that the launch timings were much tighter than usual, leaving NVIDIA's AIB partners with much less time than would be adequate to prepare and thoroughly test their designs. One of the reasons this apparently happened was that NVIDIA released the compatible driver stack to AIB partners much later than usual; this meant that their actual testing and QA for produced RTX 3080 graphics cards was mostly limited to power-on and voltage stability testing rather than actual gaming/graphics workload testing. That might have allowed some less-than-stellar chip samples to be employed on some of the companies' OC products (which, with higher operating frequencies and consequent broadband frequency mixtures, hit the apparent 2 GHz frequency wall that produces the crash to desktop).
Another reason for this, according to Igor, is the actual PG132 "reference board" design, which serves as the "Base Design" for partners to build their custom cards around. Apparently, NVIDIA's BOM left open choices in terms of power cleanup and regulation in the mounted capacitors. The Base Design features six mandatory capacitors for filtering high frequencies on the voltage rails (NVVDD and MSVDD). There are a number of choices for capacitors to be installed here, with varying levels of capability. POSCAPs (Conductive Polymer Tantalum Solid Capacitors) are generally worse than SP-CAPs (Conductive Polymer Aluminium Electrolytic Capacitors), which are in turn surpassed in quality by MLCCs (Multilayer Ceramic Chip Capacitors, which have to be deployed in groups). Below is the circuitry arrangement employed below the BGA array where NVIDIA's GA102 chip is seated, which corresponds to the central area on the back of the PCB.
In the images below, you can see how NVIDIA and its AIBs designed this regulator circuitry (NVIDIA Founders Edition, MSI Gaming X, ZOTAC Trinity, and ASUS TUF Gaming OC in order, from our reviews' high resolution teardowns). NVIDIA's Founders Edition design uses a hybrid capacitor deployment, with four SP-CAPs and two MLCC groups of 10 individual capacitors each in the center. MSI uses a single MLCC group in the central arrangement, with five SP-CAPs handling the rest of the cleanup duties. ZOTAC went the cheapest way (which may be one of the reasons their cards are also among the cheapest), with a six POSCAP design (which are worse than MLCCs, remember). ASUS, however, designed their TUF with six MLCC arrangements - there were no savings made in this power circuitry area.
It's likely that the crash to desktop problems are related to both these issues - and this would also explain why some cards cease crashing when underclocked by 50-100 MHz, since at lower frequencies (and this will generally lead boost frequencies to stay below the 2 GHz mark) there is less broadband frequency mixture happening, which means POSCAP solutions can do their job - even if just barely.
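The four layouts described above can be written down as a small lookup table. The Python sketch below only restates the counts given in the article; the class name CapLayout, its field names and the summarize helper are hypothetical names chosen for illustration, and nothing here models electrical behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapLayout:
    """Capacitor choice for the six Base Design positions (hypothetical type)."""
    sp_cap: int = 0       # SP-CAPs fitted
    poscap: int = 0       # POSCAPs fitted
    mlcc_groups: int = 0  # MLCC groups fitted (each group is several small caps)

    def total_positions(self) -> int:
        # Each SP-CAP, POSCAP or MLCC group occupies one of the six positions.
        return self.sp_cap + self.poscap + self.mlcc_groups


# Counts taken directly from the article text above.
LAYOUTS = {
    "NVIDIA Founders Edition": CapLayout(sp_cap=4, mlcc_groups=2),
    "MSI Gaming X": CapLayout(sp_cap=5, mlcc_groups=1),
    "ZOTAC Trinity": CapLayout(poscap=6),
    "ASUS TUF Gaming OC": CapLayout(mlcc_groups=6),
}


def summarize(layouts):
    """Return one summary line per card, checking that all six positions are filled."""
    lines = []
    for card, layout in layouts.items():
        if layout.total_positions() != 6:
            raise ValueError(f"{card}: expected 6 positions, got {layout.total_positions()}")
        lines.append(
            f"{card}: {layout.sp_cap} SP-CAP, {layout.poscap} POSCAP, "
            f"{layout.mlcc_groups} MLCC group(s)"
        )
    return lines


if __name__ == "__main__":
    for line in summarize(LAYOUTS):
        print(line)
```

Keeping the three part types as separate counters makes it easy to add further cards as more teardowns appear, without implying any ranking beyond what the article itself states.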
Source:
Igor's Lab
297 Comments on RTX 3080 Crash to Desktop Problems Likely Connected to AIB-Designed Capacitor Choice
Polymer tantalum electrolytic capacitors
www.vishay.com/docs/40254/t50.pdf (vPolyTan)
No, I don't care if the cards in question actually use Panasonic's line of polymer tantalum electrolytic capacitors.
Nor do I care what POSCAP stands for, or what particular formulation Panasonic is using.
Unilaterally declaring all polymer tantalum electrolytic capacitors to be POSCAPs is misinformation at best and at worst illegal.
POSCAP is a registered trademark of Panasonic. And in accordance with trademark law NOBODY else can refer to their product as 'POSCAP'.
./thread
the next f***ing smooth brain that mentions it again is going to get an angry pussy thrown in their face
And now ladies and gentlemen, we return everyone to the regularly scheduled thread topic.
IMHO EVGA's statement is about a last-minute change in board design, which came up during all this CTD stuff. Their internal testing showed some issues with the previous cap layout on their board and they made a change for the final design. This is in no way a confirmation of "guesswork" done by a "blogger".
Right now we get similar reports about CTDs on almost every partner design as well as on FEs. Bloggers can speculate whatever they want; take a look at Jayz: "I've just been told that there are multiple type of tantalum caps". Jayz is not a freaking board engineer. He's probably on the same level as anyone on this forum who cares to do some serious research when it comes to power delivery on GPUs.
Leave guesswork to people on forums and to bloggers and YouTubers that are oriented on clicks and views. They will milk this for as long as they can.
Ok, that was a bit harsh, I understand that IgorLab and Jayz want to inform their viewers about an issue with a newly released product. They're not doing it just for clicks and views. But they also get paid based on the type and the amount of content they produce.
Engineers from NVIDIA and AIB partners will do the actual work. In the end this will be solved either by driver, firmware or, worst case scenario, a full blown recall or, based on the result for each manufacturer, RMA and exchange to a v2 board design.
Once again, all of this has to be investigated properly by people that get paid to do it.*
* and it wouldn't be needed if they'd have done it right in the first place.
Spend $800-1500 on a graphics card only to find out it's been cheaply made and needs to be sent back for RMA, taking several weeks to get a new one as there is limited stock.
I can't believe the stupidity of some, spending the best part of a grand on a graphics card from an AIB I have honestly never heard of, like Ventus or Eagle? Crazy risk.
this kind of thing is fairly normal when you ride the bleeding edge of new hardware
FE/Ref/AIB cards have current sensing/balancing circuits for each power rail. Why does PEG exceed the 6.5A hard limit?
I do not care about any power delivery network analysis.
If you wish your opinion to be taken seriously, you should pay for and get a 100 GHz oscilloscope, so that when NVIDIA shows up
saying that they have a fix, you are able to verify it with measurements.
All that regular gamers care about is that the problem stops appearing on their screens.
My advice ... if you are not part of the solution... then just take a step back.
You acted the same way about the 8846A in the past: you failed to repair it, and you are still spreading misinformation about it. :banghead:
Again, I suggest patience .. patience .. patience .. so that the people responsible for this work can deliver their decisions on what comes next to the buyers of the RTX 3000 series.
now here is my opinion on what is going on
more than likely all that needs to happen in terms of hardware stability is that the driver needs a more aggressive voltage table. if you look at the voltages at load, they are all over the place and are dipping below 1000 mV, which is just not enough voltage for 2000 MHz. this issue is exacerbated by some cards with poorer power designs. we know from Turing that they get unstable at about 2000-2050 MHz for most samples and they don't scale particularly well even with lots of voltage
this is a silicon limitation and likely to be worse on Ampere than Turing because it's a smaller/new process
whatever other problems any given PCB might have, more voltage should help with stability. it buys you more room to breathe when the silicon is already operating at its outer limits. the real problem is that AIBs want their pre-overclocked 2000 MHz cards and the silicon quite simply is not going to do that as easily as previous generations
With some overpushed early chips clocking too high on a chip that is already near the limit @ stock. Aka not one single issue. (kinda like Zen 2, only those don't boost more than they can handle)
And those cards will get a "fixed" bios update. Your Eagles, Strixes etc might be getting a slightly less OC variant bios.
Secondly, to the bolded, you must have been burying your head in the sand then or not bothered to look, but there are reports everywhere of FE CTD problems; this is clearly not limited to just AIBs. Oh I see, 'cheap' models of these manufacturers.
www.techturtle.net/after-last-years-bendgate-its-now-chipgate-for-apple/
A few months ago the GTX 1660 Super was presented as a fresh product option; being a fresh one, its drivers should still gain greater performance and compatibility.
Now the RTX 3000 issue will shift the focus of NVIDIA's driver developers in that direction.
In plain English, the expectations of thousands of people get put on hold.