
NVIDIA GeForce RTX 50 Cards Spotted with Missing ROPs, NVIDIA Confirms the Issue, Multiple Vendors Affected

This must be much lower level than the driver.
This comes down to the binning process, where lower-quality ROPs are "fused off" as appropriate. I can only come up with two possible explanations:
1) The affected GPUs are a lower bin that for some reason was combined into the same SKU, and the missing ROPs are "defective".
2) The affected GPUs had 8 fully working ROPs fused off unintentionally, which makes the GPU avoid using them. An Nvidia engineer would be able to explain exactly how this works at the hardware level, but it's one of two things:
a) Somehow "burned" into the chip, so no firmware update can change it.
b) Controlled by firmware, but in that case they should have fixed it instead of taking returns (and now multiple models are affected…).
Either way it's not a driver issue.
Is there anything I've missed?

Also, every finalized graphics card is run through extensive validation by the AIB partners. It surprises me that none of those checks verifies that the reported hardware matches the spec.
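
The kind of check described above could be as simple as comparing the ROP count a card reports with the count its SKU is specified to have. What follows is a minimal sketch, not any vendor's actual validation tooling. The `SPEC_ROPS` table uses commonly published ROP counts for these models (treat them as assumptions to verify), and `check_rops` is a hypothetical helper that takes the reported count however it is read out, for example from a tool such as GPU-Z.

```python
# Hypothetical sketch: flag a card whose reported ROP count is below its SKU spec.
# SPEC_ROPS holds commonly published figures, included here as assumptions.
SPEC_ROPS = {
    "RTX 5090": 176,
    "RTX 5080": 112,
    "RTX 5070 Ti": 96,
}


def check_rops(model: str, reported_rops: int) -> dict:
    """Compare a reported ROP count against the assumed spec for `model`."""
    expected = SPEC_ROPS[model]
    missing = expected - reported_rops
    return {
        "model": model,
        "expected": expected,
        "reported": reported_rops,
        "missing": missing,
        # Share of the specified ROPs that is absent, in percent.
        "missing_pct": round(100 * missing / expected, 1),
        "ok": missing == 0,
    }


if __name__ == "__main__":
    # A card reporting 8 fewer ROPs than the assumed spec.
    print(check_rops("RTX 5090", 168))
```

Under these assumed spec values, 8 missing ROPs is a larger share of the total on the smaller models than on the largest one, which is why the same defect would not cost every affected model the same amount of performance.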


That would have to be done per country (or for the EU combined), and after years of deliberation, once a settlement is reached, owners will get their ~$2 after lawyer fees.


And most will quickly return when they get burned there too…


Firstly, they do know the exact number, as this is a binning issue. But I have my doubts about whether the reported figures are correct: very few units are in use and users have a very low probability of detecting this, so I would expect the real figure to be in the ~10% range. (That's just an educated guess, so don't quote me on that.)

It is, however, always hard for the public to gauge how widespread an issue may be, especially with problems that may be tied to specific production batches and with a few people shouting loudly in the forums. A couple of generations ago Nvidia got a tremendous amount of flak for the "space invaders" defect on certain RTX 2080s, which in the end turned out to be an issue with EVGA. (Except for the random occurrences, which are normal with mass-produced graphics cards.) Outright failure rates for graphics cards are still very low compared to e.g. motherboards, and those for CPUs are even lower. So we have every reason to expect a graphics card to be fully working, and we should continue to hold vendors to that standard.

From what? 300w?
 
No, getting burned metaphorically, like most people do when they are dissatisfied and choose the competition the next time, only to experience some bigger annoyance and return (since there are only two in this segment so far). So many have claimed they would never buy Nvidia again, only to return shortly after. And the other team isn't better in these areas either, at least so far.

While no one should delight in others' misfortune, AMD should take this opportunity to improve their own quality in every area, not just push some marketing BS and offer a few models with slightly better value. And they should definitely ramp up production.

Buyers generally should wait and see which option is best for them. New generations are exciting, but I wouldn't buy either until they're mature. (I don't have time to be a beta tester.) And I'm not picking sides either, I've bought and recommended both many times, and I probably will buy one from each soon as I will upgrade two systems.
 
Steve Burke of Gamers Nexus has another possibility according to the video below: some of the working ROPs were connected to TPCs where all of the CUDA cores are disabled, making those otherwise good ROPs unusable.


Also, ROPs are now decoupled from the memory controllers in later Nvidia GPUs, which brings several advantages: ROPs can now access the whole memory system instead of just their assigned memory controllers, and disabling ROPs does not cripple one of the memory controllers like what happened with the GTX 970 where disabled ROPs cut the effective memory pool from 4GB to 3.5GB.

Another guess of mine is that Nvidia's quality control did not adequately check ROPs at first, because early Blackwell GPUs were bound only for AI accelerators, where ROPs are completely worthless, before Nvidia started diverting some Blackwell GPUs to graphics duty. Quality control on ROPs is only needed for GPUs that are bound to perform graphics tasks. Nvidia could have botched the changes to quality control needed to check GPUs that will be sold for graphics duty.
 
Steve Burke of Gamers Nexus has another possibility according to the video below: some of the working ROPs were connected to TPCs where all of the CUDA cores are disabled, making those otherwise good ROPs unusable.
That's a very plausible theory.
 
Untitled.png

Curious if there is any merit to this since I don't know where the information is from.
 
Steve Burke of Gamers Nexus has another possibility according to the video below: some of the working ROPs were connected to TPCs where all of the CUDA cores are disabled, making those otherwise good ROPs unusable.
Yes, I know, I've seen it.
Any such combination of disabled hardware is already included in (1); the GPU is actually a lower bin (somehow).

Another guess of mine is that Nvidia's quality control did not adequately check ROPs at first, because early Blackwell GPUs were bound only for AI accelerators, where ROPs are completely worthless, before Nvidia started diverting some Blackwell GPUs to graphics duty. Quality control on ROPs is only needed for GPUs that are bound to perform graphics tasks. Nvidia could have botched the changes to quality control needed to check GPUs that will be sold for graphics duty.
If this were the case, the ROPs would be detected and in use, just causing random crashes instead.
The actual problem is at the binning stage.

…like what happened with the GTX 970 where disabled ROPs cut the effective memory pool from 4GB to 3.5GB.
That is actually incorrect.
The GTX 970 had one memory controller disabled, so the last 0.5 GB shared bandwidth with another controller at a lower priority. The last 0.5 GB was always there, just a bit slower.
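
As a rough illustration of the "still there, just slower" point, the arithmetic below uses commonly reported GTX 970 figures as assumptions (they are not from this thread): a 256-bit bus made of eight 32-bit channels at 7 Gbps per pin, with 0.5 GB behind each channel. The function name is hypothetical, and treating the card as one 7-channel segment plus one 1-channel segment is a simplification.

```python
# Assumed, commonly reported GTX 970 figures (not taken from this thread).
CHANNEL_WIDTH_BITS = 32   # width of one memory channel
DATA_RATE_GBPS = 7.0      # effective data rate per pin
CHANNELS = 8              # 8 x 32-bit = 256-bit bus
GB_PER_CHANNEL = 0.5      # 8 x 0.5 GB = 4 GB total


def segment_bandwidth_gbs(channels: int) -> float:
    """Peak bandwidth in GB/s for a segment striped across `channels` channels."""
    return channels * CHANNEL_WIDTH_BITS * DATA_RATE_GBPS / 8


# Simplified split: 7 channels form the fast segment, 1 channel the slow one.
fast_gb = (CHANNELS - 1) * GB_PER_CHANNEL
slow_gb = 1 * GB_PER_CHANNEL
fast_bw = segment_bandwidth_gbs(CHANNELS - 1)
slow_bw = segment_bandwidth_gbs(1)

if __name__ == "__main__":
    print(f"fast segment: {fast_gb} GB at up to {fast_bw} GB/s")
    print(f"slow segment: {slow_gb} GB at up to {slow_bw} GB/s")
```

Under these assumptions all 4 GB remain addressable, but the last 0.5 GB peaks at a fraction of the bandwidth of the first 3.5 GB.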
 
And yet it didn't stop you from buying a 4070 Ti Super.
I had to buy it because damn it AMD doesn't have CUDA.

Even though years have passed, I miss ATI's aggressive policy. I would like to go back to the X1950XTX I was using back then.
 
I bet that was supposed to be the 5090 originally and the 5090 we have now would be a Titan card.
I am on a similar line of thought. I think there is or was a planned SKU with these chips, and there was a bin mix-up at the factory.
 
I had to buy it because damn it AMD doesn't have CUDA.

Even though years have passed, I miss ATI's aggressive policy. I would like to go back to the X1950XTX I was using back then.
ATI? I have an 8500DV, with remote, in a P4B that would still do its job if I turned it on... :)
 
TSMC strikes again? Sabotage? Espionage? Looking at Lisa Su and that dick ex Intel CEO... lol

Or accidental "D" version installs destined for Chinese cards? Seems fabs all over the place have had issues, Intel not long ago and AMD's shitty newer CPUs going thermonuclear..fun times!
 
Please explain: do I also have missing ROPs on my legacy GTX card? If so, I only discovered it today.
1742142102165.png
 
TSMC strikes again? Sabotage? Espionage? Looking at Lisa Su and that dick ex Intel CEO... lol

Or accidental "D" version installs destined for Chinese cards? Seems fabs all over the place have had issues, Intel not long ago and AMD's shitty newer CPUs going thermonuclear..fun times!
I am laughing at your sig.
I walked away from more AMD based things than towards but the 8500DV was outstanding...cable went digital and tried to force me to use a box for each tuner...F*** Cable!:mad:
 