
NVIDIA GeForce RTX 50 Cards Spotted with Missing ROPs, NVIDIA Confirms the Issue, Multiple Vendors Affected

MxPhenom 216

ASIC Engineer
Joined
Aug 31, 2010
Messages
13,116 (2.47/day)
Location
Loveland, CO
System Name Ryzen Reflection
Processor AMD Ryzen 9 5900x
Motherboard Gigabyte X570S Aorus Master
Cooling 2x EK PE360 | TechN AM4 AMD Block Black | EK Quantum Vector Trinity GPU Nickel + Plexi
Memory Teamgroup T-Force Xtreem 2x16GB B-Die 3600 @ 14-14-14-28-42-288-2T 1.45v
Video Card(s) Zotac AMP HoloBlack RTX 3080Ti 12G | 950mV 1950Mhz
Storage WD SN850 500GB (OS) | Samsung 980 Pro 1TB (Games_1) | Samsung 970 Evo 1TB (Games_2)
Display(s) Asus XG27AQM 240Hz G-Sync Fast-IPS | Gigabyte M27Q-P 165Hz 1440P IPS | LG 24" IPS 1440p
Case Lian Li PC-011D XL | Custom cables by Cablemodz
Audio Device(s) FiiO K7 | Sennheiser HD650 + Beyerdynamic FOX Mic
Power Supply Seasonic Prime Ultra Platinum 850
Mouse Razer Viper v2 Pro
Keyboard Corsair K65 Plus 75% Wireless - USB Mode
Software Windows 11 Pro 64-Bit
This must be much lower level than the driver.
This comes down to the binning process, where the appropriate lower-quality ROPs are "fused off". I can only come up with two possible reasons why:
1) The affected GPUs are a lower bin, which for some reason was combined into the same SKU, and these missing ROPs are "defective".
2) The affected GPUs have 8 fully working ROPs fused off unintentionally, which makes the GPU avoid using these. An Nvidia engineer would be able to explain exactly how this works on a hardware level, but it's one of two:
a) Somehow "burned" into the chip, so no firmware update can change it.
b) Controlled by firmware, but in this case they should have fixed it instead of taking returns. (and now with multiple models affected…)
Either way it's not a driver issue.
Is there anything I've missed?

Also, every finalized graphics card is run through extensive validation by the AIB partners; it surprises me that none of those checks validates that the reported hardware matches the spec.
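
The kind of spec check described above could, in principle, be as simple as comparing the ROP count a card reports against a per-model table. Below is a minimal Python sketch, not anyone's actual validation tooling: the `EXPECTED_ROPS` table and the `check_rop_count` helper are hypothetical names, the counts are assumptions taken from public spec listings, and how a real production line reads the count off the card is left out entirely.

```python
# Hypothetical sketch: flag cards whose reported ROP count is below spec.
# The table values are assumptions based on public spec listings,
# not taken from any vendor's validation tooling.
EXPECTED_ROPS = {
    "GeForce RTX 5090": 176,
    "GeForce RTX 5070 Ti": 96,
}


def check_rop_count(model: str, reported_rops: int) -> tuple[bool, int]:
    """Return (matches_spec, missing_rops) for a card.

    Raises KeyError if the model is not in the table.
    """
    expected = EXPECTED_ROPS[model]
    missing = expected - reported_rops
    return missing == 0, missing


if __name__ == "__main__":
    # A card reporting 8 fewer ROPs than the assumed spec.
    ok, missing = check_rop_count("GeForce RTX 5090", 168)
    share = missing / EXPECTED_ROPS["GeForce RTX 5090"]
    print(f"matches spec: {ok}, missing ROPs: {missing} ({share:.1%})")
```

With the assumed 176-ROP spec, 8 missing ROPs is roughly 4.5% of the total, which is the sort of gap a check like this would catch before the card ships.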


That would have to be done per country (or EU combined), and after years of deliberation, once a settlement is reached, owners will get their ~$2 after lawyer fees.


And most will quickly return when they get burned there too…


Firstly, they do know the exact number, as this is a binning issue. But I have my doubts about whether the reported figures are correct, considering very few units are in use and users have a very low probability of detecting this, so my expectation would be that the real figure is in the ~10% range. (That's just a qualified guess, so don't quote me on that.)

It is however always hard for the public to gauge how widespread an issue may be, especially with problems which may be tied to specific production batches, and with a few people shouting loudly in the forums. A couple of generations ago Nvidia got a tremendous amount of flak for the "space invaders" defect on certain RTX 2080s, which in the end turned out to be an issue with EVGA. (Except for the random occurrences, which are normal with mass-produced graphics cards.) Outright failure rates with graphics cards are still very low compared to e.g. motherboards, and CPUs are even lower. So we have every reason to expect a graphics card to be fully working, and we should continue to hold vendors to that standard.

From what? 300w?
 
Joined
Jun 10, 2014
Messages
3,055 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
From what? 300w?
No, getting burned metaphorically, like most people do when they are dissatisfied and choose the competition the next time, only to experience some bigger annoyance and come back (since there are only two options in this segment so far). There have been so many claiming to never buy Nvidia again, only to return shortly after. And the other team isn't better in these areas either, at least until now.

While no one should delight in others' misfortune, AMD should take this opportunity to improve their own quality in every area, not just push some marketing BS and offer some models with slightly better value. And they should definitely ramp up production.

Buyers generally should wait and see which option is best for them. New generations are exciting, but I wouldn't buy either until they're mature. (I don't have time to be a beta tester.) And I'm not picking sides either; I've bought and recommended both many times, and I will probably buy one from each soon, as I will upgrade two systems.
 
Joined
Jan 25, 2025
Messages
44 (1.07/day)
Location
Morrisville, NC, USA
This must be much lower level than the driver.
This comes down to the binning process, where the appropriate lower-quality ROPs are "fused off". I can only come up with two possible reasons why:
1) The affected GPUs are a lower bin, which for some reason was combined into the same SKU, and these missing ROPs are "defective".
2) The affected GPUs have 8 fully working ROPs fused off unintentionally, which makes the GPU avoid using these. An Nvidia engineer would be able to explain exactly how this works on a hardware level, but it's one of two:
a) Somehow "burned" into the chip, so no firmware update can change it.
b) Controlled by firmware, but in this case they should have fixed it instead of taking returns. (and now with multiple models affected…)
Either way it's not a driver issue.
Is there anything I've missed?

Also, every finalized graphics card is run through extensive validation by the AIB partners; it surprises me that none of those checks validates that the reported hardware matches the spec.


That would have to be done per country (or EU combined), and after years of deliberation, once a settlement is reached, owners will get their ~$2 after lawyer fees.


And most will quickly return when they get burned there too…


Firstly, they do know the exact number, as this is a binning issue. But I have my doubts about whether the reported figures are correct, considering very few units are in use and users have a very low probability of detecting this, so my expectation would be that the real figure is in the ~10% range. (That's just a qualified guess, so don't quote me on that.)

It is however always hard for the public to gauge how widespread an issue may be, especially with problems which may be tied to specific production batches, and with a few people shouting loudly in the forums. A couple of generations ago Nvidia got a tremendous amount of flak for the "space invaders" defect on certain RTX 2080s, which in the end turned out to be an issue with EVGA. (Except for the random occurrences, which are normal with mass-produced graphics cards.) Outright failure rates with graphics cards are still very low compared to e.g. motherboards, and CPUs are even lower. So we have every reason to expect a graphics card to be fully working, and we should continue to hold vendors to that standard.
Steve Burke of Gamers Nexus has another possibility according to the video below: some of the working ROPs were connected to TPCs where all of the CUDA cores are disabled, making those otherwise good ROPs unusable.


Also, ROPs are now decoupled from the memory controllers in later Nvidia GPUs, which brings several advantages: ROPs are now able to access the whole memory system instead of just their assigned memory controllers, and disabling ROPs does not cripple one of the memory controllers like what happened with the GTX 970 where disabled ROPs cut the effective memory pool from 4GB to 3.5GB.

Another guess of mine is that Nvidia's quality control did not adequately check ROPs at first, because earlier Blackwell GPUs were bound only for AI accelerators, where ROPs are completely worthless, before Nvidia started diverting some Blackwell GPUs to graphics duty. Quality control on ROPs is only needed for GPUs that are bound to perform graphics tasks. Nvidia could have botched the changes to quality control needed to check GPUs that will be sold for graphics duty.
 
Joined
Jul 5, 2013
Messages
29,519 (6.92/day)
Steve Burke of Gamers Nexus has another possibility according to the video below: some of the working ROPs were connected to TPCs where all of the CUDA cores are disabled, making those otherwise good ROPs unusable.
That's a very plausible theory.
 
Joined
Mar 23, 2016
Messages
4,880 (1.49/day)
Processor Core i7-13700
Motherboard MSI Z790 Gaming Plus WiFi
Cooling Cooler Master RGB Tower cooler
Memory Crucial Pro 5600 32GB kit OCed to 6600
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500GB,,WD850N 2TB
Display(s) Samsung 28” 4K monitor
Case Corsair iCUE 4000D RGB AIRFLOW
Audio Device(s) EVGA NU Audio, Edifier Bookshelf Speakers R1280
Power Supply TT TOUGHPOWER GF A3 Gold 1050W
Mouse Logitech G502 Hero
Keyboard Logitech G G413 Silver
Software Windows 11 Professional v24H2
[Attached image: Untitled.png]

Curious if there is any merit to this since I don't know where the information is from.
 
Joined
Jun 10, 2014
Messages
3,055 (0.78/day)
Steve Burke of Gamers Nexus has another possibility according to the video below: some of the working ROPs were connected to TPCs where all of the CUDA cores are disabled, making those otherwise good ROPs unusable.
Yes, I know, I've seen it.
Any such combination of disabled hardware is already included in (1); the GPU is actually a lower bin (somehow).

Another guess of mine is that Nvidia's quality control did not adequately check ROPs at first, because earlier Blackwell GPUs were bound only for AI accelerators, where ROPs are completely worthless, before Nvidia started diverting some Blackwell GPUs to graphics duty. Quality control on ROPs is only needed for GPUs that are bound to perform graphics tasks. Nvidia could have botched the changes to quality control needed to check GPUs that will be sold for graphics duty.
If this were the case, the ROPs would be detected and in use, just causing random crashes instead.
The actual problem is at the binning stage.

…like what happened with the GTX 970 where disabled ROPs cut the effective memory pool from 4GB to 3.5GB.
That is actually incorrect.
The GTX 970 had one ROP/L2 cache partition disabled, which left the memory controller behind the last 0.5 GB sharing a path with another controller at a lower priority. So the last 0.5 GB was always there, just slower.
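
For a rough sense of "slower", the arithmetic can be sketched in Python. The figures are assumptions from the commonly cited GTX 970 specs (a 256-bit bus made of eight 32-bit controllers, 7 Gbps GDDR5), and `segment_bandwidth_gbs` is a hypothetical helper name. It gives theoretical peak numbers only, not measurements.

```python
def segment_bandwidth_gbs(controllers: int,
                          bus_bits_per_controller: int = 32,
                          data_rate_gbps: float = 7.0) -> float:
    """Theoretical peak bandwidth in GB/s for a memory segment.

    bits per transfer * transfers per second (in Gbps per pin) / 8 bits per byte.
    The default bus width and data rate are assumed GTX 970 figures.
    """
    return controllers * bus_bits_per_controller * data_rate_gbps / 8


if __name__ == "__main__":
    fast = segment_bandwidth_gbs(7)   # 3.5 GB segment, seven controllers
    slow = segment_bandwidth_gbs(1)   # 0.5 GB segment, one controller
    print(f"3.5 GB segment: {fast:.0f} GB/s, 0.5 GB segment: {slow:.0f} GB/s")
```

Under those assumptions the two segments come out at 196 GB/s and 28 GB/s, and as far as I understand it they could not both be driven at full speed at the same time, so the combined 224 GB/s figure was never reachable in practice.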
 
Joined
Apr 16, 2022
Messages
63 (0.06/day)
Processor AMD Ryzen 9 7900X3D
Motherboard ASUS ROG Crosshair X670E Hero
Cooling ASUS ROG Strix LC III 360
Memory G.Skill 48GB(2x24) TZ5 Neo RGB EXPO 6400mhz CL32
Video Card(s) ASUS TUF RTX4070TI SUPER
Storage Adata XPG SX8200 Pro 2X2TB, Adata XPG SX8100 3X2TB,
Display(s) Dell 34" Curved Gaming Monitor S3422DWG
Case Corsair 5000X
Power Supply Corsair RM1000x SHIFT 80 PLUS Gold
Mouse Asus ROG Gladius II Core
Keyboard Asus ROG Strix Scope
And yet it didn't stop you from buying 4070 Ti Super.
I had to buy it because, damn it, AMD doesn't have CUDA.

Even though years have passed, I miss ATI's aggressive policy. I would like to go back to the X1950XTX I was using back then.
 
Joined
Feb 1, 2019
Messages
3,875 (1.74/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
I bet that was supposed to be the 5090 originally and the 5090 we have now would be a Titan card.
I am on a similar line of thought; I think there is or was a planned SKU with these chips, and there was a bin mix-up at the factory.
 
Joined
Dec 2, 2024
Messages
70 (0.74/day)
Location
Wet Coast of Canada Eh!
System Name NON-AMD Folding Farm
Processor Intel i7 7700K-9700K-14700K
Motherboard Asus WS Z390 Pro's, Z270 A, TUF Z390-Plus Gaming Wifi's, Prime Z790 P Wifi
Cooling Coolermater 212's, EVGA AIO's, EVGA CLx AIO
Memory Vengeance
Video Card(s) 4090FE, 4090 MSI Suprim AIO, MSI 4090 X Tri(will edit)'s, EVGA 3080 FTW3's
Storage WD HD's, WD NVME's
Display(s) You would ask...LoL 7 Colour Monitors all given to me for free
Case Rosewell Mining Cases, Coolermaster Haf Evo's
Power Supply EVGA 1600P2's, Seasonic Platinum 1300's, EVGA Platinum 750's, EVGA Gold 750's, EVGA Bronze 650's
Mouse Logitech's
Keyboard Logitech's, Old school ps2's, (will edit)
Software Win 10 Pro's, AI Suite 3, Precison X1's, Afterburner's, AVG Free, Bionic's, FAH's
Benchmark Scores HuH?
I had to buy it because, damn it, AMD doesn't have CUDA.

Even though years have passed, I miss ATI's aggressive policy. I would like to go back to the X1950XTX I was using back then.
ATI? ... I have an 8500DV, with remote, in a P4B that would still do its job if I turned it on... :)
 