
Fixing faulty Quadro K4000 memory

bingobangobongo

New Member
Joined
Aug 12, 2022
Messages
10 (0.01/day)
Hello everyone.

I recently acquired a broken (yet cheap) NVIDIA Quadro K4000 3GB with what look like memory problems. My use case is CUDA C++ development for fun and profit (in knowledge).

Symptoms:
Tested on Ubuntu 20.04 Server: the moment the OS enters userspace (login screen), the GPU starts showing repetitive blocks of green and purple lines in boxes, identical to this image, but on a larger scale. The machine itself works fine; I can log in remotely over SSH and list PCI devices. "nvidia-smi" returns "No cards found", but I believe that is a driver, kernel or machine problem (dmesg shows "NVRM: RmInitAdapter failed!"). I will test it on a fresh OS install tomorrow. The artifacts also show up on two other machines running Windows 7 and Windows 10 (both without drivers).

My ideas:
- BIOS modification to disable faulty memory bank

From my research, it should be possible to disable a faulty memory bank in the BIOS ROM, as shown in this video (RTX 2070 Super Repair (Faulty memory and error 43)). There are also threads (like this one (Where to edit fake GTX1050Ti BIOS to fix memory size?)) about fixing fake GPUs that report more memory than they have. Would lowering this value trick the GPU into not using the banks in the higher address space, and maybe fix the problem?

- Replacing broken memory ICs with a twist

My Quadro uses Samsung K4G20325FD-FC03, but the only compatible ICs I could find locally are Samsung K4G20325FD-FC04, which this datasheet lists with a 0.07ns slower "speed". I guess I would have to clock the memory down in the BIOS to make them work, if they would work at all?
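
For the -FC03 vs -FC04 question, a rough back-of-the-envelope check might look like the sketch below. Every number in it is an assumption rather than something taken from the linked datasheet: it assumes the speed figures are 0.33 ns (-FC03) and 0.40 ns (-FC04), which is consistent with the 0.07 ns gap; that the figure is a clock period carrying two bits per pin; and that a stock K4000 runs its memory at about 5.6 Gbps per pin.

```python
# Rough speed-bin check. All constants are ASSUMPTIONS for illustration.
FC03_PERIOD_NS = 0.33   # assumed -FC03 speed figure
FC04_PERIOD_NS = 0.40   # assumed -FC04 speed figure (0.07 ns slower)
STOCK_GBPS = 5.6        # assumed stock K4000 per-pin data rate


def rated_gbps(period_ns: float) -> float:
    """Per-pin data rate, assuming two bits are transferred per clock period."""
    return 2.0 / period_ns


def downclock_factor(stock_gbps: float, period_ns: float) -> float:
    """Fraction of the stock memory clock that stays within the chip's rating."""
    return min(1.0, rated_gbps(period_ns) / stock_gbps)


if __name__ == "__main__":
    for name, period in (("-FC03", FC03_PERIOD_NS), ("-FC04", FC04_PERIOD_NS)):
        print(f"{name}: rated ~{rated_gbps(period):.2f} Gbps/pin, "
              f"run at <= {downclock_factor(STOCK_GBPS, period):.0%} of stock clock")
```

Under those assumptions the -FC04 parts would need the memory clock lowered to roughly 89% of stock. Whether the timing tables in the BIOS would also need changes is not covered by this sketch.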

Conclusion and questions:
I see two ways to make this card work again; with the latter I could damage the card irreparably. I'm moderately confident in my hot-air station skills, but I have only used it to solder SOP packages (mainly flash/EEPROMs).
Is it possible to modify the BIOS of this card so that it does not use a specified memory bank? If so, how? (more on the second question later)
Is it possible to modify the BIOS to lower the reported memory size and trick the GPU into not using the banks in the higher address space?
Would -FC04 ICs work with this card given a lower memory clock?

Additional information:
I've spent quite some time trying to understand NVIDIA's BIT structure from this document, but the pointers do not make any sense. The places they point to look random, and I could not find information about the structures they point to. I also tried to understand the changes made to the BIOS ROMs in the threads about fake GPUs, with no luck. I saw that most of the memory-related data is around address 0x7dXX in those examples. I would love to learn more about NVIDIA's BIOS structure, but I could not find any information that shows exactly how things are laid out. I already know I'll have to check exactly which memory bank is broken, using MATS.
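
To make the BIT question concrete, here is a minimal sketch of walking the BIT token table in a ROM dump. It assumes the publicly documented layout (a 12-byte header starting with 0xB8FF and "BIT\0", followed by 6-byte tokens of id, version, size and pointer). As a further assumption, it treats token pointers as relative to the start of the PCI option ROM image (the 0x55AA signature), which is not necessarily file offset 0 in a raw flash dump. If that holds, it would be one possible explanation for pointers that seem to land on random data. The function names are hypothetical, and none of this has been checked against k4000.rom.

```python
import struct

BIT_MAGIC = b"\xff\xb8" + b"BIT\x00"  # 0xB8FF (little-endian) then "BIT\0"


def find_rom_base(data: bytes) -> int:
    """Return the file offset of the first plausible PCI option ROM image."""
    pos = data.find(b"\x55\xaa")
    while pos >= 0:
        if pos + 0x1A <= len(data):
            # Offset 0x18 of the image holds a pointer to the "PCIR" structure.
            (pcir,) = struct.unpack_from("<H", data, pos + 0x18)
            if data[pos + pcir:pos + pcir + 4] == b"PCIR":
                return pos
        pos = data.find(b"\x55\xaa", pos + 1)
    raise ValueError("no PCI option ROM image found")


def parse_bit(data: bytes):
    """Return (image_base, tokens) for the first BIT table after the base."""
    base = find_rom_base(data)
    hdr = data.find(BIT_MAGIC, base)
    if hdr < 0:
        raise ValueError("BIT signature not found")
    _id, _sig, _bcd, hdr_size, tok_size, tok_count, _cksum = struct.unpack_from(
        "<H4sHBBBB", data, hdr)
    tokens = []
    for i in range(tok_count):
        off = hdr + hdr_size + i * tok_size
        tok_id, tok_ver, size, ptr = struct.unpack_from("<BBHH", data, off)
        tokens.append({
            "id": chr(tok_id),
            "version": tok_ver,
            "size": size,
            "pointer": ptr,             # value as stored in the token
            "file_offset": base + ptr,  # ASSUMPTION: pointer is image-relative
        })
    return base, tokens
```

Usage would be something like `base, tokens = parse_bit(open("k4000.rom", "rb").read())`, then comparing each token's `pointer` and `file_offset` in a hex editor to see which of the two lands on sensible data.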

I have a CH341A programmer on hand, so experimenting with different ROMs is not a problem. I have also attached the BIOS ROM I dumped from this card.
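
Since edited ROMs would be written back with the CH341A, a quick sanity check before flashing may be worth having. The sketch below assumes the standard PCI option ROM convention: byte 2 of the image holds its length in 512-byte units, and all bytes of the image sum to zero modulo 256. Whether this card or its driver actually validates that checksum, and which byte NVIDIA's own tools adjust, are assumptions; defaulting `fix_at` to the last byte of the image is a hypothetical choice.

```python
def image_length(rom: bytes, base: int = 0) -> int:
    """Length in bytes of the option ROM image that starts at `base`."""
    if rom[base:base + 2] != b"\x55\xaa":
        raise ValueError("no 0x55AA signature at base")
    return rom[base + 2] * 512


def checksum(rom: bytes, base: int = 0) -> int:
    """Sum of all image bytes modulo 256; 0 means the image is consistent."""
    n = image_length(rom, base)
    return sum(rom[base:base + n]) & 0xFF


def fix_checksum(rom: bytearray, base: int = 0, fix_at=None) -> None:
    """Adjust one byte in place so that the image sums to zero again."""
    n = image_length(rom, base)
    if fix_at is None:
        fix_at = base + n - 1  # hypothetical choice of compensation byte
    rom[fix_at] = (rom[fix_at] - checksum(rom, base)) & 0xFF
```

A non-zero result from `checksum()` on the untouched dump would suggest that `base` is wrong or that this convention does not apply to the image, in which case `fix_checksum()` should not be used.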

If you need any more information, please say so. I'll provide what I can.
 

Attachments

  • k4000.rom
    256 KB · Views: 72

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
43,096 (6.73/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
Go to krisfix.de for pcb repairs
 

bingobangobongo

Go to krisfix.de for pcb repairs
Looking at the prices plus shipping, it's absolutely not worth it. I paid 25 EUR for this card. National shipping would probably cost 20 EUR each way, plus at least 125 EUR for the fix. I'll either fix it or break it myself.

If I could pay this 165 EUR for in-depth information about NVIDIA's BIOS structure, I definitely would. But just to fix this card, definitely not.
 

eidairaman1

Looking at the prices plus shipping, it's absolutely not worth it. I paid 25 EUR for this card. National shipping would probably cost 20 EUR each way, plus at least 125 EUR for the fix. I'll either fix it or break it myself.
Go there for guidance
 
Shrek

Joined
Mar 21, 2021
Messages
5,217 (3.74/day)
Location
Colorado, U.S.A.
System Name CyberPowerPC ET8070
Processor Intel Core i5-10400F
Motherboard Gigabyte B460M DS3H AC-Y1
Memory 2 x Crucial Ballistix 8GB DDR4-3000
Video Card(s) MSI Nvidia GeForce GTX 1660 Super
Storage Boot: Intel OPTANE SSD P1600X Series 118GB M.2 PCIE
Display(s) Dell P2416D (2560 x 1440)
Power Supply EVGA 500W1 (modified to have two bridge rectifiers)
Software Windows 11 Home

bingobangobongo

I've just run NVIDIA's MATS utility, and I'm quite shocked, if I understand the report correctly: three memory banks have read errors? I'll reflow the GPU and RAM today, as proposed by Shrek, to check whether the problem is really in the solder and not in the ICs themselves. I've attached the MATS reports for the 5MB and 10MB tests.

Go there for guidance
I've spent some time on Kris's website, but I could not find any practical information on DIY GPU repairs, other than his YouTube videos, which I'm already familiar with. Could you elaborate on what I should look for on his website?
 

Attachments

  • report-5mb.txt
    99.1 KB · Views: 93
  • report-10mb.txt
    99.1 KB · Views: 113

eidairaman1

I've just run NVIDIA's MATS utility, and I'm quite shocked, if I understand the report correctly: three memory banks have read errors? I'll reflow the GPU and RAM today, as proposed by Shrek, to check whether the problem is really in the solder and not in the ICs themselves. I've attached the MATS reports for the 5MB and 10MB tests.

I've spent some time on Kris's website, but I could not find any practical information on DIY GPU repairs, other than his YouTube videos, which I'm already familiar with. Could you elaborate on what I should look for on his website?
Email him
 

bingobangobongo

I reflowed the GPU itself, first in low-heat mode (300C) to preheat the PCB and let the flux flow under the GPU, then blasted the GPU in high-heat mode (500C) until it wiggled when I tapped it with tweezers. The GPU still shows artifacts, but the report now shows WRITE errors instead of READ errors, and as far as I can see there are far fewer of them. I'll try to reflow the memory now. Should I try to reflow the GPU once more, but let it cool off slowly by applying low heat from a distance of 5-10cm?
 

Attachments

  • report-5mb-reflow.txt
    79.1 KB · Views: 47
  • report-5mb-reflow.txt
    79.1 KB · Views: 113
  • IMG20220813194447.jpg
    2.4 MB · Views: 221
  • IMG20220813194923.jpg
    1.2 MB · Views: 141
  • IMG20220813194939.jpg
    2.5 MB · Views: 249
  • IMG20220813203531.jpg
    1.9 MB · Views: 145
Shrek

I would not have wiggled it myself for fear of cross joints.
 

bingobangobongo

Just a little wiggle. Before this Quadro, I tried it on a working NVS 310 to practice a bit, and did a much bigger wiggle. It still works. I wanted to be completely sure that the solder had liquefied. If it dies, it dies and I've learned a new skill; if it works, great, and I've learned a new skill too.

Okay, a very weird thing happened. I reflowed what I thought was bank A, just to check whether that would fix the problem in MATS, but MATS now shows that ONLY bank A has write errors. Are these banks labeled the same way on every NVIDIA GPU? When I say "bank A", I mean the first two memory ICs counting anticlockwise while looking at the GPU IC, like in this picture. I'll go on to reflow each of these memory ICs.

I'm done for today. I heated one of the ICs too much and it flew off the board. Two new -FC04 ICs should arrive on Wednesday. Until then, I'll try to reball this IC by hand (that will be fun...), or just run the test without this single IC and check whether bank A is still throwing errors.

Day 3: Unfortunately I can't find the IC that flew off, but MATS now shows that only one half of bank A is broken. Ignore bank C; that's the IC that's not on the board right now. Can someone please help me identify exactly which IC I should replace in bank A? The Quadro K4000 has 6 memory ICs per PCB side, 12 total. Looking at the GPU side, 3 are on top, 2 on the right, 1 on the left.

I have a theory. I removed the memory IC that the arrow points to, and MATS throws errors at bank C; thanks to that, I think I've worked out how the memory IC layout corresponds to the MATS report. If everything scales correctly and works the same on the other side of the PCB, I'll replace the broken memory IC in bank A on the first try. As I said, the new ICs should arrive on Wednesday, so I'll keep you updated.

HP-nVidia-QUADRO-K4000-3-GB-GDDR5-PN-713381-001-Producent-HP.jpg
 

Attachments

  • report-memory-reflow.txt
    47 KB · Views: 76

bingobangobongo

What are you using to reflow?
For the GPU itself I used a heat gun, a Bosch EasyHeat 500. For the memory I use my hot-air station, a WER 853D, as the memory ICs don't need as much heat as the GPU. I was pretty tired when this accident happened. Not to worry: I cleaned the pads and everything is okay. I'll replace the IC as soon as the new parts arrive.
 

bingobangobongo

A little update on my situation:

The parts arrived, but I don't really have time on weekdays right now. Also, my memory layout theory was wrong; I'll try to document the proper layout over the weekend.

A third thing, which is kind of funny: I bought a broken Quadro K5000 4GB from the same seller, again for 25 EUR, and the fix was literally to replace broken 0.05 ohm resistors on the 12V rail with 1cm of copper wire. It took me around 3 hours to locate the problem, with many other ideas about what could be wrong along the way. I'm thinking about documenting this journey in a new post if anyone is interested.
 
Shrek

I'm interested
 

bingobangobongo

Sorry for the lack of updates; life has many surprises. I don't think I will continue trying to fix this K4000, as it has taken way longer than I intended. The way failing bits map to memory IC placement is pretty weird. If anyone needs the exact placement for each bank based on the failing-bits output from NVIDIA MATS, I could desolder the ICs one by one and check which is which when I have some free time.

For now, the K4000 will be a showpiece on my shelf, or a parts donor if I buy another broken GPU.

Thank you all for your help with this one. I'm pretty sure I'll come back with some new GPU to fix later.

As for the K5000: it turned out that the 0 ohm resistors that connect the 6-pin 12V line to the MOSFETs on the VGPU line had gone bad and dropped the 12V line down to 2V. Replacing them with a bit of copper wire fixed the problem, and this K5000 works to this day. The lesson is: sometimes there are no shorts, and resistors like to play hide and seek with you. Follow the voltage lines with a multimeter to check where the voltage drops.
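
The "follow the voltage" approach can be sketched as a tiny helper: record multimeter readings at successive points along the rail, from source to load, and report the segment with the largest drop. The probe point names below are made up for illustration; only the 12V and 2V figures come from the measurements described above.

```python
def locate_drop(readings, threshold_v=0.5):
    """Return (from_point, to_point, drop_in_volts) for the worst segment.

    `readings` is an ordered list of (probe_point, volts), source first.
    Returns None if no segment drops by at least `threshold_v`.
    """
    worst = None
    for (a, va), (b, vb) in zip(readings, readings[1:]):
        drop = va - vb
        if drop >= threshold_v and (worst is None or drop > worst[2]):
            worst = (a, b, drop)
    return worst


# Hypothetical probe points along the 12V rail, source to load.
EXAMPLE_READINGS = [
    ("6-pin connector", 12.0),
    ("after current shunt", 12.0),
    ("after series resistors", 2.0),
    ("MOSFET input", 2.0),
]
```

With readings like these, `locate_drop(EXAMPLE_READINGS)` points at the segment across the series resistors, which is where the parts to inspect or bridge would be.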
 
Shrek

Were they current measuring resistors?
 

bingobangobongo

Were they current measuring resistors?
1661894102142.png


No, they were in the 12V line, in a spot where there could be either a coil or resistors, shown in this circuit as R101-103 and R709. The current measuring resistor (RS1) was waaaaaay back on the other side of the PCB, near the 6-pin connector. I'm not really sure what the point of a coil would be here, but by default there were resistors in place. The 12V_F line then goes directly to the switching MOSFETs of the voltage converter.
 

bingobangobongo

I think that in this case they were just used as jumpers, since four of them in parallel could handle some current. Although if the power supply in the last owner's PC had a voltage spike, maybe they unintentionally acted as fuses? I'm researching inductors in high-power voltage lines right now, and the best explanation I could find is a loading coil to reduce voltage instability (due to a low-quality power supply, for example). As the Quadro K5000 was designed as a workstation/server GPU, I believe they replaced the coil with resistors to cut costs, since a workstation/server power supply should be pretty high quality, with nice and stable voltage lines, which wouldn't necessarily be the case in consumer-grade PCs (the K5000 and GTX 760 share the same PCB, but the latter has less VRAM and is clocked higher). But that's just my theory; I could be wrong.

Also, I found a nice GTX 760 PCB photo, which is pretty much the same as the K5000's; the K5000 just has different VRM coils and more memory. You can see the resistors we're talking about on the left.

1661895932567.png
 
Shrek

Looks like a replacement for an inductor
board.jpg
 