
An interesting AMD GPU issue.

Status
Not open for further replies.

Atlas39

New Member
Joined
Apr 10, 2024
Messages
10 (0.03/day)
@Atlas39
Are the top fans set to blow in, or out?
I would set them to blow out, just to try it and have it done.

Maybe look at getting even a small UPS, preferably one with AVR, so it can smooth out any fluctuations in your incoming power;
it will usually also help the PSU's lifetime a bit.

Are you on the latest BIOS?
It would be worth a try.
Do not use any performance tweaks/OC in the BIOS settings until we know what causes the issue.
On Ryzen, RAM settings (clocks/voltage) can be set manually, which is better than using a preset.

We need some location info; if you want proper recommendations, it helps us point you to places to get parts,
or at least lets us compare pricing.

I wouldn't get all new stuff at once; switch out parts one at a time and you might find the (hardware) cause.
Then you could build a new one if you like, and redo the "old" one for selling or as a second PC for your home/family.

Start with the RAM: your board can do 3600, which would allow the IF (Infinity Fabric) to run at its optimal speed and get you a little boost.
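The "IF at optimal speed" point is simple arithmetic: a DDR4 rating is a transfer rate, so DDR4-3600 runs a real memory clock (MCLK) of 1800 MHz, and on Zen 2/Zen 3 Ryzen the Infinity Fabric clock (FCLK) generally performs best when it matches MCLK 1:1. A minimal sketch, assuming an 1800 MHz FCLK ceiling, which many (not all) chips of that era reach; the function name is hypothetical:

```python
def fabric_plan(ddr_rate_mts: int, fclk_limit_mhz: int = 1800) -> tuple[int, int, bool]:
    """Return (mclk_mhz, fclk_mhz, coupled) for a DDR4 kit.

    DDR transfers twice per clock, so the real memory clock is half the
    rated MT/s figure. FCLK is set equal to MCLK (1:1) when that fits under
    the assumed limit; otherwise it is capped and no longer matches MCLK.
    """
    mclk = ddr_rate_mts // 2
    fclk = min(mclk, fclk_limit_mhz)
    return mclk, fclk, fclk == mclk


for kit in (3200, 3600, 4000):
    mclk, fclk, coupled = fabric_plan(kit)
    mode = "1:1 (coupled)" if coupled else "decoupled"
    print(f"DDR4-{kit}: MCLK {mclk} MHz, FCLK {fclk} MHz, {mode}")
```

Under that assumed limit, 3600 is the highest common kit that still keeps the fabric at 1:1, which is why it is usually called the sweet spot.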

Can you try the Gigabyte/Deepcool PSU in a different PC and see if it works?
(Leave it outside the case; just connect all the cables for a day or two of testing.)

Next, I would borrow a different card from a friend to test, just to rule it out.

Front fans blow into the case; the top fans and the exhaust fan blow out.

Yes, a UPS might help, but I have no way to test with a UPS right now.

I am on the latest BIOS, updated with the ASUS EZ Flash utility.

Overclocking behaves strangely in my case.
When I use stock clocks for the CPU (3.6 GHz),
I have more frequent crashes, but when I overclock to 4.4 GHz and enable PBO & CPB, it crashes less than at stock.

Same with the GPU: when I overclock the VRAM to 2300-2400 MHz, I get fewer crashes (stock is 2000).

And once it crashes, it keeps crashing in every game until I remove the drivers with DDU and reinstall them.
Then I am fine for some days, but it starts again.

Sometimes I can run the OCCT power test for as long as 2 hours, but sometimes it crashes within a minute.

I tested both PSUs in my old PC and they worked well,
and I also tested my old GPU (1050 Ti);
again, there were no crashes.

There is a stability problem I cannot pin down; whatever I do, it's only a temporary fix.


Edit: my location is Turkiye, but soon I will move to the Philippines for at least 5 years, depending on my business.
Also, I noticed most of the crashes happen in DX11 games; I have never experienced a crash with Vulkan.
 

Toothless

Tech, Games, and TPU!
Supporter
Joined
Mar 26, 2014
Messages
9,792 (2.44/day)
Location
Washington, USA
System Name Veral
Processor 7800x3D
Motherboard x670e Asus Crosshair Hero
Cooling Thermalright Phantom Spirit 120 EVO
Memory 2x24 Klevv Cras V RGB
Video Card(s) Powercolor 7900XTX Red Devil
Storage Crucial P5 Plus 1TB, Samsung 980 1TB, Teamgroup MP34 4TB
Display(s) Acer Nitro XZ342CK Pbmiiphx, 2x AOC 2425W, AOC I1601FWUX
Case Fractal Design Meshify Lite 2
Audio Device(s) Blue Yeti + SteelSeries Arctis 5 / Samsung HW-T550
Power Supply Corsair HX850
Mouse Corsair Harpoon
Keyboard Corsair K55
VR HMD HP Reverb G2
Software Windows 11 Professional
Benchmark Scores PEBCAK
Are you able to get hold of a good power supply, like an RM750x or similar? Maybe a shop nearby or a friend can test it? We gotta rule that out, because the units you tested with aren't known for being good. Yeah, it's a 6600 XT and it's not power hungry, but it's possible it'll trip OCP, it being RDNA2.
 

Atlas39

New Member
Joined
Apr 10, 2024
Messages
10 (0.03/day)

Yes, today I tested with a Corsair RM850 80+ Gold and it crashed again, so my main suspect is the GPU.
 
Joined
Nov 7, 2017
Messages
2,148 (0.80/day)
Location
Ibiza, Spain.
System Name Main
Processor R7 5950x
Motherboard MSI x570S Unify-X Max
Cooling converted Eisbär 280, two F14 + three F12S intake, two P14S + two P14 + two F14 as exhaust
Memory 16 GB Corsair LPX bdie @3600/16 1.35v
Video Card(s) GB 2080S WaterForce WB
Storage six M.2 pcie gen 4
Display(s) Sony 50X90J
Case Tt Level 20 HT
Audio Device(s) Asus Xonar AE, modded Sennheiser HD 558, Klipsch 2.1 THX
Power Supply Corsair RMx 750w
Mouse Logitech G903
Keyboard GSKILL Ripjaws
VR HMD NA
Software win 10 pro x64
Benchmark Scores TimeSpy score Fire Strike Ultra SuperPosition CB20
@Atlas39
For now, leave OC off; it just introduces another variable, making troubleshooting harder.
Disable ReBAR in the BIOS if it's on, and switch the PSU setting to "typical current" (BIOS).
Disable the onboard GPU if it isn't already.

Given what you have done, it's probably the PSU and/or the GPU.
Start with the PSU: maybe buy one, and keep the others for rebuilding.
If you really invest in new parts, start with proper stuff.

RM750x
I only took a quick look, but the others were either not the same quality or cost more.
 
Joined
Apr 18, 2019
Messages
2,686 (1.24/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
Cleanly uninstall and reinstall your drivers.
Set all hardware to 'bone stock' and cleanly reinstall Windows.
(Perhaps try a chipset driver rollback first? I've seen a few folks having 'odd' issues with the latest release.)

I'm experiencing a black screen followed by a restart issue while playing games and conducting GPU stress tests.
The issue doesn't occur every time, and it's unpredictable; it happens randomly.

For example, when playing Cyberpunk 2077 with the CPU at stock settings, specifically at 3600 MHz, I encounter a black screen and restart issue. However, when I enable CPU Core Performance Boost, I can play for hours without any problems.

Sometimes when I overclock the CPU, the problem is temporarily resolved. And sometimes, when I overclock the VRAM, the problem is also temporarily resolved, but it starts again after a while.
In some cases, when I set the GPU fan speed to 100%, the problem is temporarily resolved, but then it starts again later.
I was having nearly identical issues, but
did not investigate CPU/RAM/IF; I'd already 'danced' with power issues* before, and this didn't feel the same.
*loose/worn 8-pins.

In my case(s):
1. On my previous (dirty) Windows 11 Install, I could not get AFMF working, and was getting hard reboots occasionally.
2. After formatting and installing Win10EntIotLTSC2021, I was getting the hard reboots from even a mild overclock *or* undervolt on my 7900 GRE. (Primarily in RT-/load-heavy titles like CP'77)

Cleanly uninstalling the AMD Adrenalin 24.3.1 (WHQL Recommended) drivers using DDU in Safe Mode, rebooting, and installing 'freshly downloaded' drivers, fixed the issue for me.
Happily on the R.ID/AmernimeZone driver, @TM.
However, this was after a clean OS install 'took care of' the other issues.

AMD 1st Party: https://drivers.amd.com/drivers/whq...lin-edition-24.3.1-win10-win11-mar20-rdna.exe
R.ID(AmernimeZone): https://sourceforge.net/projects/ra...n10-Win11-PolarisVegaNavi-Nebula.exe/download
 
Joined
Nov 7, 2017
Messages
2,148 (0.80/day)
@LabRat 891
AMD chipset drivers can have issues with rollback.
Uninstalling and cleaning is better, but that was already done, so...
 
Joined
Nov 7, 2017
Messages
2,148 (0.80/day)
All good, happens to the best.
*mumbles* damn a$$, not reading the whole post... :D
 

3x0

Joined
Oct 6, 2022
Messages
1,010 (1.12/day)
Processor AMD Ryzen 7 5800X3D
Motherboard MSI MPG B550I Gaming Edge Wi-Fi ITX
Cooling Scythe Fuma 2 rev. B Noctua NF-A12x25 Edition
Memory 2x16GiB G.Skill TridentZ DDR4 3200Mb/s CL14 F4-3200C14D-32GTZKW
Video Card(s) PowerColor Radeon RX7800 XT Hellhound 16GiB Noctua NF-A12x25 Edition
Storage Western Digital Black SN850 WDS100T1X0E-00AFY0 1TiB, Western Digital Blue 3D WDS200T2B0A 2TiB
Display(s) Dell G2724D 27" IPS 1440P 165Hz, ASUS VG259QM 25” IPS 1080P 240Hz
Case Cooler Master NR200P ITX
Audio Device(s) Altec Lansing 220, HyperX Cloud II
Power Supply Corsair SF750 Platinum 750W SFX
Mouse Endgame Gear OP1 8K
Keyboard HyperX Alloy Origins Aqua
Joined
Jan 1, 2012
Messages
438 (0.09/day)
Hey! This sounds like the issue I had (and still have?), almost to the detail. Welcome to the nightmare. It's not going to be fun.

I'll share my experiences in case they help you in any way.

The PC would go to a black screen under certain GPU loads, and after some time on this black screen, any sound playing would distort and then stop, and then the PC would restart.

During this entire process, the power never cuts out. It never shuts off. It just loses video signal, and then eventually restarts.

This doesn't usually happen when a load is first put on the GPU (like when a PSU can't meet a sudden power spike). It almost always happens randomly when the PSU is already under sustained load.

There would be no BSODs, but there would be an Event ID 18 (on AMD CPU systems at least, since I think that event is exclusive to them?) with a "cache hierarchy" error type. Sometimes a WHEA log, sometimes a Watch Dog log. The Event ID 18 just says there was a machine check exception (which likely explains the restart, so the CPU is signaling it?), the WHEA is a generic 0x24 error, and the Watch Dog will usually point to the GPU drivers. Specifically, VIDEO_TDR_TIMEOUT_DETECTED (117), VIDEO_ENGINE_TIMEOUT_DETECTED (141), VIDEO_MINIPORT_BLACK_SCREEN_LIVEDUMP (1b8), and VIDEO_DXGKRNL_BLACK_SCREEN_LIVEDUMP (1a8) were the ones I'd see.
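The Watch Dog codes listed above are hexadecimal bug check values, which is why they read like "1b8" rather than as decimal numbers. A minimal lookup sketch covering only the four codes named in this post; the helper name is hypothetical and the table is deliberately not a complete list:

```python
# Only the four live-dump codes named in the post above; not exhaustive.
LIVE_KERNEL_EVENTS = {
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED",
    0x1A8: "VIDEO_DXGKRNL_BLACK_SCREEN_LIVEDUMP",
    0x1B8: "VIDEO_MINIPORT_BLACK_SCREEN_LIVEDUMP",
}


def decode_live_kernel_event(code: str) -> str:
    """Map a code written as '1b8' or '0x1B8' to its name, if it is in the table."""
    value = int(code, 16)  # base 16 also accepts an optional '0x' prefix
    return LIVE_KERNEL_EVENTS.get(value, f"unknown (0x{value:X})")


for raw in ("117", "141", "1b8", "1a8"):
    print(raw, "->", decode_live_kernel_event(raw))
```

All four are display-driver timeouts or black-screen live dumps, which fits the "GPU stops responding, then the system restarts" pattern described here.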

It seems the GPU is getting unstable somehow; the CPU catches it, decides it's a machine check exception condition, can't recover, and signals a restart.

I also noticed the issue is worse with RAM at stock, and the system is more stable (but still very unstable) with the RAM profile active. For example, League of Legends was light enough that it never crashed for me, but if I disable the RAM profile, it starts crashing as well.

It's a nightmare to resolve; let me tell you that now. I went down a path over two months trying a lot of things, like BIOS updates, different drivers, a ton of random settings in both the BIOS and OS (like PCI Express power saving, ULPS, etc.), certain voltages for the CPU/Infinity Fabric, and a total reinstall of Windows. I swapped CPUs and it happened on both a 3700X and a 5800X3D, so it's not the CPU itself. I tried with XMP on and off, as well as removing half the DIMMs. I completely tore my system apart, cleaned, disconnected and reconnected everything, multiple times, and even tried different SATA/PSU cables where I could. I tried multiple DP/HDMI cables. I disconnected fans/storage I didn't need and left the case open. Lots and lots of really minor and niche things I won't even mention, except to say I've gone to comical lengths to rule things out. A different motherboard, RAM, and PSU were just about the only things I didn't try. All that stuff was stable with the prior video card, but that was a low-power-draw card (I went from a GTX 1060 to a 7800 XT). The system passes all CPU and RAM stress tests (Prime95, Memtest86, and all OCCT CPU and RAM tests), but sometimes fails GPU ones (FurMark on its own is usually fine, but the OCCT "GPU variable" test in particular, and FurMark plus a browser, seem to trip it, so these sorts of scenarios mimic whatever games are doing to cause it). Add to this a seemingly endless number of people complaining of black screens (or green on HDMI) followed by restarts on the 7800 XT, and I think I had tried enough to rule other things out and figured it was time to suspect the first thing I should have tried all along.

I ended up doing an RMA on the video card, and it mostly resolved the issues to begin with (while 23.12.1 and newer drivers introduce more of their own...), but I've had it happen once on the new one too (it started with a new use case, so it's possible the replacement never resolved it so much as made it much rarer?). This time, the game was flickering white for a single frame a few times before it happened. Previously it'd just happen with no symptom beforehand. I'm tempted to try a new PSU if it starts happening more, even though I don't think it's a PSU issue (largely because changing the GPU is what changes how bad it is), but I don't know. There are too many "mixed" symptoms, which makes narrowing this down infuriating. Changing the GPU changes the severity, but so does changing RAM profile speeds?

So based on what I tried and what your symptoms are (same as mine!), I'd say changing the video card itself is most likely to bring change, but it's possible it won't eliminate the issue. The PSU would be the next thing to change, but... you changed three. And no one else I saw with this issue (at least on the 7800 XT) had success changing the PSU. If it's actually power related, it's either on the GPU itself, or maybe with our homes/outlets.

I don't know if post-pandemic hardware just has poor quality control; maybe GPUs since then use poorer power-regulation components, since the issue seems to show up on both AMD's and nVidia's last two generations (but more so AMD).

Whole experience has been super frustrating. For the part that has ballooned in price most of all, it's infuriating to be having such issues.

In any case, good luck. Like you, I just got tired of it. Spending months with nonstop issues and even losing data due to corruption over it breaks you.
 
Joined
Apr 18, 2019
Messages
2,686 (1.24/day)
Your case almost sounds like a PCIe 'issue'. Like, a bad re-driver or termination, or something (dust-corrosion?).

Does forcing the PCIe link to Gen3 or Gen2 'affect' the issue at all? (Worth trying on OP's issue too, I suppose.)

PCIe is all through the SoC (CPU) now. Even the 'chipset lanes' are downstream and connected to the CPU via a PCIe link.
So, if the PCIe 'bus' (it's both not-a-bus and a bus at the same time) encounters errors, it makes sense that the CPU would appear to be the source.

I'm wondering if 3DMark's PCIe bandwidth test would make the issue 'show'?
(I once had a bad x16 riser that would only handshake x2-x4 lanes and would *crash* on that test in an old PCIe 1.1 dual-K8 build. Removing the riser resolved that issue.)
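For a sense of what forcing the link down costs, theoretical per-lane PCIe throughput is the transfer rate times the line-code efficiency (8b/10b for Gen1/Gen2, 128b/130b for Gen3/Gen4), divided by 8 bits per byte. A minimal sketch of that arithmetic, assuming an x8 link (what an RX 6600 XT uses) and ignoring packet/protocol overhead, so real-world results, including what a bandwidth benchmark reports, will come in lower:

```python
# (transfer rate in GT/s per lane, line-code efficiency) per PCIe generation
PCIE_GENS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}


def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gts, efficiency = PCIE_GENS[gen]
    return rate_gts * efficiency / 8 * lanes


for gen in (4, 3, 2):
    print(f"Gen{gen} x8: {pcie_bandwidth_gbs(gen, 8):.2f} GB/s")
```

Each step down roughly halves the ceiling, so a forced Gen3 or Gen2 link is a cheap experiment: if the crashes stop, signal integrity on the slot/link becomes a strong suspect.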
 
Joined
Nov 16, 2020
Messages
110 (0.07/day)
System Name Pre-10 year plan
Processor FX 8350
Motherboard Asus M5A97 v2
Cooling Cooler Master Hyper 410
Memory 4 x Hyper X 4gb DDR3 1600
Video Card(s) Sapphire R9 390 8gb
Storage 128gb SSD boot drive 512gb SSD
Display(s) 1 x Benq 27" 4k 60Hz, 1 x AoC 27" 1080 165Hz
Case Thermaltake Level 10 Combat edition
Audio Device(s) On board
Power Supply AX1200i
Mouse Logitech G502 Hero
Keyboard Logitech G512
Software Windows 10
Kernel-Power 41 (63) is, I believe, a potential PSU fault.
 
Joined
Jan 8, 2017
Messages
9,769 (3.26/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
My 7900XTX was tripping my EVGA 1300w. OCP will trip if the spike is hard enough and the unit can't handle it. It's about the QUALITY of the unit, and none of the units OP was testing with have it.
The guy is not talking about OCP, he is talking about ripple, something I don't even know how you could positively test for. This is not even a 300W system; you think it has 650W+ spikes?

Also, a shitty PSU will have a shitty OCP circuit that doesn't actually trip when it needs to, not the other way around. Never heard of a PSU that can't even handle half of the rated power without tripping OCP, let alone with three different PSUs tested.
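The headroom argument above is a simple ratio: to exceed a unit's label rating, a transient has to reach the rating divided by the steady draw. A minimal sketch using the figures from this post (a roughly 300 W system against 650 W+ units); real protection trip points are vendor-specific and usually enforced on the 12 V side, so treat this only as a rough sanity check, and the function name is hypothetical:

```python
def required_spike_factor(steady_draw_w: float, psu_rating_w: float) -> float:
    """How many times the steady draw a transient must reach to exceed the PSU's label rating."""
    if steady_draw_w <= 0:
        raise ValueError("steady draw must be positive")
    return psu_rating_w / steady_draw_w


# Figures from the post: "not even a 300W system" against 650W+ units.
factor = required_spike_factor(300, 650)
print(f"A transient would need to reach {factor:.2f}x the steady draw")
```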

EDIT :
Ah, a fourth bad PSU (OP's RM850 crashed too), right guys? Crazy odds, world's unluckiest man?

Kernel-Power 41 could be anything; it's not tied to any source in particular.

If it were PSU related, the system wouldn't even have time to log an event. A PSU is not gonna cause a system to black screen and then reboot; a PSU issue means a hard shutdown.
 
Joined
Nov 16, 2020
Messages
110 (0.07/day)
I can only speak as I find. I've had black screen reboots on a 775 build that logged Kernel-Power 41 (63) errors, which were rectified by replacing the PSU. I also tested that PSU (Hipper 450W) in another PC (AM3 in this case), and after a few days the same failure occurred.
Initially I chased my tail replacing everything and reinstalling Windows; I even tried an older version of Windows.
For me it came down to the PSU, hence my comment.
 
Joined
Jan 8, 2017
Messages
9,769 (3.26/day)
Yeah, but I am saying it doesn't indicate anything in particular; it's just a generic code for unexpected shutdowns/reboots. For example, if you hold down the power button until the PC shuts off, you're gonna get a Kernel-Power 41.
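Since Kernel-Power 41 on its own says little, it helps to see which other events land at the same timestamps. The System log can be filtered with Windows' built-in `wevtutil` tool (`qe` query, `/q:` XPath filter, `/c:` count, `/rd:true` newest first, `/f:text` readable output). A minimal sketch, assuming Python is installed on the affected Windows PC; it pulls the newest Kernel-Power 41 and Event ID 18 entries side by side and only runs the command when on Windows:

```python
import platform
import subprocess

# Kernel-Power 41 (unexpected restart) and Event ID 18 (fatal hardware error,
# normally logged by WHEA-Logger). The filter does not check the provider,
# so read the "Source" line of each result.
EVENT_IDS = (41, 18)


def build_query(event_ids=EVENT_IDS) -> str:
    """XPath filter in the same form Event Viewer generates for an ID filter."""
    clause = " or ".join(f"EventID={eid}" for eid in event_ids)
    return f"*[System[({clause})]]"


def build_command(count: int = 20, event_ids=EVENT_IDS) -> list[str]:
    """wevtutil arguments: query the System log, newest first, as plain text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query(event_ids)}",
        f"/c:{count}",
        "/rd:true",
        "/f:text",
    ]


if __name__ == "__main__":
    cmd = build_command()
    if platform.system() == "Windows":
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(result.stdout or result.stderr)
    else:
        print("Not on Windows; the command would be:", cmd)
```

If every Kernel-Power 41 is paired with an Event ID 18 a moment earlier, that points toward the machine-check path described earlier in the thread rather than a plain loss of power.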
 
Joined
Feb 24, 2023
Messages
3,774 (4.97/day)
Location
Russian Wild West
System Name D.L.S.S. (Die Lekker Spoed Situasie)
Processor i5-12400F
Motherboard Gigabyte B760M DS3H
Cooling Laminar RM1
Memory 32 GB DDR4-3200
Video Card(s) RX 6700 XT (vandalised)
Storage Yes.
Display(s) MSi G2712
Case Matrexx 55 (slightly vandalised)
Audio Device(s) Yes.
Power Supply Thermaltake 1000 W
Mouse Don't disturb, cheese eating in progress...
Keyboard Makes some noise. Probably onto something.
VR HMD I live in real reality and don't need a virtual one.
Software Windows 11 / 10 / 8
Benchmark Scores My PC can run Crysis. Do I really need more than that?
The OP tested five PSUs, two of them explicit overkill, and all of them had the same issue with this GPU. However, there were no issues with a GTX 1050 Ti.

It's a faulty GPU. Test a different RDNA2 GPU, preferably something with a higher wattage, to rule out the PSU once and for all; if you run into no issues, then it's time to RMA/fix/sell your GPU for scraps and get a properly working one.
 

Atlas39

New Member
Joined
Apr 10, 2024
Messages
10 (0.03/day)

Tomorrow I will borrow an RX 6600 XT from my friend and try again.

However, since the restart problem is random and unpredictable, even if it works tomorrow, it may break again the next day. For now, on @Taras K's advice, I manually allocated virtual memory in the Windows settings, and the computer has not restarted at all since. To be completely sure, I will run a power test for at least 1 hour and post the results again.
 

Atlas39

New Member
Joined
Apr 10, 2024
Messages
10 (0.03/day)
Update:

I found out that either the PCI-E slot or the GPU itself is problematic: the PCI-E slot is too wide, or the GPU's pins are too thin.

The PCI-E slot is so loose that the GPU can tilt up and down inside it.

So I put a little pressure on the GPU to lean it toward the CPU side, and fixed it in that position with the help of the screws.

Looks like the problem is solved!

Probably the GPU pins and the PCI-E slot pins were not making solid contact with each other, and this caused a momentary power loss and the computer to restart.

However, I still need to playtest like this for at least a few days.

Those who experience a similar problem should try this method before changing the GPU or motherboard.

This problem can also be solved by using a GPU holder like this:

[Image: GPU holder]
 
Joined
Apr 18, 2019
Messages
2,686 (1.24/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
(Quoting Atlas39's update above.)
Plastic gets soft when it's warm; even a reinforced slot will 'flex' more when warm...
This makes a LOT of sense.

I'll be looking into getting/making one, myself. Thanks!
 
Joined
Jan 1, 2012
Messages
438 (0.09/day)
Your case almost sounds like a PCIe 'issue'. Like, a bad re-driver or termination, or something (dust-corrosion?).

Does forcing the PCIe link to Gen3 or Gen2 affect the issue at all? (Worth trying on OP's issue too, I suppose.)

PCIe is all through the SoC (CPU) now. Even the 'chipset lanes' are downstream and connected to the CPU via PCIe link.
So, if the PCIe 'bus' (it's both not-a-bus, and a bus, at the same time) encounters errors, it makes sense the CPU would appear the source.

I'm wondering if 3DM's PCIe bandwidth test would cause the issue to 'show'?
(Once had a bad x16 riser that'd only handshake x2-x4 lanes, *crash* on that test in an old PCIe 1.1 dual-K8 build. Removing the riser, resolved that issue)
Changing the PCI Express BIOS setting from auto to gen 3 was one of the many, many things I tried, yes. It had no change.

I disconnected and reconnected the graphics card more times than I can recall. I also put the old graphics card back in at times. The issue kept happening on the new one, but never on the previous one, so if dust or something was interfering, it was awfully coincidental to only do it with one of them, and every time, and then never with the other. I therefore think it's safe to say dust was unlikely.

The only three things I found that had any impact on the severity were...

1. The graphics card itself. The old GTX 1060 never did it at all, the first 7800 XT did it a lot, and the replacement 7800 XT was not doing it... but has since done it once. The fact that the severity changes so much between graphics cards signifies to me that the graphics card is at least one major variable.

2. RAM profile speeds. Stock is more unstable, XMP is less unstable. That's backwards from what I'd expect, if anything.

3. Undervolting the CPU. It's even less stable when undervolted (this one at least makes sense).

The above two conflict with the first and signify a platform-side instability? Yet said instability is only there with some graphics cards, and worse with others? I'm not sure what to make of it; the clues seem to point in conflicting directions. At least to me.

It sounds like the thread starter has a rather similar situation to my own. In the thread I made for my issue, two others who had the issue had it resolved with an RMA of the graphics card (a third had the issue but never followed up). I also had it resolved with an RMA too... until I noticed maybe I actually didn't. But I definitely saw a change in the level of severity at least? So that's three for four on at least seeing a difference in severity by changing the graphics card. So I'd say the graphics card is most likely the first thing to suspect as the cause for the thread starter. Especially if they tried five (!?) PSUs as that sort of rules that out in my mind. The RAM/platform-side stuff is the wild card here.

I won't lie, I was in the same boat as the thread starter (might still be?). I was so frustrated at the mixed signals about what could be causing it (is it the graphics card or the platform? It has to be one of them) that I wanted to throw the whole PC out and start anew with the "other" brands of each of those parts, even though I don't want to (because I like AMD's CPUs more right now, and I really dislike nVidia's current offerings). But after months of instability, data loss, and sunk cost from shipping stuff all the time, you really start getting fluffed up.

(Quoting Atlas39's update above.)
Wow, of all things!? And thank you for the follow up! I'll have to keep this in mind.

I did notice some others reporting this issue mentioned that they would have the crash once at the start of a session, but never again (people were theorizing RAM was getting unstable while warming up). That would track if it was due to something during a heating cycle. For me though, I could sometimes have it occur only once every three or four days, or sometimes multiple times a day back to back.

So I'm not sure how likely it is that this was/is the cause in my case. I was only having the issue with one graphics card (though both of the 7800 XTs have a support bracket and the GTX 1060 did not, which could be relevant), and, more confusingly, why on Earth would CPU voltage and RAM speed settings impact it so much then? And why does my current card never do it except in one specific version of one game (so far) and not in another version? That seems... oddly specific if it's just a loose contact?

But this whole thing hasn't made sense so anything making partial sense is worth investigating.

In any case, it seems to have helped for you at least so I'm glad, and you gave me another unusual thing to try.
 
Joined
Jan 14, 2019
Messages
15,311 (6.77/day)
Location
Midlands, UK
System Name My second and third PCs are Intel + Nvidia
Processor AMD Ryzen 7 7800X3D
Motherboard MSi Pro B650M-A Wifi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance EXPO DDR5-6000 CL36
Video Card(s) PowerColor Reaper Radeon RX 9070 XT
Storage 2 TB Corsair MP600 GS, 4 TB Seagate Barracuda
Display(s) Dell S3422DWG 34" 1440 UW 144 Hz
Case Kolink Citadel Mesh
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply 750 W Seasonic Prime GX
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE Plasma
So, with the CPU at stock, the system restarts. With Core Performance Boost enabled, it works fine.

1. Why don't you just leave CPB enabled?

2. Why would you turn off boosting in the first place?

3. My suspicion is that without boosting, your motherboard supplies the CPU with too low voltage to stay stable. I might be wrong, though.
 
Joined
Aug 25, 2023
Messages
446 (0.77/day)
System Name No.1
Processor Ryzen 9 9900X with custom PBO + 2200 FCLK fully stable
Motherboard B650 Gigabyte Aorus Elite v1.0
Cooling Thermaltake toughair 710 + Thermal Grizzly Kryonaut extreme
Memory Patriot Viper PVV532G740C36K @ 6200MT/s 30-36-36-63 1:1
Video Card(s) Asus TUF gaming RX 7900 XTX OC edition
Storage 1TB T-Force Z44A7 + 2TB T-Force A440 Pro
Display(s) 34 " Asus TUF Gaming VG3A series
Case Antec C8 constellation white edition
Audio Device(s) Asus Xonar AE 7.1 + Logitech Z906
Power Supply Corsair RM1000x V2
Mouse MSI Clutch GM20 Elite
Keyboard Logitech G512 Carbon
(Quoting Atlas39's update above.)
Thank you, this type of extra information is very revealing into the cause of your problem.
 

Atlas39

New Member
Joined
Apr 10, 2024
Messages
10 (0.03/day)
(Quoting the reply above about plastic getting soft when warm.)

That's my pleasure, I'm happy if I could help :)

(Quoting the two earlier replies above about the 7800 XT instability.)

I continued the tests for a few hours and yes, the problem is 100% caused by this. I returned the GPU to its previous position and the restart problem started again. I leaned it up again, fixed it in place, and the problem was solved. I tried this many times to make sure.

I also tried to figure out why this problem is random. PCI-E slots are generally made of plastic, which can loosen as it heats up. So under heavy GPU load, the slot gets hotter and looser, and the pins lose contact.

I also looked into why this problem does not occur with every GPU.
Maybe the part of the GPU that goes into the slot is slightly worn or was manufactured slightly out of spec. It could be just like the "silicon lottery" with CPUs.

The increase in stability when overclocking may be pure coincidence. Or, in CPU-intensive games, the GPU slot may not soften because it does not get as hot. It may even be something electrical; I'm not sure why this is the case.

Almost everyone told me that this problem was caused by the PSU, so I tested with 5 different PSUs and swapped the RAM.
And as the last step, I was about to build a new PC when, by chance, I noticed that the GPU was loose in the slot.
Please try this too if you are still having the same issue; maybe it will help.

Thank you so much, and I'd be so happy if it helps.
(Quoting the question above about Core Performance Boost.)

It wasn't stable with CPB enabled either. It was more stable than at stock, but it was still crashing, just less frequently.


@Launcestonian

You are most welcome, and thank you too for your advice and efforts.
 
Joined
Jan 1, 2012
Messages
438 (0.09/day)
(Quoting the question above about Core Performance Boost.)
I think you might be asking the thread starter as opposed to me, but I'll answer since it seemed like we were both in the same boat here (and/or in case you are referring to me).

While it was more unstable at stock, it was still unstable either way.

And I never had a desire to run with certain performance features disabled or at stock RAM speeds. I only changed these things after I first had the instability, since one of the first things you do when troubleshooting instability is to test at stock. I imagine this is why the thread starter changed those settings too.

(Quoting Atlas39's follow-up above.)
Well, that's wonderful news. I was thinking, "I hope they don't come back and say it was false hope and it happened again after all," because that happened to me a lot. I thought I had resolved it, and then my hopes were crushed. Sometimes it would take upwards of a week to occur, so there were a lot of false-hope scenarios for me.

In my case it's... sort of been resolved? I think? The issue was gone after the RMA, but then one particular use case had it happen again. It's strange that this one use case still causes it while, so far, no others do, but if I decide to investigate more, your findings do have a lot of reasoning behind them, so I'm somewhat hopeful.

The funny thing is, three things crossed my mind during my own five million attempts. One was to try the other PCI Express slot (not enough room), another was to try my old motherboard (never got around to it), and the third was to try running my PC on its side. There's a chance any of those might have resolved my issue if this is what was causing it for me.

And if nothing else, your issue is resolved at least and that's what matters for this thread (and I hope it stays that way!)
 
Status
Not open for further replies.