Black screens leading to restarts (Event ID 18) on AMD platform since changing graphics card

Princess Garnet · Nov 16, 2023

Hello! I went to register to seek feedback for an issue and troubleshooting I'm doing surrounding it only to find I had an account from a decade ago already. Funny. Anyway...

As the title says, I'm having issues. I've been "doing this" for close to two decades, but I'm never afraid to admit when I need help or might be wrong. And this one is twisting me up. It has what seems to be a likely cause, but I would like to get second opinions because there's things making me second guess myself. This will be a long one and I apologize! I have a summary at the top and I'll try and break it up and format it the best I can.

Summary of Symptoms:

The display will go black, and the PC will restart. The time between the display going black and the restart varies, but there's some consistency depending on what situation led to it. There are no signs of the video drivers crashing, either in Event Viewer or in AMD's Adrenalin software itself. The drivers are not uninstalling themselves, nor is the device being lost on the next Windows restart like some other users seem to report when this occurs. In fact the drivers seem solid, but they do tell me "default tuning performance settings have been restored due to an unexpected system failure" after such an issue occurs. There are no BSODs, and no minidumps or memory dumps being created either (yes, my page file is enabled and system managed on my system drive, and I have automatic restart on BSOD disabled). Event Viewer does show Event ID 18, however, which seems to be an AMD specific event logged in case of a machine check exception and reads "a fatal hardware error has occurred". WHEA and WatchDog logs are also being created sometimes. More on the logs and details below. This started when I changed my video card.

Hardware:

CPU-Z Link: https://valid.x86.fr/7s64nw

Case: Fractal Arc Midi R2
PSU: EVGA SuperNova 750 G5
CPU: AMD Ryzen 7 5800X3D
CPU Cooling: Be Quiet Dark Rock Pro 4
Motherboard: MSI Mag X570S Tomahawk Max WiFi (BIOS V1.8)
RAM: 64 GB (4x 16 GB) G.Skill Ripjaws V 3,600 MHz 1.35V
GPU: Sapphire Nitro+ Radeon RX 7800 XT
Storage: 2x Western Digital Black SN850X 2TB, 1x Western Digital Black 5 TB HDD, 2x Western Digital Blue 8 TB HDD
OS: Windows 10 Home 22H2 (19045.3693)

Detailed description of issues:

As stated above, the PC display will sometimes go black and the PC will then restart to the BIOS. This has occurred under the following conditions.

1. Playing Minecraft Java with shaders. Sometimes it just happens during play, but routinely it's when I press F11, which initiates a change from full screen to windowed mode, or within seconds of doing that. One time it successfully switched to windowed mode only to fail when attempting to render a Windows Explorer window.

2. Playing League of Legends.

3. Other light games (Aura Kingdom is one).

4. At the immediate start of attempting to do OCCT's "GPU variable" test. Unfortunately, this happened only once and isn't reproducible. I thought it might be the first time it happened, but it wasn't. No other OCCT tests, and no other stress tests period (including Furmark) have failed on me yet.

Most of the restarts happen rather quickly after the screen goes black. I notice the first one, Minecraft in particular, tends to take longer for the restart to occur. Sometimes it doesn't restart and I have to force the PC off... but I notice an odd thing even about this. My case has a fan speed control with a selection for 5V, 7V, and 12V. I often run these at 7V for noise reasons. The first time I went to force power it off, I accidentally switched the voltage from 7V to 12V due to it being near the power button, and that accidental switch triggered the restart. I thought it was coincidental... until this "black screen without automatic restart" happened again... so I let it sit to see what would happen, and it never restarted, so I switched the voltage intentionally... and it restarted? Hm.

I'm not sure if that's important to mention or "fluff" but I want to be thorough.

My first step of troubleshooting is "if a new symptom arrived, what change coincided with said symptom" and that change was the graphics card. So that's it, right? That's my suspicion too, but I wanted to rule things out regardless. And I can't help but notice a few things.

In my troubleshooting (summarized below for formatting reasons), I found the issue seems to occur much more often when my CPU is using stock BIOS settings (read: JEDEC RAM speeds and voltages) as opposed to my RAM profile speeds. Huh? That's backwards from what I would expect, because it means the "heavier" RAM settings are the more stable ones. I first noticed the issue in Minecraft around a week after getting the video card. But I ignored it at first, as it coincided with an undervolt attempt on the CPU. So I figured I just didn't win the lottery and couldn't undervolt at all. But it happens at stock too. In other words, XMP RAM speeds are unstable, XMP RAM speeds with a CPU undervolt are very unstable, and JEDEC RAM speeds with no CPU undervolt are just as unstable as that. I hope that makes sense, but the point is... despite the issue arriving with the video card change, I'm noticing a correlation with platform settings as well. And it's making me question whether the system was ever stable, despite never having issues with my previous GTX 1060. I am running what I believe might be a heavy memory configuration (four dual rank DIMMs)... but then why is it less stable at seemingly tamer RAM settings!?

Before I move on to the list of things I've tried, here's a summary of some of the WHEA logs and Watch Dog logs. If the logs themselves would be helpful, please ask.

The Event Viewer always shows this under "Event ID 18".

"A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 1

The details view of this entry contains further information."

The APIC ID, which correlates to the logical CPU that threw the MCE, always differs.
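Since the claim here is that the APIC ID differs from event to event, one way to check it across many entries is to tally the IDs from the exported event text. This is only a minimal sketch: it assumes the event descriptions have been copied out of Event Viewer as plain text in the format quoted above, and the sample entries (other than APIC ID 1) are made up for illustration.

```python
import re
from collections import Counter

# Matches the "Processor APIC ID: <n>" line in the Event ID 18 text quoted above.
APIC_RE = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(event_texts):
    """Count how many times each APIC ID appears across event descriptions."""
    counts = Counter()
    for text in event_texts:
        match = APIC_RE.search(text)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample data: only the APIC ID 1 entry comes from the post.
sample_events = [
    "Error Type: Cache Hierarchy Error\nProcessor APIC ID: 1",
    "Error Type: Cache Hierarchy Error\nProcessor APIC ID: 6",
    "Error Type: Cache Hierarchy Error\nProcessor APIC ID: 11",
]

counts = tally_apic_ids(sample_events)
for apic_id, n in sorted(counts.items()):
    print(f"APIC ID {apic_id}: {n} event(s)")
```

If one ID dominated the tally, that would point toward a single core; an even spread is more in line with something outside the cores themselves.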

WHEA logs always look like this.

"WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffff800474797900, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 00000000bea00000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000000108, Low order 32-bits of the MCi_STATUS value."
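For anyone who wants to read Arg3 and Arg4 directly, the two halves can be joined into the 64-bit MCi_STATUS value and the top status flags pulled out. This is a rough sketch, not a full decoder: it only covers the architectural flag bits 63 to 57 of the x86 machine check status register and the low 16-bit error code, and leaves the vendor-specific fields alone. The function and table names are my own.

```python
# Flag positions in the 64-bit MCi_STATUS register (x86 machine check architecture).
STATUS_FLAGS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(high32, low32):
    """Join Arg3 (high) and Arg4 (low) and extract flags plus the error code."""
    status = (high32 << 32) | low32
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    error_code = status & 0xFFFF  # low 16 bits: MCA error code
    return status, flags, error_code


# Arg3 and Arg4 exactly as quoted in the WHEA log above.
status, flags, error_code = decode_mci_status(0xBEA00000, 0x00000108)
print(f"MCi_STATUS = {status:#018x}")
for name, is_set in flags.items():
    print(f"{name:5s} = {int(is_set)}")
print(f"MCA error code = {error_code:#06x}")
```

With these values UC and PCC both come out set, which fits an uncorrected error that forces a restart rather than one the hardware quietly corrected.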

And the Watch Dog logs are giving me these.

"VIDEO_TDR_TIMEOUT_DETECTED (117)
The display driver failed to respond in timely fashion.
(This code can never be used for a real BugCheck; it is used to identify live dumps.)
Arguments:
Arg1: ffffaf8baadd7460, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff800540e8670, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, The secondary driver specific bucketing key.
Arg4: 00000000000005a8, Optional internal context dependent data."

"VIDEO_ENGINE_TIMEOUT_DETECTED (141)
One of the display engines failed to respond in timely fashion.
(This code can never be used for a real BugCheck; it is used to identify live dumps.)
Arguments:
Arg1: ffffda880ec2e010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff806a19b8790, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, The secondary driver specific bucketing key.
Arg4: 000000000000111c, Optional internal context dependent data."

"VIDEO_MINIPORT_BLACK_SCREEN_LIVEDUMP (1b8)
User initiated miniport black screen live dump.
(This code can never be used for a real BugCheck; it is used to identify live dumps.)
User initiated miniport live dump for black screen scenarios.
Arguments:
Arg1: 0000000000000001, Blackscreen hotkey generated miniport black screen live dump
Arg2: 0000000000000000, Reserved.
Arg3: 0000000000000000, Reserved.
Arg4: 0000000000000000, Reserved."

"VIDEO_DXGKRNL_BLACK_SCREEN_LIVEDUMP (1a8)
User initiated DXGKRNL black screen live dump.
User initiated DXGKRNL live dump for black screen scenarios.
(This code can never be used for a real BugCheck; it is used to identify live dumps.)
Arguments:
Arg1: 0000000000000001, Blackscreen hotkey generated DXGKRNL black screen live dump
Arg2: 0000000000000000, Reserved.
Arg3: 0000000000000000, Reserved.
Arg4: 0000000000000000, Reserved."
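To keep the five codes straight, here is a small lookup built only from the text quoted above. It is just a sketch for sorting the dumps: the `live_dump` flag is set wherever the quoted description says the code "can never be used for a real BugCheck", and the 0x124 entry is marked False simply because its quoted text carries no such note.

```python
# Codes and names exactly as quoted in the logs above.
# The second field is True where the quoted text says the code
# "can never be used for a real BugCheck" (live dump only).
DUMP_CODES = {
    0x124: ("WHEA_UNCORRECTABLE_ERROR", False),
    0x117: ("VIDEO_TDR_TIMEOUT_DETECTED", True),
    0x141: ("VIDEO_ENGINE_TIMEOUT_DETECTED", True),
    0x1B8: ("VIDEO_MINIPORT_BLACK_SCREEN_LIVEDUMP", True),
    0x1A8: ("VIDEO_DXGKRNL_BLACK_SCREEN_LIVEDUMP", True),
}


def describe(code):
    """Return a one-line description for a dump code seen in this thread."""
    name, live_dump = DUMP_CODES.get(code, ("UNKNOWN", False))
    kind = "live dump only" if live_dump else "no live dump note in quoted text"
    return f"{code:#x} {name} ({kind})"


for code in DUMP_CODES:
    print(describe(code))
```

So the four video entries are live dumps taken while the system was still running, and only the 0x124 record describes the fatal hardware error itself.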

These seem very suggestive of the video card and/or drivers? And there seem to be a lot of "black screen to restart" issues with the 7800 XT going around, but the fact they may be happening doesn't necessarily tell me what the cause might be. The issue did show up with the video card change, and the issue only seems to show up when the video card is in use. I can use my PC all day on the internet (even with hardware-accelerated browsers or watching video) or load Photoshop and put the CPU and RAM under load. It never crashes... until the video card is used more than modestly.

What I've tried through troubleshooting:

I posted this on the Steam forums originally, so I'll copy that part from there.

1. I've updated the motherboard BIOS. Originally it was V1.5, then V1.7, and now V1.8.

2. Windows 10 is up to date.

3. AMD chipset drivers are up to date. Audio drivers are up to date. Ethernet drivers are up to date. Bluetooth and WiFi drivers are up to date. Etc.

4. I've updated video card drivers as new ones have become available. The issue has persisted on all drivers I've tried, including 23.9.1, 23.9.3, 23.10.1, 23.10.2, and 23.11.1.

5. I've used DDU to uninstall and reinstall the video drivers. Yes, I used safe mode. Yes, I disconnected the internet.

6. I've reset the BIOS who knows how many times.

7. I've disabled XMP, and I've set XMP but scaled back RAM frequency/IF clocks a bit to 3,200 MHz/1,600 MHz respectively. So it doesn't matter whether RAM/IF is set to 2,133 MHz (JEDEC default)/1,066 MHz, 3,200 MHz/1,600 MHz, or 3,600 MHz/1,800 MHz; they all have the issue. This seems to rule out RAM or Infinity Fabric instability?

8. I've run stress tests galore. Windows memory diagnostic (might not be very conclusive on its own but I did it), MemTest86+, Prime 95, BurnInTest, and the majority of the OCCT suite. All passed, with the exception of the "GPU variable" test in OCCT, which immediately caused the crash the first time I attempted it, but then succeeded on subsequent attempts.

9. I've tried connecting the DP cable to both output ports on the video card (mine has two DP and two HDMI instead of three DP and one HDMI).

10. I've tried HDMI.

11. I've adjusted the ASPM setting (PCI Express > Link State Power Management > Off).

12. I've completely reinstalled Windows 10!

13. I've completely, and I mean completely, taken my PC apart down to the last part, cleaned it (though it was already rather clean), and reassembled it. This was to rule out a bad connection anywhere. I even swapped RAM around, and the CPU was also reseated.

14. The video card is a Sapphire Nitro+ RX 7800 XT which has a BIOS switch with three positions (one performance BIOS, one silent BIOS, and the other is just a mode that lets you change it on the fly with the Sapphire TriXX software). I've tried both BIOS/all three positions.

15. I've used "Driver Verifier", which is something Windows includes, and followed the instructions here to stress test the drivers. This was inconclusive, but not entirely useless. Since the issue doesn't yet have a known reproducible, on demand cause, I have to wait, but this tends to cause it to occur sooner. Unfortunately, Driver Verifier does not catch anything or give me notice of any violations it detected. Maybe that's because the drivers are fine and the issue isn't drivers but the hardware itself. I'm reading that machine check exceptions are, as a rule, almost always hardware and not software.

16. I've found some people saying they suspect the issues may be the card boosting above where it should at fringe moments. I've tried limiting the boost to 2,429 MHz. Nonetheless, it made no difference.

17. I've tried disabling ULPS.

18. I've tried disabling MPO.

19. I've tried my old 3700X in place of the 5800X3D. It happens on both. I think this can rule out the CPU(s) on a hardware level.

20. I've tried manually disabling PBO in the BIOS instead of leaving it on Auto.

21. Temperatures have been monitored, and nothing seems to get to critical levels (CPU can spike high but it's always below 90C, often below 80C or even 70C, and the GPU is in the 50C or 60C range with hot spot often under 80C and never over 90C). To the contrary, it happens even in mild games with very low temperatures, and I even tested with the side panel off.

None of these troubleshooting steps have resolved the issue.
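For reference, the RAM/IF pairs in step 7 follow from simple DDR arithmetic: the memory clock is half the advertised data rate, and each pair listed runs the Infinity Fabric clock at that same value (1:1). A quick sketch of that relationship, with function names of my own choosing:

```python
def memory_clock_mhz(data_rate):
    """DDR transfers twice per clock, so the real clock is half the rated speed."""
    return data_rate / 2


def fabric_clock_1_to_1_mhz(data_rate):
    """Infinity Fabric clock when it is run 1:1 with the memory clock."""
    return memory_clock_mhz(data_rate)


# The three RAM speeds tried in step 7.
for data_rate in (2133, 3200, 3600):
    fclk = fabric_clock_1_to_1_mhz(data_rate)
    print(f"RAM {data_rate} -> memory clock {fclk:.1f} MHz, IF at 1:1 {fclk:.1f} MHz")
```

The 2,133 case lands on roughly 1,066 MHz once the fraction is dropped, which matches the pair listed in step 7.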

The sole troubleshooting step that has resolved this is removing my RX 7800 XT and putting the GTX 1060 back in. After confirming that resolved the issue, I then put the RX 7800 XT back in and tried a few more of my steps above (DDU, fresh drivers, limiting boost speeds in Adrenalin) and it's still happening.

Conclusion:

I apologize for showing up out of the blue after a decade and dropping such a long post! I wanted to get second opinions to make sure I'm on the right track, or see if I'm oblivious to something someone who is much smarter than me suspects or knows about this.

At this step, I believe I almost have to try an RMA on the GPU? I have this worry that it might not resolve it, but that's merely a feeling. Maybe I need to set this aside and cross that bridge when I get there, so my plan now is to reach out to Sapphire for support.

My CPU (though I think I ruled this out), motherboard, and of course video card are under warranty.

I think my RAM might be, as it has a "lifetime" warranty but it is three and a half years old so maybe it's not.

The PSU is... complicated. It's technically under warranty until early next year, but EVGA gave me a brand new G5 when I RMA'd a G2 with a faulty fan near the end of its ten year warranty term (I found the G5 is a slight downgrade, but I don't know, and the G2 isn't made any more), so I wouldn't want to RMA this unless I first tried to at least RMA the part that caused the issue to show up. Honestly I'd probably just buy a new one, but I'd only do this if someone was like... convinced it stood a good chance of fixing this. As the issue doesn't seem to correlate with high draw, I don't think I'm tripping OPP or OCP, but PSUs aren't my specialty.

And as a wild card, I have my prior motherboard, an Asus ROG Strix B550-F Gaming. I want to avoid having to swap that in if at all possible though, due to the level of effort it entails, and because I also had separate issues with it during the time I used it (ask if you want details, but this thread is already so long I'd rather stick to this issue). I RMA'd it when a formal fault was discovered after buying my two M2 SSDs and finding one was not functional at all. Asus was slow to deal with and it cost me $50 to RMA a motherboard being sent one state away... not fun. I didn't even wait on the return and just bought the MSI to replace it (partly to reduce downtime and partly to use PCI Express speeds in the second M2 port). Funny enough, the initial MSI I tried to buy never even succeeded in POSTing. I had to return that to Micro Center and the second one worked. Starting to wonder if I have a deeper issue here? But it was stable until the GPU was swapped. This one is spinning me in circles...

Any help would be greatly appreciated! I'm so desperate I'll gift you a (sanely priced) game on Steam if you can figure this one out for me. I just want it working. Imagine spending $600 to play Minecraft with shaders better and it leads to a nightmare and has you second guessing if the system was ever stable or if it's just a bad new part. It's sooo depressing. Anyway, I'm going to try to RMA the GPU, but if it never even gets that far, or if Sapphire says it's fine, or they return a new one and it also has the issue... then I won't know what to do. I feel like I've exhausted what I can and would be guessing at buying new parts at that point. But it's not worth buying new AM4/DDR4 stuff when I wanted to move to AM5 when the Zen 5 X3D launches, so this would mess that up.

Super Firm Tofu · Nov 16, 2023

Pretty sure it's a bad card. I fought with my Nitro+ 7900XTX for almost a month. Returned it and bought a Red Devil and it's been fine since.

My eyes kind of glazed over a few chapters in, but if you haven't already, take the card out and reseat back in the slot. Same with the PCIe power cables.

Princess Garnet · Nov 17, 2023

Thanks for the opinion. And it's reassuring that it resolved it for you, because online there's a lot of feedback about users having these black screen to reboot issues, particularly on 7000 series graphics cards (but almost always on an AMD CPU platform too). Some people did RMA and found success, others simply went to nVidia or their prior GPU and found it resolved it. Others changed some other part and found success, and others changed that same part and didn't. This issue doesn't seem to have a single cause, but it's seemingly happening a lot on AMD CPU platforms plus recent AMD GPUs in particular. I'm unfortunately past the return window but I'd rather keep this anyway since it fits my needs much better than an RTX 4070 would have.

My gut reaction is saying this points to the video card as well, since the logs suggest it (but maybe I'm interpreting them wrong), and it showed up after adding it... but the behavior is also impacted based on RAM/voltage settings on the platform side and it has me wondering if maybe I was sitting on a borderline unstable system all along that the GTX 1060 just... wasn't exposing for whatever reason.

I did reseat the graphics card. I reseated everything by taking the PC entirely apart and reassembling it. Swapped RAM around, even swapped SATA cables for spares. Like the lengths I've gone to over this. I'm firmly at the "I've almost tried everything but part swapping" and I've even done a bit of that, so I need to explore what is most likely the cause to decide where I start any RMA adventures.

Beginner Macro Device · Nov 17, 2023

0. You have no need to apologise for long posts. These long reads are way better than "Hi guys I got a problem my PC crashes pls help" nonsense. At least you let us know what is going on.
1. Most likely your GPU has incorrect power states switching which ultimately causes unstable behaviour. Since it behaves well whilst loaded at the fullest you could make a burn-in test: run anything that makes your GPU use its full potential (e.g. mining software, Furmark 720p test etc) in the background whilst playing your games that make your system crash. And I'm 99% sure this will stop this nuisance.
What makes me more sure this is the case is you have less problems when your RAM is overclocked. This probably means your GPU is underloaded and, considering power management is incorrect, is getting inadequate power, thus crashes.
2. ...but it could be another way around. Check if forcing PCI-E 3.0 solves your issue. If this is the case your GPU has defective PCI-E lanes or die. Or maybe it's something off with your motherboard (I highly doubt that BOTH your CPUs are unstable in this regard, too rare).

Regardless it's a defective GPU. I grant you 99.999999999% on that. Steps 1 and 2 are meant to find out what exactly is wrong.

A Computer Guy · Nov 17, 2023

Regarding the RAM you are running (generally 64GB in 4 sticks, assuming dual rank sticks) you likely need more SOC voltage, sometimes the right amount is not supplied. What is your SOC voltage when XMP is applied? You can use ZenTimings to get this information so set your RAM back to XMP and screenshot this info back here. Try adjusting your SOC voltage to 1.1 and see if that stabilizes your ram. Ram instability can pretty much destroy any program you are running including video card drivers and also lead to a corrupted OS to further complicate matters. It can also cause a corrupted UEFI/BIOS flash. If your RAM stabilizes with the new SOC voltage I would reflash your UEFI/BIOS just to be sure it's not messed up. If the ram does not stabilize remove 2 sticks A1, B1 and retry default XMP and test for stability.

If the PSU is NOT delivering good power for some reason then a lot of different things can go wrong especially when your GPU starts drawing power. Swap out the PSU. Retest for stability.

Are the fans 3 pin voltage controlled or 4 pin pwm? Unplug the fans for now. Retest for stability.

Psychoholic · Nov 17, 2023

Way back near launch day when I got my 7900xtx I had similar issues.
Mine were resolved by disabling freesync.. I have a GSYNC (hardware module) monitor so it may or may not help your situation.

Princess Garnet · Nov 18, 2023

Beginner Micro Device said:
0. You have no need to apologise for long posts. These long reads are way better than "Hi guys I got a problem my PC crashes pls help" nonsense. At least you let us know what is going on.
1. Most likely your GPU has incorrect power states switching which ultimately causes unstable behaviour. Since it behaves well whilst loaded at the fullest you could make a burn-in test: run anything that makes your GPU use its full potential (e.g. mining software, Furmark 720p test etc) in the background whilst playing your games that make your system crash. And I'm 99% sure this will stop this nuisance.
What makes me more sure this is the case is you have less problems when your RAM is overclocked. This probably means your GPU is underloaded and, considering power management is incorrect, is getting inadequate power, thus crashes.
2. ...but it could be another way around. Check if forcing PCI-E 3.0 solves your issue. If this is the case your GPU has defective PCI-E lanes or die. Or maybe it's something off with your motherboard (I highly doubt that BOTH your CPUs are unstable in this regard, too rare).

Regardless it's a defective GPU. I grant you 99.999999999% on that. Steps 1 and 2 are meant to find out what exactly is wrong.

Thank you for the understanding! Trying to include enough information but not include what may be unnecessary fluff that scares people away is always a delicate balancing act. I just don't even know what might be relevant or not since this issue is so confusing (unless it's just simply "RMA and that fixes it").

Your first part reminds me of something. If I have a game running, say Minecraft, even if it was just launched and then paused, and I switch it to windowed mode and then start doing something in a browser, it's almost guaranteed I will cause a crash within 5 to 15 minutes. Playing a game alone can cause it, but not that fast, and just browsing the web never causes it. Both together? Seems to happen real quick.

For the second point, I'll give this a try! I actually saw this suggested on some Reddit posts but at this point I've felt so defeated by basically trying everything but this (and trying less of my RAM) and none of it worked. No harm in trying at this point.

A Computer Guy said:
Regarding the RAM you are running (generally 64GB in 4 sticks, assuming dual rank sticks) you likely need more SOC voltage, sometimes the right amount is not supplied. What is your SOC voltage when XMP is applied? You can use ZenTimings to get this information so set your RAM back to XMP and screenshot this info back here. Try adjusting your SOC voltage to 1.1 and see if that stabilizes your ram. Ram instability can pretty much destroy any program you are running including video card drivers and also lead to a corrupted OS to further complicate matters. It can also cause a corrupted UEFI/BIOS flash. If your RAM stabilizes with the new SOC voltage I would reflash your UEFI/BIOS just to be sure it's not messed up. If the ram does not stabilize remove 2 sticks A1, B1 and retry default XMP and test for stability.

If the PSU is NOT delivering good power for some reason then a lot of different things can go wrong especially when your GPU starts drawing power. Swap out the PSU. Retest for stability.

Are the fans 3 pin voltage controlled or 4 pin pwm? Unplug the fans for now. Retest for stability.

This was funny enough something I was thinking of adding to the original post but left it to "I'll supply it if someone asks" since the original post was already long.

Yes, these are all dual rank DIMMs.

Here's what Ryzen Master shows.

The BIOS and other monitoring programs (like Libre Hardware Monitor or HwInfo64) show a value closer to 1.088V for the SOC. I did try raising this by using a positive offset of 0.0125V which caused other things to show 1.1V (Ryzen Master still showed 1.1V too so it's likely using a less precise rounding). This made no change to the issue, unfortunately. I'm open to exploring this a bit but I don't know what I'm doing with these voltages, and I've read some of them have "ranges" where too high in relation to another can also lead to instability so... I haven't tried changing anything else. Especially since the X3Ds are supposed to be voltage sensitive.

The RAM voltage is 1.35V (HwInfo64 shows 1.36V).

I thought the 1T command rate might be ambitious so I tried 2T but this didn't help.

I tried 3,200 MHz/1,600 MHz Infinity Fabric at the same voltages/timings as above but it also didn't help.

With XMP off (where it seems less stable), I think the RAM voltage goes down to 1.2V and the SOC drops too, but I can't remember if it's 1V or 1.05V. It's one of those two. Hm, maybe my issue is with RAM or SOC voltage?

Trying with fewer DIMMs is something I should also do, but I have something to mention here. When I use fewer than all of my DIMMs, XMP doesn't allow the system to POST at all. JEDEC speeds do. At least... that was the behavior on my prior motherboard. I never tried on this one. Time to find out. I will be trying with less RAM after this post (someone mentioned it on the Steam forums too so it was in the back of my mind to try eventually, but I didn't get to it yet).

Edit: Sorry I missed the question about the fans. They are some 140mm Phanteks ones. These ones.

https://www.newegg.com/phanteks-ph-f140sp-bk-case-fan/p/N82E16835709023

They are 3 pin fans connected to the case fan speed connector which then converts to molex, which connects to the PSU. This crash does occur whether these are initially running at 7V or 12V. I'll try disconnecting them too.

Psychoholic said:
Way back near launch day when I got my 7900xtx I had similar issues.
Mine were resolved by disabling freesync.. I have a GSYNC (hardware module) monitor so it may or may not help your situation.

Thank you for the suggestion, but alas, no fancy stuff here as I'm on an ancient U2410. I want to look into getting 1440p and higher refresh later but right now that's sidelined.

A Computer Guy · Nov 18, 2023

Princess Garnet said:
Thank you for the understanding! Trying to include enough information but not include what may be unnecessary fluff that scares people away is always a delicate balancing act. I just don't even know what might be relevant or not since this issue is so confusing (unless it's just simply "RMA and that fixes it").

Your first part reminds me of something. If I have a game running, say Minecraft, even if it was just launched and then paused, and I switch it to windowed mode and then start doing something in a browser, it's almost guaranteed I will cause a crash within 5 to 15 minutes. Playing a game alone can cause it, but not that fast, and just browsing the web never causes it. Both together? Seems to happen real quick.

For the second point, I'll give this a try! I actually saw this suggested on some Reddit posts but at this point I've felt so defeated by basically trying everything but this (and trying less of my RAM) and none of it worked. No harm in trying at this point.

This was funny enough something I was thinking of adding to the original post but left it to "I'll supply it if someone asks" since the original post was already long.

Yes, these are all dual rank DIMMs.

Here's what Ryzen Master shows.

The BIOS and other monitoring programs (like Libre Hardware Monitor or HwInfo64) show a value closer to 1.088V for the SOC. I did try raising this by using a positive offset of 0.0125V which caused other things to show 1.1V (Ryzen Master still showed 1.1V too so it's likely using a less precise rounding).

I wouldn't use the offset, just the actual SOC value. Be careful not to change the Chipset SOC; that is a different voltage, so just leave that one alone if you see it. The SOC voltage can be at or under 1.2v. The sweet spot is said to be between 1v and 1.1v when overclocking. SOC over 1.2v risks damage.
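To make those ranges easy to check against a monitoring readout, here is a small sketch that sorts a SOC reading into the bands described above. The thresholds are just the ones stated in this reply (1.0 V to 1.1 V sweet spot, 1.2 V ceiling), not values from an AMD datasheet, and the function name is made up.

```python
SOC_SWEET_SPOT = (1.0, 1.1)  # volts, "sweet spot" range given in this reply
SOC_MAX = 1.2                # volts, ceiling given in this reply


def classify_soc_voltage(volts):
    """Sort a SOC voltage reading into the bands described in this reply."""
    if volts > SOC_MAX:
        return "over limit"
    if SOC_SWEET_SPOT[0] <= volts <= SOC_SWEET_SPOT[1]:
        return "sweet spot"
    return "within limit"


# 1.088 V is the reading reported earlier in the thread; the others are examples.
for reading in (1.088, 1.1, 1.15, 1.25):
    print(f"{reading:.3f} V -> {classify_soc_voltage(reading)}")
```

By this measure the reported 1.088 V already sits inside the sweet spot, so SOC voltage alone may not be what is holding things back.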

Princess Garnet said:
This made no change to the issue, unfortunately. I'm open to exploring this a bit but I don't know what I'm doing with these voltages, and I've read some of them have "ranges" where too high in relation to another can also lead to instability so... I haven't tried changing anything else. Especially since the X3Ds are supposed to be voltage sensitive.

The RAM voltage is 1.35V (HwInfo64 shows 1.36V).

This happens. Depending on the board and what you enter you may get slightly different values.

Princess Garnet said:
I thought the 1T command rate might be ambitious so I tried 2T but this didn't help.

1T/2T doesn't matter when GearDown=Enabled, GearDown has been described like running 1.5T and helps with stability. You could try disabling GearDown and running 2T.

Princess Garnet said:
I tried 3,200 MHz/1,600 MHz Infinity Fabric at the same voltages/timings as above but it also didn't help.

With XMP off (where it seems less stable), I think the RAM voltage goes down to 1.2V and the SOC drops too, but I can't remember if it's 1V or 1.05V. It's one of those two. Hm, maybe my issue is with RAM or SOC voltage?

Yes JEDEC speeds will run DIMMS at 1.2v as far as I am aware. XMP is overclocking.

Princess Garnet said:
Trying with fewer DIMMs is something I should also do, but I have something to mention here. When I use fewer than all of my DIMMs, XMP doesn't allow the system to POST at all. JEDEC speeds do. At least... that was the behavior on my prior motherboard. I never tried on this one. Time to find out. I will be trying with less RAM after this post (someone mentioned it on the Steam forums too so it was in the back of my mind to try eventually, but I didn't get to it yet).

2 DIMMS should be installed in your primary ram slots A2, B2 unless your manual says otherwise. That is interesting your system won't post with 2 DIMMS set to XMP. Your RAM is from a 4 DIMM kit correct?
If that is still the case then run them at JEDEC and run PassMark MemTest86 and see if it detects a problem. Then replace the 2 DIMMS with the other 2 and repeat the test. Was your RAM listed on the motherboard's QVL?

Princess Garnet said:
Edit: Sorry I missed the question about the fans. They are some 140mm Phanteks ones. These ones.

https://www.newegg.com/phanteks-ph-f140sp-bk-case-fan/p/N82E16835709023

They are 3 pin fans connected to the case fan speed connector which then converts to molex, which connects to the PSU. This crash does occur whether these are initially running at 7V or 12V. I'll try disconnecting them too.

Ok we can probably ignore the fan route that leads to nowhere. I thought they may have been connected to the board.

If MemTest86 reports errors it's possible something is wrong with your ram. I say possible because a bad configuration can cause MemTest86 to fail as well, although if it's failing at JEDEC speeds the ram may need to be replaced. Maybe you have an incompatibility. If the 2 DIMM tests pass it would be good to run MemTest86 with all 4 sticks in the system at XMP when you're done testing the pairs.
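Written out as a checklist, the order I'm suggesting looks something like the sketch below. The stick labels are placeholders, the slot names are the ones mentioned in this thread, and whether the pairs run at XMP or fall back to JEDEC depends on whether the board POSTs with two sticks at XMP.

```python
def build_memtest_plan(dimms, pairs_post_at_xmp):
    """Return the MemTest86 runs in order: each pair first, then all four sticks."""
    pair_profile = "XMP" if pairs_post_at_xmp else "JEDEC"
    first_pair, second_pair = dimms[:2], dimms[2:]
    return [
        {"sticks": first_pair, "slots": ["A2", "B2"], "profile": pair_profile},
        {"sticks": second_pair, "slots": ["A2", "B2"], "profile": pair_profile},
        {"sticks": list(dimms), "slots": ["A1", "A2", "B1", "B2"], "profile": "XMP"},
    ]


# Hypothetical labels for the four sticks in the kit.
dimms = ["DIMM 1", "DIMM 2", "DIMM 3", "DIMM 4"]

# False here mirrors the reported behavior of two sticks not POSTing at XMP.
plan = build_memtest_plan(dimms, pairs_post_at_xmp=False)
for step, run in enumerate(plan, start=1):
    sticks = ", ".join(run["sticks"])
    slots = ", ".join(run["slots"])
    print(f"Run {step}: {sticks} in {slots} at {run['profile']}")
```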

After this I would focus on the PSU and use a different PSU, preferably one of known good quality.

(edit)

Sorry, I missed that you said "The sole troubleshooting step that has resolved this is removing my RX 7800 XT and putting the GTX 1060 back in. After confirming that resolved the issue, I then put the RX 7800 XT back in and tried a few more of my steps above (DDU, fresh drivers, limiting boost speeds in Adrenalin) and it's still happening." Based on this, I would put your RAM back in working order and focus on swapping out the PSU when using the RX 7800 XT. This will save you a lot of testing time.

Princess Garnet · Nov 18, 2023

Yes, the RAM is a kit of four. Specifically this one.

F4-3600C16Q-64GVKC - Overview - G.SKILL International Enterprise Co., Ltd.

The RAM is on the QVL of my motherboard, and likewise G.Skill lists my motherboard as one it's tested with.

I've set the motherboard BIOS "PCIE_1" setting from "auto" to "Gen 3", and disconnected the PSU cable that powers those three fans, and now I'm trying to see if the issue goes away with one of those two things changed. I'm testing right now by running Minecraft windowed in the background with a browser open since that seems likely to cause it fast (I'm chancing typing this and losing it to a crash) and surprisingly it didn't do so quickly, but I feel like I'd need to go many weeks or maybe even a couple months with no issues under uses I've been too scared to try lately in order to rule it gone.

While doing this I noticed something else. With Minecraft open in a window state, certain other things seem to cause it to... flicker? Like opening Windows start menu and scrolling it leads to rapid flicker that matches the scrolling. Sometimes just randomly flashes sitting idle like that. If this sounds confusing I can try and get a video of it.

Two fans are connected to the motherboard. The rear exhaust fan is one of them, and a new fan on top is the other (this was added after the issues started and is thus not a factor I believe). The other top fan and the two intake fans are the ones connected to the case fan control.

Anyway, just so I'm clear on things...

1. You think investing time and effort into the CPU/RAM side of things isn't worthwhile at least for now after seeing my statement on the GPU?

2. You think I should chance buying a new PSU before RMAing the video card? If buying a new PSU will fix it I'd jump and do it no question asked, but I'm not sure if spending money on a "maybe" is smart before exhausting an RMA on the actual part change that introduced the behavior? If that mindset is wrong, I'm open to why. Obviously a new PSU isn't immune to faults, but this PSU was an RMA replacement for an old G2 and was in "new" condition and is maybe a year old at best.

A Computer Guy · Nov 18, 2023

Princess Garnet said:
Yes, the RAM is a kit of four. Specifically this one.

F4-3600C16Q-64GVKC - Overview - G.SKILL International Enterprise Co., Ltd.

The RAM is on the QVL of my motherboard, and likewise G.Skill lists my motherboard as one it's tested with.

I've set the motherboard BIOS "PCIE_1" setting from "auto" to "Gen 3", and disconnected the PSU cable that powers those three fans, and now I'm trying to see if the issue goes away with one of those two things changed. I'm testing right now by running Minecraft windowed in the background with a browser open since that seems likely to cause it fast (I'm chancing typing this and losing it to a crash) and surprisingly it didn't do so quickly, but I feel like I'd need to go many weeks or maybe even a couple months with no issues under uses I've been too scared to try lately in order to rule it gone.

While doing this I noticed something else. With Minecraft open in a window state, certain other things seem to cause it to... flicker? Like opening Windows start menu and scrolling it leads to rapid flicker that matches the scrolling. Sometimes just randomly flashes sitting idle like that. If this sounds confusing I can try and get a video of it.

Two fans are connected to the motherboard. The rear exhaust fan is one of them, and a new fan on top is the other (this was added after the issues started and is thus not a factor I believe). The other top fan and the two intake fans are the ones connected to the case fan control.

Anyway, just so I'm clear on things...

1. You think investing time and effort into the CPU/RAM side of things isn't worthwhile at least for now after seeing my statement on the GPU?

Yes, you clearly stated you weren't having these problems with your prior graphics card.

Princess Garnet said:
2. You think I should chance buying a new PSU before RMAing the video card? If buying a new PSU will fix it I'd jump and do it no question asked, but I'm not sure if spending money on a "maybe" is smart before exhausting an RMA on the actual part change that introduced the behavior? If that mindset is wrong, I'm open to why. Obviously a new PSU isn't immune to faults, but this PSU was an RMA replacement for an old G2 and was in "new" condition and is maybe a year old at best.

The bottom line is that you need to be able to swap out parts to isolate the problematic component. We know the GPU and PSU are new. One of those parts is bad (we assume the GPU based on your GPU swap) or there is an incompatibility with the new parts in combination.

I had to look this up but it appears the PSU you have is a Tier B - Mid Range (https://cultists.network/140/psu-tier-list/) so I guess it should be fine.

In light of this RMA'ing the card makes sense if you don't want to spend any money at this point.

kapone32 · Nov 18, 2023

To the OP. This could be your PSU. I had an HX1200i for a few years and when I bought my 6800XT I started getting issues exactly like you describe. I changed the PSU to the Deepcool 1000W and it has been smooth ever since.

Klemc · Nov 18, 2023

Try Windows 11 !?

Princess Garnet · Nov 18, 2023

A Computer Guy said:
Yes, you clearly stated you weren't having these problems with your prior graphics card.

The bottom line is that you need to be able to swap out parts to isolate the problematic component. We know the GPU and PSU are new. One of those parts is bad (we assume the GPU based on your GPU swap) or there is an incompatibility with the new parts in combination.

Yes, right now I think that's where I'm at. I've done like 99% of the stuff I can think of on the software side so I need to swap things out. It's either the GPU, PSU, motherboard, or RAM in my mind (and maybe I suspect them in that order) since those are the only ones I haven't yet formally ruled out.

The GPU and PSU aren't the only new ones though. If we're going by timeline, both the CPU (which I've ruled out) and the motherboard are slightly newer.

Here's the history on my AM4 platform timeline, in case any of this is relevant. It might not be since this issue seems as straightforward as "new GPU has issue, so it might just be the GPU itself" but in case it's not...

Mid 2020:
Ryzen 7 3700X
ASUS ROG Strix B550-F Gaming WiFi
64 GB G.Skill Ripjaws V
EVGA SuperNova 750 G2 (purchased in early 2014 but it's relevant)

This all worked well initially. Started with whatever BIOS the motherboard came with, updated to 0802. Still fine. I started having an issue I mentioned above where the PC would sometimes spontaneously reboot a minute or two after loading into Windows, and only then. In the BIOS it was fine. If it made it past that point, it was fine. I thought it was something starting up causing it (but what could cause that?) but I couldn't find anything, and then I realized it happened after a BIOS update from 0802 to anything newer. I went back to 0802 and that indeed cleared it up. Odd. Usually spontaneous power cuts are PSU not coping but this was consistent with a certain time window (shortly after loading into Windows) and consistent with a certain BIOS so maybe it was motherboard/BIOS related?

Later the news of the 5800X3D was making its rounds and I considered maybe I'd want it eventually, but I'd need a new BIOS. Do I forgo the upgrade and stay with what works? I wouldn't have to weigh those options long, as my PC was starting to Black screen and then freeze (not reboot like this one) on the BIOS that previously worked (just not when first starting, but later when under load). Motherboard was throwing a DRAM LED during this. I was suspecting motherboard issues or RAM issues? And not long after that issue, it would fail to cold POST but if I retried right after this it would always POST fine. Odd? I was about to RMA the motherboard (RAM would be next) and... the issue goes away. Uh! Frustrating but whatever?

Middle to end of November 2022:
EVGA SuperNova 750 G2 sent in for RMA. Partially due to it happening along with the above to rule it out, but also because the fan was intermittently squeaking if it ran (I worked around this in the time I owned it by just... leaving it in eco mode where my PC never caused it to get warm enough to ever turn on). EVGA sends me a brand new G5 as replacement. Apparently it's a slight downgrade (?) but I'm not too bothered. It's still supposed to be "good"?

February 11th, 2023:
Ryzen 7 5800X3D replaces Ryzen 7 3700X (more on this below).

March 3, 2023:
2x Western Digital Black SN850X 2 TB purchased.

With the above purchase, I found my second/bottom M2 port was almost entirely non-functional. Now it was time to RMA the motherboard as I had a known, consistent issue that was for sure the motherboard, and hopefully all my other issues would never return.

March 17, 2023:
MSI MAG X570S Tomahawk Max WiFi purchased.

I decided that in order to reduce downtime, in order to get full speed out of the second M2 SSD, and perhaps to hope a new motherboard was better about a possible RAM instability, I'd just get a new X570 based board. Prior motherboard sent for RMA and moved to this one. The first one I purchased never passed POST. Returned it and the second one did.

My PC has been absolutely solid after that, until...

September 15th, 2023:
Sapphire Nitro Radeon RX 7800 XT replaces EVGA GTX 1060.

For around half a week to a week it was fine. I then started having these Black screen to reboot issues, but they first showed up with a CPU undervolt attempt so I thought maybe they were tied to that. By the time I figured they might be the GPU it was close to a month since I had it and there were things like "did you try newer or older drivers" or "did you try a fresh Windows" I had to do and... that put me past the return window.

I mentioned "more on that below" with the CPU swap above. What was that about? While investigating this current GPU issue and digging through the Event Viewer (looking through all the Event ID 18s which were "a fatal hardware error has occurred"), I came to find there were also lesser ones from earlier this year. They were Event ID 19, or "a corrected hardware error has occurred". I never noticed anything amiss so I never knew they were there. Timeline-wise, these would have been happening under my prior motherboard and first CPU, and stopped the very day before I swapped the 5800X3D in. I don't know what to make of that.

Hopefully it's just the GPU itself. I wouldn't even mind if it was the PSU. Hopefully it's not some underlying issue with the motherboard or RAM.

Anyway, time to reach out to Sapphire for support I guess. While doing that, if that fails to yield a solution, or if anyone thinks I should do it anyway/alongside it, does anyone have any suggestions for a good PSU (or should I ask in a separate thread in the PSU forum)? I'd like to keep cost under $150 or so and I imagine a 750W to 850W one would be more than plenty? Modular, and no RGB. It would need a molex connector (or I could use a SATA to molex adapter for the case fans?).

nurgle · Nov 18, 2023

Very similar issue here, specs as opposite. Unstable in multiple games after buying a Sapphire Pulse 7800XT. OCCT GPU extreme will crash at 35s +- 2 or so every single time. Cinebench goes down every time almost immediately when GPU tested. CPU tests in both no problem. Put my RX 5700 back in and totally stable. There is a discrepancy in the reported GPU clocks (it's on default settings) in AMD vs OCCT but I'm not sure if this is just a glitch.

RMA'd

A Computer Guy · Nov 18, 2023

Princess Garnet said:
My PC has been absolutely solid after that, until...

September 15th, 2023:
Sapphire Nitro Radeon RX 7800 XT replaces EVGA GTX 1060.

For around half a week to a week it was fine. I then started having these Black screen to reboot issues, but they first showed up with a CPU undervolt attempt so I thought maybe they were tied to that. By the time I figured they might be the GPU it was close to a month since I had it and there were things like "did you try newer or older drivers" or "did you try a fresh Windows" I had to do and... that put me past the return window.

One question. Are you using 1 PCIe power cable to your GPU with the 2nd daisy chained or are you using 2 PCIe power cables from your PSU?

Princess Garnet said:
Hopefully it's just the GPU itself. I wouldn't even mind if it was the PSU. Hopefully it's not some underlying issue with the motherboard or RAM.

I think as long as you handle one thing at a time you will be fine. I see TPU had a review on this card, the Sapphire Nitro+ Radeon RX 7800 XT; it seems like a nice card.

Princess Garnet said:
Anyway, time to reach out to Sapphire for support I guess. While doing that, if that fails to yield a solution, or if anyone thinks I should do it anyway/alongside it, does anyone have any suggestions for a good PSU (or should I ask in a separate thread in the PSU forum)? I'd like to keep cost under $150 or so and I imagine a 750W to 850W one would be more than plenty? Modular, and no RGB. It would need a molex connector (or I could use a SATA to molex adapter for the case fans?).

I don't think you can go wrong with a Corsair RMx series RM750x or RM850x (which happens to be on the Tier-A list) but I'm biased because that is what I'm using in at least two of my own builds. There are people here much more knowledgeable on PSUs than me.

Princess Garnet · Nov 19, 2023

nurgle said:
Very similar issue here, specs as opposite. Unstable in multiple games after buying a Sapphire Pulse 7800XT. OCCT GPU extreme will crash at 35s +- 2 or so every single time. Cinebench goes down every time almost immediately when GPU tested. CPU tests in both no problem. Put my RX 5700 back in and totally stable. There is a discrepancy in the reported GPU clocks (it's on default settings) in AMD vs OCCT but I'm not sure if this is just a glitch.

RMA'd

Thanks for sharing! Though sorry you're in the same (?) boat I seem to be. This unfortunately seems to be a rather common issue with recent Radeon GPUs (especially the 7800 XT) on Ryzen platforms and I'm not sure what to make of it, since obviously not everyone is having issues with theirs. The 7800 XT was selling out a lot on launch. I missed the release bunch and got the second restock half a week-ish later. Hm, bad batch? Rushed supply due to demand? Maybe we'll know more once more of us RMA if we see a pattern in what happens. It's still early for the 7800 XT. Feel free to follow up here with your results with your RMA if you like.

A Computer Guy said:
One question. Are you using 1 PCIe power cable to your GPU with the 2nd daisy chained or are you using 2 PCIe power cables from your PSU?

I think as long as you handle one thing at a time you will be fine. I see TPU had a review on this card, the Sapphire Nitro+ Radeon RX 7800 XT; it seems like a nice card.

I don't think you can go wrong with a Corsair RMx series RM750x or RM850x (which happens to be on the Tier-A list) but I'm biased because that is what I'm using in at least two of my own builds. There are people here much more knowledgeable on PSUs than me.

Yes, I'm using separate cables for each connector. My PSU has four VGA cable outputs and four cables; two are just single connectors on each end and the other two are the split connectors on the GPU end. I thought of swapping them out when doing my disassembly and reassembly the one day, and ensuring I still used two cables, but I never did.

And yes, it really is a nice graphics card. Possibly the best one I've ever owned in terms of price/performance/VRAM/thermals/noise ratio. No coil whine! It's super quiet and runs cool! My GTX 1060 would run right up to its thermal limit, and this was fine since it had a tame BIOS/fan curve for noise reasons, but it made me think even with a much larger cooling solution that a card producing three times the heat from wattage would either run just as warm or be loud. Minecraft even works great which was a concern of mine (traditionally, AMD didn't do as well as nVidia here). If I had to nitpick, the support bracket can be a bit difficult for me to install in my case, but that's minor. And it would be less of an issue if I wasn't having to remove the card as much as I am. Really surprised me with how great just about everything about it is... other than this issue.

----------

As an update, there's just been another crash after 10 to 15 minutes of Minecraft, so I can rule out the PCI Express Auto vs Gen 3 setting or my case fans as being a variable. Which is good, because my hard drives felt like they were getting warm, so I can connect those back up.

I also had GPU-Z running at the time of the issue but none of the values seem too crazy to me, but I'll attach a log of that to this post in case I'm mistaken.

Upon the POST attempt following the crash, I noticed the process stayed at the part where the White VGA motherboard LED is lit and the screen was all black with the little underscore in the corner (I believe this means video card still in text mode initializing the BIOS). So after this occurs, it seems to take a moment longer than normal to get back up and going on the video card side.

Again though, no device uninitializing itself, no drivers uninstalled, the fans don't even spin up to loud like many users report, which makes me think the drivers themselves are not misbehaving and this is just some GPU fault on the hardware level, or a power issue somewhere (likely also with the GPU).

I'm reaching out to Sapphire now. I believe I've done way more than would be expected to rule out basically everything else I can with my system, despite the behavior showing up with the GPU and pointing that way to begin with. My plan right now is...

1. Reach out to Sapphire. If this fails to resolve it...

2. Buy another PSU. If this fails to resolve it...

3. RMA the RAM? If it has warranty that is, if it does not and/or this fails to resolve it...

4. RMA the motherboard.

If the issue remains after this, my hands are in the air because I'll have ruled out everything. Everything. It would have to be the drivers at that point, or some consistent issue with my combination of hardware that only shows up with this graphics card. But I don't think so given how widespread this issue is across other combinations.

It's absolutely not worth buying any more AM4/DDR4 stuff when I've already re-bought a motherboard/CPU for this platform and plan to go to AM5 next generation, so if the RAM is out of warranty I have to skip that. And I won't be happy having to buy new platform stuff while still having the issue, on a platform that worked fine before the GPU came into the picture. And if the GPU gets a pass by Sapphire or they send another that behaves the same, and all the rest proves unsuccessful, I won't have a clue. But maybe I'm getting ahead of myself. I still have four parts to rule out. It's not panic time... yet.

Does this sound like a logical priority/order of steps at this point?

nurgle · Nov 22, 2023

By way of update, the RMA'd 7800XT failed the retailer's test bench under OCCT/Furmark load (system crashed) and a replacement is being sent.

Princess Garnet · Nov 22, 2023

Thank you so much for the update! I really hope it works out for you. I've heard mixed things, with some people saying an RMA resolved it and others saying it did not. I'm doubtful it will help me simply because I've noticed correlation with my platform/RAM settings (despite said platform/RAM configuration being stable before the new GPU was added).

If my last remaining attempts prove fruitless, I'll be finally embracing an RMA attempt myself.

Since my last post, I've done a couple more things.

1. I tried the drivers without Adrenalin. No impact, so Adrenalin itself is not causing me the issues. One more thing I can rule out.

To focus more testing on the platform/RAM side...

2. I removed half of my RAM, and left two DIMMs in A2 and B2. To my surprise, they booted at XMP (on my prior motherboard, XMP only seemed to work with all four DIMMs present). I tested, and got a rather quick crash.

3. I therefore swapped the other two DIMMs in, and also used the other two RAM slots of A1 and B1 (I know they're not what you'd want to actually use, but it's for testing). Interestingly, now it was not POSTing as easily. This makes sense when you realize they are the more finicky RAM slots... until you remember it POSTs just fine with all four present. No idea why, but it did eventually POST, but at a weird "hybrid" configuration. Basically, it was POSTing with the XMP voltages on RAM and SOC but using the 2,133 MHz speeds and timings. I'm wondering, since JEDEC speeds were less stable, but they also use lower voltages, was that perhaps why? Maybe this could tell me that. So I decided to test it just shortly, and didn't get any fast crashes. But since this seemed like it might introduce more variables I did move the DIMMs back to A2 and B2 like before.

It's been a few days, and I've seemingly (key words) been stable since then. But I really need weeks, maybe some months, of stability under more conditions than I've been able to try in a mere few days to be sure there's a stability increase, but it was crashing a lot before this and now it's back to at least a few days so far stable. So either it's just inconsistent in its occurrence rate and I'll make a follow up saying "never mind, false hope, it crashed again" (I'm expecting this to be honest) or maybe there's something to this coming from my platform/RAM.
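To put rough numbers on "how long is long enough", here is a minimal sketch. It assumes crashes arrive as a Poisson process (independent, at a constant average rate), which is only an assumption and not something measured here, and the average gaps used in the example are hypothetical placeholders rather than figures from this system.

```python
import math


def prob_no_crash(days_stable: float, mean_days_between_crashes: float) -> float:
    """Chance of zero crashes in `days_stable` days if nothing was actually fixed.

    Assumes a Poisson process: P(no event in t) = exp(-t / mean_interval).
    """
    if days_stable < 0 or mean_days_between_crashes <= 0:
        raise ValueError("days must be >= 0 and the mean interval must be > 0")
    return math.exp(-days_stable / mean_days_between_crashes)


def days_needed(mean_days_between_crashes: float, confidence: float = 0.95) -> float:
    """Crash-free days needed before a lucky streak has probability 1 - confidence."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return -mean_days_between_crashes * math.log(1.0 - confidence)


if __name__ == "__main__":
    # Hypothetical average gaps between crashes, in days.
    for mean_gap in (1.0, 3.0, 7.0):
        print(f"mean gap {mean_gap:>4} d -> need {days_needed(mean_gap):.1f} crash-free days")
```

Under that assumption, 95% confidence takes roughly three times the usual gap between crashes, so an issue that sometimes stays away for days really does need weeks of clean running before calling it fixed.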

It's worth mentioning I did two other things as well, though, so these could also be variables that finally helped.

1. Disabled "fast boot" in the BIOS. (not to be confused with Windows "fast startup" which is already disabled.)

2. I uninstalled MSI Afterburner.

I usually like to test things one at a time but my troubleshooting is really already months in and I need to speed this up or it'll be half a year before I send word to Sapphire and say "I've had issues with it since I got it for half a year and I'm only now reaching out". That would seem strange.

If I have any crashes with these, I can rule out all three of those (bad RAM, MSI Afterburner, or fast boot), and at that point I will have truly tried everything I can possibly think of, and then some, besides disabling SAM (torn on if I should try this too?) and will finally be proceeding with RMA.

Princess Garnet · Nov 24, 2023

False hope, as expected. I finally had another crash, so I can also rule out the above three variables. Add them to the pile of seemingly endless things I've ruled out at this point.

It almost has to be the graphics card (or drivers). I'm now reaching out to Sapphire so I'll see if this road leads to a resolution.

ratirt · Nov 24, 2023

Can you check the Event Viewer after the restart? Maybe it will tell you more or less where to look. Reinstall the graphics driver completely.
It might be the PSU though. I had very similar stuff with my Vega 64 when it died. PSU was the problem.

Princess Garnet · Nov 24, 2023

ratirt said:
Can you check the Event Viewer after the restart? Maybe it will tell you more or less where to look. Reinstall the graphics driver completely.
It might be the PSU though. I had very similar stuff with my Vega 64 when it died. PSU was the problem.

Thank you for the reply.

I'm well past both of these ideas. I've reinstalled graphics drivers and even the OS a number of times.

Event Viewer only shows Event ID 18, and I list this in the first post (as well as the WHEA and WatchDog logs I'm getting). There's also Event ID 41/6008 but those are cascading side effects from an unexpected Windows shutdown and not the cause, so they're not too relevant here. Event ID 18 itself is a WHEA log for a fatal hardware error having occurred when there's a machine check exception.
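For anyone sorting through a similar pile of events, a minimal sketch of that cause-versus-side-effect split follows. It assumes the events have already been exported and parsed into dictionaries with an `id` key; that input shape and the sample list are hypothetical, and the roles in the table are only the ones described in this thread.

```python
from collections import Counter

# Roles as described in this thread; extend the table for other IDs.
EVENT_ROLES = {
    18: "fatal hardware error (WHEA)",
    19: "corrected hardware error (WHEA)",
    41: "side effect of an unexpected shutdown",
    6008: "side effect of an unexpected shutdown",
}
SIDE_EFFECT_IDS = {41, 6008}


def summarize(events):
    """Count events per ID and separate likely causes from cascading side effects."""
    counts = Counter(event["id"] for event in events)
    causes = {i: n for i, n in counts.items()
              if i in EVENT_ROLES and i not in SIDE_EFFECT_IDS}
    side_effects = {i: n for i, n in counts.items() if i in SIDE_EFFECT_IDS}
    unknown = {i: n for i, n in counts.items() if i not in EVENT_ROLES}
    return causes, side_effects, unknown


# Hypothetical sample: two crashes plus one unrelated event.
SAMPLE_EVENTS = [
    {"id": 18}, {"id": 41}, {"id": 6008},
    {"id": 18}, {"id": 41}, {"id": 6008},
    {"id": 7036},
]

if __name__ == "__main__":
    causes, side_effects, unknown = summarize(SAMPLE_EVENTS)
    for event_id, count in sorted(causes.items()):
        print(f"cause candidate  {event_id}: {count}x  {EVENT_ROLES[event_id]}")
    for event_id, count in sorted(side_effects.items()):
        print(f"side effect      {event_id}: {count}x  {EVENT_ROLES[event_id]}")
    print("not classified:", dict(sorted(unknown.items())))
```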

I'm honestly not suspecting the PSU too highly yet. When it crashes in League of Legends, or crashes with Minecraft (or other lighter games) windowed and paused while viewing a browser, that doesn't come off as lack of wattage. Likewise, when this Black screen to restart occurs, the system never even powers off. Screen goes Black, sound continues, then sound is stop and go, then it just restarts. Power, fans, everything stays on. They don't speed up, they don't cut out. Doesn't seem like OPP or OCP or anything is being tripped. It could still be a faulty PSU, but it's a one year old (return from RMA) EVGA SuperNova 750 G5, so it's not my first suspect.

Some things are pointing to either the GPU or drivers (issue disappears with a different GPU, and Watch Dog logs all indicate video related things), some things pointing to platform like CPU/RAM settings or voltages (issue is more likely to occur with XMP off which has lesser voltages on some CPU and RAM stuff, and also occurs more frequently if CPU is undervolted). So there's signs pointing two ways, but I tried testing the latter and came up empty. Issue goes away when the new GPU is taken from the equation. Searching the web has no end of people with 7800 XT, or many years ago on 5700 XTs, with this same "Black/Green screen of death" with the exact same symptoms as me. I've ruled out a faulty CPU (I tried two) and RAM tests as good and issue remains even using any combination of half the RAM. So if it's CPU/RAM related then in my mind it's in the voltages and not the parts themselves being faulty, and that stuff should "just work" at BIOS defaults. I shouldn't have to spend guesses on playing with voltages to get it working. I'm probably over 50 hours into researching and troubleshooting this and there's no end in sight.

ratirt · Nov 24, 2023

It would seem the problem is the GPU. You can try it in a different computer if you have a chance. If it works fine that might exclude the possibility the GPU is faulty and point to the PSU.
If it crashes, at least you will have a somewhat clear answer.
Or maybe, if someone has a GPU like yours or similar, you could try it in your system and see if there are still crashes.

Lost_Troll · Nov 24, 2023

I'd be willing to bet that your power supply is dropping out on the 12V rail under load. I would install HWiNFO64 and run it in logging mode and run the PC until it reboots and then see what the log shows with your power supply.
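For going through such a log afterwards, a minimal Python sketch follows. It assumes the logging tool writes a CSV with a `Time` column and a 12 V column named `+12V [V]`; both names are assumptions and will differ by board and sensor. The ±5% window is the ATX tolerance for the +12 V rail. Software-reported rail voltages come from motherboard sensors and are only approximate, so a clean log would not by itself clear the PSU.

```python
import csv
import io

NOMINAL_V = 12.0
TOLERANCE = 0.05  # ATX allows +/-5% on the +12 V rail


def rail_report(csv_text, column="+12V [V]", time_column="Time"):
    """Summarize 12 V readings from a sensor log given as CSV text.

    `column` and `time_column` are assumed names; adjust them to the real log.
    """
    low = NOMINAL_V * (1 - TOLERANCE)
    high = NOMINAL_V * (1 + TOLERANCE)
    readings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        try:
            readings.append((row.get(time_column, ""), float(raw)))
        except ValueError:
            continue  # skip blank cells and repeated header or footer rows
    if not readings:
        raise ValueError(f"no numeric readings found in column {column!r}")
    values = [value for _, value in readings]
    return {
        "min": min(values),
        "max": max(values),
        "last": readings[-1],  # final sample written before the reboot
        "out_of_spec": [(t, v) for t, v in readings if not low <= v <= high],
    }


# Hypothetical log excerpt, not real data from this system.
SAMPLE_LOG = """Time,+12V [V]
20:01:00,12.096
20:01:02,12.000
20:01:04,11.232
"""

if __name__ == "__main__":
    report = rail_report(SAMPLE_LOG)
    print("min/max:", report["min"], report["max"])
    print("last sample:", report["last"])
    print("outside +/-5%:", report["out_of_spec"])
```

The last sample before the reboot is usually the interesting one, although a fast dropout can easily fall between two polling intervals.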

One thing I did notice from Sapphire: it seems they have underrated the minimum power supply requirements versus the other manufacturers, like ASRock, which recommend 800W as their minimum.

nurgle · Nov 24, 2023

So far, my new 7800XT post RMA is entirely stable. Interestingly, Cinebench 24 GPU score has also gone up by 1500 points (vs the pre crash score of the prev card).

Princess Garnet · Nov 24, 2023

ratirt said:
It would seem the problem is the GPU. You can try it in a different computer if you have a chance. If it works fine that might exclude the possibility the GPU is faulty and point to the PSU.
If it crashes, at least you will have a somewhat clear answer.
Or maybe, if someone has a GPU like yours or similar, you could try it in your system and see if there are still crashes.

Unfortunately, I don't have another practical PC to try this in, nor do I have access to another similar GPU to try in mine. My old GTX 1060 works in this PC but that's the closest I can get.

Before RMAing other parts, or blindly buying new parts at a cost, it makes sense to try service on the part change that introduced the problem. I've already gone above and beyond trying to rule out other things, even to the point it cost me my chance at returning this for a refund (which, sadly, I'd probably be doing if I could, because the rash of others online having this same issue with the 7800 XT, with no known solutions despite the endless lists of things they've tried like me, has me worried).

I also, and for the first time, had it happen under complete idle. I tried disabling Resizable BAR/SAM, and I underclocked the GPU, setting a minimum frequency that equaled the "base clock" of around 1,200 MHz, and a maximum frequency that equaled the "game clock" (not boost clock) of around 2,100 MHz. In other words, it was set to less than 80% of the default maximum. Yet another number of things on the pile I can say I've tried that don't resolve it. I don't know if one of those two aforementioned changes made it happen now at idle, or if that was merely coincidental and it's just a worsening issue.

Also, I'll attach my entire "LiveKernelReports" folder if anyone wants to analyze the logs and see if they show a clear sign of where this is all pointing, but it's just more/fuller samples of what I provided in text form in my first post, and to me it's just "124" WHEA logs and "117, 141, 1a8, and 1b8" watch dog logs, all of which seem to be video related. Some crash restarts result in a WHEA log and no watch dog logs produced, and some have a WHEA log and one or even two watch dog logs produced.
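To see which logs belong to which crash, a rough sketch like the one below can group them by timestamp. The five-minute window, the `WHEA`/`WATCHDOG` labels, and the sample timestamps are all hypothetical; real timestamps would have to be read from the files in that folder.

```python
from collections import Counter
from datetime import datetime, timedelta


def group_incidents(logs, window=timedelta(minutes=5)):
    """Group (timestamp, kind) log entries into incidents.

    An entry joins the current incident if it falls within `window`
    of the previous entry; otherwise it starts a new incident.
    """
    incidents = []
    for stamp, kind in sorted(logs):
        if incidents and stamp - incidents[-1][-1][0] <= window:
            incidents[-1].append((stamp, kind))
        else:
            incidents.append([(stamp, kind)])
    return incidents


def composition(incident):
    """Return a sorted tuple of (kind, count) pairs for one incident."""
    return tuple(sorted(Counter(kind for _, kind in incident).items()))


# Hypothetical timestamps covering the three patterns described above.
SAMPLE_LOGS = [
    (datetime(2023, 11, 18, 20, 15, 0), "WHEA"),
    (datetime(2023, 11, 20, 9, 30, 0), "WHEA"),
    (datetime(2023, 11, 20, 9, 30, 40), "WATCHDOG"),
    (datetime(2023, 11, 23, 22, 5, 0), "WATCHDOG"),
    (datetime(2023, 11, 23, 22, 5, 20), "WATCHDOG"),
    (datetime(2023, 11, 23, 22, 6, 0), "WHEA"),
]

if __name__ == "__main__":
    for incident in group_incidents(SAMPLE_LOGS):
        print(incident[0][0].isoformat(), dict(composition(incident)))
```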

Lost_Troll said:
I'd be willing to bet that your power supply is dropping out on the 12V rail under load. I would install HWiNFO64 and run it in logging mode and run the PC until it reboots and then see what the log shows with your power supply.

One thing I did notice from Sapphire: it seems they have underrated the minimum power supply requirements versus the other manufacturers, like ASRock, which recommend 800W as their minimum.

I'll try this.

The issues people are infamously having with the 7800 XT behaving this way don't seem tied to one brand though. I'm seeing Sapphire, PowerColor, XFX, and probably all of them turning up. Underclocking, which should cut power use, isn't helping either. Other people have swapped to 850W or even 1KW+ PSUs and the issue remains too.

nurgle said:
So far, my new 7800XT post RMA is entirely stable. Interestingly, Cinebench 24 GPU score has also gone up by 1500 points (vs the pre crash score of the prev card).

Very good news! That was... quite a fast turnaround in your case? I've heard people saying it takes weeks.

I've found mine will go days without issues but then it just shows up, so hopefully your issue is actually gone.

Regardless of my issue, I hope this fixes yours!

System Name	Windows
Processor	13900K
Motherboard	Pro Z790-A WiFi
Cooling	Arctic LF III 360
Memory	32GB 6600 CL32
Video Card(s)	RTX 4090
Display(s)	MSI MAG401QR
Case	Phanteks P600s
Power Supply	Vertex GX-1000
Software	Win 11 Pro
Benchmark Scores	They suck.

System Name	DLSS / YOLO-PC
Processor	i5-12400F / 10600KF
Motherboard	Gigabyte B760M DS3H / Z490 Vision D
Cooling	Laminar RM1 / Gammaxx 400
Memory	32 GB DDR4-3200 / 16 GB DDR4-3333
Video Card(s)	RX 6700 XT / RX 480 8 GB
Storage	A couple SSDs, m.2 NVMe included / 240 GB CX1 + 1 TB WD HDD
Display(s)	Compit HA2704 / Viewsonic VX3276-MHD-2
Case	Matrexx 55 / Junkyard special
Audio Device(s)	Want loud, use headphones. Want quiet, use satellites.
Power Supply	Thermaltake 1000 W / FSP Epsilon 700 W / Corsair CX650M [backup]
Mouse	Don't disturb, cheese eating in progress...
Keyboard	Makes some noise. Probably onto something.
VR HMD	I live in real reality and don't need a virtual one.
Software	Windows 10 and 11

System Name	Not a thread ripper but pretty good.
Processor	Ryzen 9 5950x
Motherboard	ASRock X570 Taichi (revision 1.06, BIOS/UEFI version P5.50)
Cooling	EK-Quantum Velocity, EK-Quantum Reflection PC-O11, EK-CoolStream PE 360, XSPC TX360
Memory	Micron DDR4-3200 ECC Unbuffered Memory (4 sticks, 128GB, 18ASF4G72AZ-3G2F1)
Video Card(s)	XFX Radeon RX 5700 & EK-Quantum Vector Radeon RX 5700 +XT & Backplate
Storage	Samsung 2TB 980 PRO 2TB Gen4x4 NVMe, 2 x Samsung 2TB 970 EVO Plus Gen3x4 NVMe, AMD Radeon RAMDisk
Display(s)	2 x 4K LG 27UL600-W (and HUANUO Dual Monitor Mount)
Case	Lian Li PC-O11 Dynamic Black (original model)
Power Supply	Corsair RM750x
Mouse	Logitech M575
Keyboard	Corsair Strafe RGB MK.2
Software	Windows 10 Professional (64bit)
Benchmark Scores	Typical for non-overclocked CPU.

System Name	Teh Beast
Processor	Intel i9 14900K | 7800X3d
Motherboard	Asus STRIX Z790E-E | Strix B650E-F
Cooling	NZXT Kraken Elite 280 | Kraken Elite 280
Memory	64GB G.Skill T5 6400Mhz | 32GB G.skill 6000mhz
Video Card(s)	Sapphire 7900XTX Pulse | 4070 Super
Storage	1X 1TB SN850X - 1 X 4TB SN850X - 2 X 2TB 980 Pro
Display(s)	LG 38" 38GL950G + LG 27" 27GP83B-B
Case	Lian Li o11 Air Mini
Power Supply	Corsair RM1000x
Software	WIndows 11 Pro

System Name	Best AMD Computer
Processor	AMD 7900X3D
Motherboard	Asus X670E E Strix
Cooling	In Win SR36
Memory	GSKILL DDR5 32GB 5200 30
Video Card(s)	Sapphire Pulse 7900XT (Watercooled)
Storage	Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s)	GIGABYTE FV43U
Case	Corsair 7000D Airflow
Audio Device(s)	Corsair Void Pro, Logitch Z523 5.1
Power Supply	Deepcool 1000M
Mouse	Logitech g7 gaming mouse
Keyboard	Logitech G510
Software	Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores	Firestrike: 46183 Time Spy: 25121

System Name	KLM
Processor	7800X3D
Motherboard	B-650E-E Strix
Cooling	Arctic Cooling III 280
Memory	16x2 Fury Renegade 6000-32
Video Card(s)	4070-ti PNY
Storage	512+512+1+2+2+2+2+6+500+256+4+4+4
Display(s)	VA 32" 4K@60 - OLED 27" 2K@240
Case	4000D Airflow
Audio Device(s)	Edifier 1280Ts
Power Supply	Shift 1000
Mouse	502 Hero
Keyboard	K68
Software	EMDB
Benchmark Scores	0>1000

Processor	Ryzen 5800X3D @stock
Motherboard	ASUS ROG B450 F
Cooling	Corsair A500 (for £35 UK :)
Memory	32GB 3200 Corsair
Video Card(s)	7800XT
Storage	3x NVME, 1x SATA HDD, 1x HDD
Display(s)	AGON AG271QX
Case	Corsair 600T
Power Supply	EVGA supernova 750 G2
Mouse	Logitech G402
Keyboard	Steelseries 6G
Software	Win 11

System Name	Bro2
Processor	Ryzen 5800X
Motherboard	Gigabyte X570 Aorus Elite
Cooling	Corsair h115i pro rgb
Memory	16GB G.Skill Flare X 3200 CL14 @3800Mhz CL16
Video Card(s)	Powercolor 6900 XT Red Devil 1.1v@2400Mhz
Storage	M.2 Samsung 970 Evo Plus 500MB/ Samsung 860 Evo 1TB
Display(s)	LG 27UD69 UHD / LG 27GN950
Case	Fractal Design G
Audio Device(s)	Realtec 5.1
Power Supply	Seasonic 750W GOLD
Mouse	Logitech G402
Keyboard	Logitech slim
Software	Windows 10 64 bit

System Name	Spam
Processor	i9-12900K PL1=125 TA=56 PL2=250
Motherboard	MSI MAG B660M Mortar WiFi DDR4
Cooling	Scythe Kaze Flex 120mm ARGB Fans x5 / Noctua NH-U9S
Memory	Mushkin Red Line DDR4 4000 16Gb x2 18-22-22-42 1T
Video Card(s)	Sapphire Pulse RX 7900 XT
Storage	Team Group MP33 512Mb / 1Tb
Display(s)	LG 34GP63A-B (3440 x 1440)
Case	BitFenix Prodigy M 2022
Audio Device(s)	Real Tek on Board Audio
Power Supply	EVGA SuperNOVA 850 GM
Mouse	G203
Keyboard	G413
Software	WIN 11 Pro