
Black screens leading to restarts (Event ID 18) on AMD platform since changing graphics card

Ioannis97

New Member
Joined
Nov 25, 2023
Messages
2 (0.01/day)
I can also confirm, that I've had the exact same thing happen on my end (Sapphire model with 2 fans).

I first noticed it when I played TLOU, where the gpu usage dropped, and the PC simply went into a black screen, followed by a reboot.

I then got the same issue again and again (randomly). Sometimes FurMark would trigger the issue consistently; other times it would run just fine.

The benchmark scores were also a little lower compared to other 7800 XTs. Temperatures were fine as well.

I upgraded my Corsair RM650 to an RMX850, and the issue persisted, although the coil whine got better. I also did a clean Windows reinstall and changed profiles.

The issue is so bizarre and inconsistent.

I've sadly returned the card and am hoping for a refund. I had an AMD card 10 years ago, which was a nightmare. I kept this card for 2 days. Sadly, it seems like something's up again, or I'm just unlucky. :(
 
Joined
Oct 15, 2011
Messages
2,014 (0.44/day)
Location
Springfield, Vermont
System Name KHR-1
Processor Ryzen 9 5900X
Motherboard ASRock B550 PG Velocita (UEFI-BIOS P3.40)
Memory 32 GB G.Skill RipJawsV F4-3200C16D-32GVR
Video Card(s) Sapphire Nitro+ Radeon RX 6750 XT
Storage Western Digital Black SN850 1 TB NVMe SSD
Display(s) Alienware AW3423DWF OLED-ASRock PG27Q15R2A (backup)
Case Corsair 275R
Audio Device(s) Technics SA-EX140 receiver with Polk VT60 speakers
Power Supply eVGA Supernova G3 750W
Mouse Logitech G Pro (Hero)
Software Windows 11 Pro x64 23H2
"A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error"
This can happen due to unstable VRAM OC. Especially on Ryzen for some reason.

Also more likely to occur with heat on in the room. I started to wonder if the VRAM is getting flaky at merely >55C.

where the gpu usage dropped, and the PC simply went into a black screen, followed by a reboot.
I don't recall ever seeing this, but Windows would black-screen and reboot out of the blue, at random. Sometimes Windows rebooted itself with that error right after I double-clicked to start Halo MCC. (IIRC)
It would also randomly reboot with that error while browsing. It was more likely to occur when my room was warm.
 
Joined
Jan 1, 2012
Messages
115 (0.03/day)
I'd be willing to bet that your power supply is dropping out on the 12V rail under load. I would install HWiNFO64, run it in logging mode, use the PC until it reboots, and then see what the log shows for your power supply.

One thing I did notice from Sapphire: it seems they have underrated the minimum power supply requirement compared with other manufacturers, like ASRock, who recommend 800W as their minimum.
So it crashed while that was running. This occurred 5 to 10 minutes after launching Minecraft, then pausing/windowing it and spending another five to ten minutes in a browser.

The file it logged is attached.
I can also confirm, that I've had the exact same thing happen on my end (Sapphire model with 2 fans).

I first noticed it when I played TLOU, where the gpu usage dropped, and the PC simply went into a black screen, followed by a reboot.

I then got the same issue again and again (randomly). Sometimes FurMark would trigger the issue consistently; other times it would run just fine.

The benchmark scores were also a little lower compared to other 7800 XTs. Temperatures were fine as well.

I upgraded my Corsair RM650 to an RMX850, and the issue persisted, although the coil whine got better. I also did a clean Windows reinstall and changed profiles.

The issue is so bizarre and inconsistent.

I've sadly returned the card and am hoping for a refund. I had an AMD card 10 years ago, which was a nightmare. I kept this card for 2 days. Sadly, it seems like something's up again, or I'm just unlucky. :(
You and many others like me, it seems.

I've lost track of the number of Black/Green screen of death issues being reported on various forums/Reddit communities over the past couple of months specifically about the 7800 XT, and they all mirror my own. Black screen to reboot during GPU load. People are trying random registry settings like ULPS and MPO, uninstalling/reinstalling different drivers, DDU, the AMD Cleanup utility, updating the BIOS, updating drivers, installing the video drivers without Adrenalin, underclocking the card 20%+, even reinstalling Windows (!) with a minimal software environment, trying with XMP on or off, changing CPUs, changing PSUs (750W, 850W, 1KW+, doesn't matter), changing this voltage, changing that voltage, disabling this BIOS setting (like C-states, PBO, you name it), changing Windows power plan settings, changing DP/HDMI cables and where they're plugged in, and all of this, all of this... all to get a system stable that was previously stable until the 7800 XT was added. That's how almost every single one of these seems to go. Then these people put an RTX 3080 Ti or RTX 4070 (or their old Pascal or older AMD card) in the same system and the Black/Green screen reboots stop.

The constant in this scenario seems pretty apparent to me. Not everyone has this experience with it (in fact most hopefully don't), but my goodness is it frustrating if you do, because nothing else seems to fix it. RMA seems to be half and half from comments I find online, but it's early, so let's hope we know more after more time passes and more RMAs are done. I'm going to try to RMA mine because I've tried everything else.
 

Attachments

  • HwInfo64.zip
    220.3 KB · Views: 34
Joined
Aug 15, 2022
Messages
277 (0.44/day)
Location
Some Where On Earth
System Name Spam
Processor i9-12900K PL1=125 TA=56 PL2=250
Motherboard MSI MAG B660M Mortar WiFi DDR4
Cooling Scythe Kaze Flex 120mm ARGB Fans x5 / Noctua NH-U9S
Memory Mushkin Red Line DDR4 4000 16Gb x2 18-22-22-42 1T
Video Card(s) Sapphire Pulse RX 7900 XT
Storage Team Group MP33 512Mb / 1Tb
Display(s) LG 34GP63A-B (3440 x 1440)
Case BitFenix Prodigy M 2022
Audio Device(s) Real Tek on Board Audio
Power Supply EVGA SuperNOVA 850 GM
Mouse G203
Keyboard G413
Software WIN 11 Pro
RMA seems to be half and half from comments I find online but it's early so let's hope after more time passes/more RMAs are done we know more. I'm going to try and RMA mine because I've tried everything else.
After looking at the log from HWiNFO64, I would RMA the card. All of your voltage rails are fine, and so are all of your system temps.
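For anyone who wants to check a HWiNFO64 log the same way, here is a minimal Python sketch of the idea. It assumes the log was saved as CSV with a header row, and that the 12V sensor column has "+12V" somewhere in its name; the column label, the sample rows, and the function name `rail_summary` are all assumptions for illustration, since sensor names differ from board to board. The 5% tolerance is the usual ATX allowance for the 12V rail. Keep in mind that software-read voltages are only approximate, so a clean log doesn't completely clear the PSU.

```python
import csv
import io


def rail_summary(csv_text, column_hint="+12V", nominal=12.0, tolerance=0.05):
    """Return (column name, min, max, in_spec) for the first column
    whose header contains column_hint."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    idx = next((i for i, name in enumerate(header) if column_hint in name), None)
    if idx is None:
        raise ValueError(f"no column containing {column_hint!r} in the log header")

    values = []
    for row in reader:
        if idx >= len(row):
            continue  # short or truncated row (e.g. cut off by the crash)
        try:
            values.append(float(row[idx]))
        except ValueError:
            continue  # non-numeric row, such as a repeated header line

    if not values:
        raise ValueError("no numeric readings found for that column")

    low, high = min(values), max(values)
    in_spec = nominal * (1 - tolerance) <= low and high <= nominal * (1 + tolerance)
    return header[idx], low, high, in_spec


# Made-up sample rows, only to show the expected shape of the input.
sample = (
    "Date,Time,+12V [V],GPU Temperature [C]\n"
    "1.1.2024,10:00:00,12.096,45\n"
    "1.1.2024,10:00:02,11.904,61\n"
)
print(rail_summary(sample))
```

With a real log you would read the file contents into a string first (HWiNFO logs are not always UTF-8, so the encoding may need adjusting) and look at the minimum in the last readings before the reboot.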
 

Ioannis97

New Member
Joined
Nov 25, 2023
Messages
2 (0.01/day)
Let us know how it goes! Best of luck. :clap:
 

reze0

New Member
Joined
Dec 13, 2023
Messages
2 (0.01/day)
Hi, I'm very happy to have found this thread. I got a 7800 XT about two months ago and it was working fine until yesterday, when I updated my BIOS and undervolted my CPU. I ran benchmarks and everything worked fine, until today I started getting random game crashes and the PC randomly restarting. So far I have tried everything to rule things out, for example: disabling the undervolt, setting CPU boost to Enabled under Advanced (which also disables the undervolt option), trying an older BIOS version, and disabling Smart Access Memory. I also tried updating every driver possible, and temperatures are perfect.

Basically, I had a very old BIOS on my B550A-Pro and decided to update it, which unlocked the option to undervolt my CPU. I followed an easy tutorial, and it worked for one day with very good stats. Now my only option is to flash back to the 2021 BIOS and try with that. The best way to test whether my system crashes is to play GTA Online for 10 minutes; that's when it happens. If that doesn't work, I'm going to swap my 7800 XT for my old 1080 Ti, and if it still happens, then it's either my RAM or PSU. But yeah, everything was working fine for two months and suddenly this happens. About two years ago, when I got my new Ryzen 7 5800X, I started having random FPS drops from 600 FPS to 30 FPS, and back then I fixed the problem by disabling fTPM. AMD seems to have a lot of these kinds of issues. I will also give you an update if I get it fixed.
 
Joined
Jan 1, 2012
Messages
115 (0.03/day)
If your issues started when you updated the BIOS and undervolted, then I would start there. You might have a stable graphics card, but an unstable CPU undervolt.

Mine worked fine for a handful of days too, which isn't long, and then crashed quickly twice when I tried undervolting it. I set it back to stock but the issues remained (but not as commonly).

I think the 7800 XT (and maybe not only the 7800 XT) might be more sensitive to system side instabilities that would seem stable on other graphics cards? That's one thought I've had. My system is stable on my GTX 1060 with either XMP on or off (haven't tried undervolting the CPU though), but on the 7800 XT, those factors make the issue better or worse (but it's unstable in all cases). So you saying the issue showed up after months when trying to undervolt doesn't shock me based on my own experience.

My own 7800 XT is still out for RMA so I have nothing to add to my own situation for now, but it is on its way back to me and should arrive early next week. Then add some days (or really number of weeks) for me to get enough time in to test and ensure the issue may be gone. Sapphire didn't divulge any details on what they did/found but they had it roughly a week (they state a week or two up front is typical so I found this pretty quick) and the follow up email states a "replacement" has been sent out. The initial e-mail uses the terms "repair" and "replacement" so I'm reading this as though I'm not getting the same one back. But nothing more was stated, and even if it was, I won't know what the result is until I've had time to try it so it's irrelevant to me whether it was repaired or replaced (if it was replaced, hopefully this one runs as cool, quiet, and with a lack of coil whine as the original because that one was Golden).

Since nurgle hasn't posted back to say otherwise, I presume their RMA succeeded in resolving the same issue for them. Hoping mine does. If not I'll move to trying a different PSU.

But in your case I'd definitely be trying stock settings and see if that makes the stability return. Being stable for two months and then going unstable the day after you undervolt seems suggestive your GPU is fine.
 

reze0

New Member
Joined
Dec 13, 2023
Messages
2 (0.01/day)
Thank you for the informative answer. My new 7800 XT, and the PC overall, had been working for two months. But I undervolted my CPU this Wednesday and it worked perfectly. Then on Thursday my PC started restarting randomly.

My friend and I tried everything, including swapping BIOS versions, to solve the problem. Then I found your post about the same problem (random restarts). Yesterday we ran sfc /scannow from cmd, and after that I reset my newest BIOS, and it worked perfectly yesterday with no crashes. Before shutting down my PC for the night, I undervolted the CPU in advanced mode with "Negative 30". Then I streamed to Discord while playing GTA Online, with CS2 and Spotify open at the same time. A 10-minute test and no crash. And temperatures were amazing: GPU temperature maxed out at 47 and the CPU was around 70. So my guess is that my Windows install got corrupted after changing settings, deleting stuff, and swapping the BIOS to older and newer versions while the undervolt was set.

I will keep testing, and I guess the problem was either a corrupted Windows install or the undervolt, but no crash yesterday!

Hopefully your PC problem gets solved. From my experience, if it really was a corrupted Windows install plus BIOS swapping and undervolting, the command fixed it all. So basically I've now ruled out the GPU, the CPU, and (maybe) the PSU, because it's high quality and brand new and has been working fine for the two months I've had it. So if my PC starts crashing now, it's because of the undervolt, but yesterday's test shows it's working and I will keep testing. If it crashes, I will step the undervolt down from 30 to 25, then 20, then 15 if it still isn't stable. But yeah, again I learned a lot for the future: how to find the problem and how to rule out different parts.
 
Joined
Dec 20, 2020
Messages
21 (0.02/day)
Processor Ryzen 5800X3D @stock
Motherboard ASUS ROG B450 F
Cooling Corsair A500 (for £35 UK :)
Memory 32GB 3200 Corsair
Video Card(s) 7800XT
Storage 3x NVME, 1x SATA HDD, 1x HDD
Display(s) AGON AG271QX
Case Corsair 600T
Power Supply EVGA supernova 750 G2
Mouse Logitech G402
Keyboard Steelseries 6G
Software Win 11
So far, my replacement 7800XT is fully stable and working well :)
 
Joined
Jan 1, 2012
Messages
115 (0.03/day)
So far, my replacement 7800XT is fully stable and working well :)
Wonderful, and thank you again for the follow up. Keep enjoying it; it's fantastic when the issue isn't there!
Hopefully your PC problem gets solved. From my experience, if it really was a corrupted Windows install plus BIOS swapping and undervolting, the command fixed it all. So basically I've now ruled out the GPU, the CPU, and (maybe) the PSU, because it's high quality and brand new and has been working fine for the two months I've had it. So if my PC starts crashing now, it's because of the undervolt, but yesterday's test shows it's working and I will keep testing. If it crashes, I will step the undervolt down from 30 to 25, then 20, then 15 if it still isn't stable. But yeah, again I learned a lot for the future: how to find the problem and how to rule out different parts.
If issues show back up, I would definitely undo the undervolt. Maybe even go back to the older BIOS if that's a solution you're open to? (I would understand if you don't want to do this one.)

I've read -30 is common for the 5800X3D to achieve (presuming this is the CPU you have?), but this doesn't mean all will. Mine seemed to crash sooner with it, but that was on the 7800 XT which was crashing even at stock.

Come to think of it then... maybe I should be trying to see how my system handles an undervolt now since I'm back on my GTX 1060 and can rule out the crashes that started with the 7800 XT, which may have been the GPU (partially or fully). This would give me a baseline for what my particular CPU could be capable of.

I've been hesitant to want to spend time on that stuff for now when I might have to spend time on it in the coming weeks if my RMA doesn't resolve it. Maybe if the 7800 XT resolves that issue I'll try again further down the line, or maybe I'll just leave it be. Lower temperatures would be nice, but they're still fine. And I might entertain switching it out for Zen 5 X3D later which might be a year from now (?) give or take, so... I might just leave well enough alone if the RMA replacement proves good.
 

cvargas343

New Member
Joined
Dec 15, 2023
Messages
2 (0.01/day)
I've been having the same issue. I recently upgraded my CPU and GPU to a Ryzen 7 5800X and RX 7900 XTX, with the same B550 board and 3200 MHz RAM. It's weird because I got WHEA-Logger Event 18 more frequently at idle, and only once while playing a game. It could literally run 30+ minutes at full load in Cinebench, Prime95 or 3DMark and pass all tests without issues. I tried every suggestion from almost all the forum threads I read with the same description, but it kept crashing.

At first I suspected a defective CPU because of the error description in the event log, but it never happened with my previous 5600X, and I had that one running for quite a while. So I decided to try something. I uninstalled the GPU drivers with DDU and reinstalled the latest driver, 23.12.1. I chose Full Install again, then rebooted. After the reboot, I opened the Adrenalin software to reconfigure it when suddenly the system crashed with WHEA Event 18. This happened twice, exactly when trying to use Adrenalin. Because of that, I uninstalled the drivers again, then reinstalled the latest driver, but this time using "Driver Only". My system has been more stable since, with no crashes for the last 7 days at both idle and full load.

I know this might not be the best solution as you lose control of the GPU features (you can still use MSI Afterburner to control clock speeds and power), but this has proven to be successful in my case.
 
Last edited:
Joined
Jan 1, 2012
Messages
115 (0.03/day)
Yeah, Event ID 18 is tricky. It basically means the CPU found a situation to sound the machine check exception alarm. It does not mean the CPU itself is bad; just that the CPU was what noticed it (machine check exceptions can be caught by either the CPU or RAM). Unfortunately, what causes it is usually not easy to glean from the logs, and it's usually a process of elimination. Mine showed up with the GPU change, and the Watch Dog logs did point to the GPU, so I started there. You changed both CPU and GPU, so what I'd do is try with only one or the other changed and see if it indicates anything. For example, if new CPU with old GPU is fine and old CPU with new GPU is not fine, it points to the GPU.
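To spell that swap test out, here is a small Python sketch of the reasoning. The function name and the wording of the outcomes are just illustrative, and it assumes each combination was run long enough to actually judge it stable or not, which is the hard part with an intermittent crash.

```python
def likely_suspect(new_cpu_old_gpu_stable: bool, old_cpu_new_gpu_stable: bool) -> str:
    """Interpret the two mixed old/new test runs described above."""
    if new_cpu_old_gpu_stable and not old_cpu_new_gpu_stable:
        return "gpu"  # the crash follows the new GPU
    if old_cpu_new_gpu_stable and not new_cpu_old_gpu_stable:
        return "cpu"  # the crash follows the new CPU
    if not new_cpu_old_gpu_stable and not old_cpu_new_gpu_stable:
        return "both new parts, or something shared (PSU, RAM, BIOS, OS)"
    # Each new part is fine on its own, so only the combination fails.
    return "the new CPU + new GPU combination (power draw, BIOS, drivers)"


print(likely_suspect(new_cpu_old_gpu_stable=True, old_cpu_new_gpu_stable=False))
```

The last two outcomes don't isolate a single part, which is where swapping the PSU or RAM comes in.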

It's not unusual to see this at idle either. I found others having the issue where it would happen at idle or near idle (browser only use). I never experienced that, but I had the issue at moderate to high load instead. I wonder if the voltage/frequency curve is borderline stable at different spots for different people/different combinations of hardware, and that might be what causes this?

And I tried with the drivers only. It didn't make a difference in my case (plus I would find that a poor solution even if it did work as, yes, I desperately need at least one feature of the drivers [OpenGL triple buffering] or I lose a ton of performance in one game). If removing Adrenalin worked for you, it makes me wonder if the default boost was unstable. This would indicate a borderline unstable GPU, but you could play with the frequencies/voltage in Adrenalin if you want. Again, I tried that and it didn't help me, but since removing Adrenalin helps for you, it might be worth trying if you want the Adrenalin features.
 
Joined
Jan 1, 2012
Messages
115 (0.03/day)
I'm going to cautiously update the status on this as "seemingly resolved for now". If I don't turn around and have the issue later (and if I do, I'll come back and update that I have), then consider this resolved. What solved it? I got a replacement back from RMA and the issue hasn't shown up since. So my particular "Black screen to restart curse of the 7800 XT" was simply down to a bad individual hardware sample.

All that time and effort was spent up front to avoid sending back the part that changed before the behavior arrived (and troubleshooting 101 is to suspect that), all because I wanted to rule out everything else instead of jumping to presuming bad hardware. In any other situation I would have jumped at suspecting the new hardware, but all the other issues online reported with the 7800 XT made me think it wasn't an individual sample issue but a possible broader 7800 XT issue, and then the later testing with my platform was making the issue worse which had me thinking "maybe it's also my system and not the graphics card". Still no idea why it was more severe for me with XMP off but I'm not sure it matters?

What a wild goose chase!

I wanted to give it more time and try it under a lot more scenarios, but I've had some people apparently watching this thread and asking me for updates on it, so I figured I'd add on here. I feel like if I had my original sample it would have crashed half a dozen to two dozen times by now, so I think it's fair to say "most likely resolved with a replacement". Assume that resolved it unless I come back and say otherwise, so anyone else with a 7800 XT black screen crashing on a system that was stable beforehand watching/reading this for my conclusion, if you haven't done an RMA yet, do it and stop hoping something else (like drivers or chance) fixes it! Worst case scenario is you have a different cause than me and the RMA doesn't solve it but I think it absolutely needs ruled out first. I don't think it's resolved the issue for everyone but it seems to have the highest success rate. Three people in this thread reported the issue and two tried RMA and both had success. I'm hoping mine doesn't come back because it was a nightmare.
 
Joined
Oct 15, 2011
Messages
2,014 (0.44/day)
I'm going to cautiously update the status on this as "seemingly resolved for now". If I don't turn around and have the issue later (and if I do, I'll come back and update that I have), then consider this resolved. What solved it? I got a replacement back from RMA and the issue hasn't shown up since. So my particular "Black screen to restart curse of the 7800 XT" was simply down to a bad individual hardware sample.
I suspect faulty VRAM on that returned video card. Could be a pandemic-related QC-slump! Just like those samples of LCD monitors I got with bad pixels, which are suspected of being related to the pandemic.
 
Joined
Jan 1, 2012
Messages
115 (0.03/day)
Low quality of stuff in the last few years crossed my mind.

I also saw someone mention VRAM as a possible cause, maybe when crossing specific temperature thresholds.

Some people having the issue mentioned they'd get the crash once and then it'd be fine for the rest of the day. That wasn't my own experience (mine was random; sometimes fine for a few days and sometimes crashing a few times in the same day) but it had me wondering.

Your mention of monitors is coincidental. I had a short scare with my monitor yesterday. It turned on, but I walked away and when I came back it was powered off. The power button refused to do anything. I unplugged it (power and display cable), pressed the power button a few times, plugged both cables back in, and it turned right on after the power cable was connected (even skipping the few second display before showing a picture). It's a nearly 15 year old display so its time is probably coming, but I'm worried about that one next, especially if quality control is all over the place lately. I went from around a decade and a half of almost no hardware issues to over half of the major issues I've ever had all being in the last three years alone (the other big one was two different motherboards). That's quite a difference. If Dell still offered the 1600p 27" display they had available last year I might look towards getting that soon, but they only have the 30" now and I think that's bigger than I want. So I might end up giving up 16:10 and getting higher refresh after all.
 
Joined
Dec 20, 2020
Messages
21 (0.02/day)
A month in now post-RMA and absolutely no issues whatsoever.
 
Joined
Jan 1, 2012
Messages
115 (0.03/day)
I am bringing this back up because I unfortunately fear the issue may now be occurring on my replacement RX 7800 XT. For the first time since the replacement, I experienced the same issue last night. The only difference this time was that, in the minutes before the Black screen to restart occurred, I had a few (very far spaced apart) White "flash" frames, so it was slightly less sudden. There was also no WHEA log this time, but the same Event ID 18 and Watch Dog log showed up.

=========================

VIDEO_ENGINE_TIMEOUT_DETECTED (141)
One of the display engines failed to respond in timely fashion.
(This code can never be used for a real BugCheck; it is used to identify live dumps.)
Arguments:
Arg1: ffff8f0fbbcf5010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8023caf8aa0, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, The secondary driver specific bucketing key.
Arg4: 000000000000248c, Optional internal context dependent data.


=========================

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 2

The details view of this entry contains further information.


=========================



In case it's relevant and anyone can make something of it, I notice the same "0xbea" start and the same "108" ending every time. The digit in the middle differs (it is sometimes 0 and sometimes 1), but the rest is always the same. This matches the ones I saw before (and the ones I see with this behavior elsewhere online).
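For anyone who wants to pick that "0xbea ... 108" pattern apart, here is a rough Python sketch. It assumes the value is the 64-bit machine-check status register from the event's details view, which the post does not actually confirm, and it uses the generic x86 machine-check bit layout as I understand it (flag bits 63 down to 57, error code in the low 16 bits). The example value is hypothetical: it is built from the quoted prefix and suffix with the unknown middle digits set to zero, so check the real value and AMD's documentation for your CPU before reading anything into it.

```python
# Sketch only: bit positions follow the generic x86 machine-check status layout
# as I understand it. Verify against the vendor documentation for your CPU.
FLAG_BITS = {
    "valid": 63,
    "overflow": 62,
    "uncorrected": 61,
    "enabled": 60,
    "misc_valid": 59,
    "addr_valid": 58,
    "context_corrupt": 57,
}


def decode_mca_status(status: int) -> dict:
    """Split a 64-bit status value into flag bits and the low 16-bit error code."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    error_code = status & 0xFFFF
    fields["error_code"] = error_code
    # Assumption: memory/cache hierarchy errors use the pattern 0000 0001 RRRR TTLL.
    fields["is_cache_hierarchy"] = (error_code & 0xFF00) == 0x0100
    return fields


# Hypothetical value: "0xbea" prefix and "108" suffix from the post,
# with the unknown middle digits filled with zeros.
example = (0xBEA << 52) | 0x108

for name, value in decode_mca_status(example).items():
    shown = hex(value) if name == "error_code" else value
    print(f"{name}: {shown}")
```

If the layout assumption holds, the "bea" prefix is just a fixed set of flag bits and the "108" suffix is the part that lines up with the "Cache Hierarchy Error" type in the log, which would explain why both stay constant between crashes.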

Something to note about this is that it occurred with a slight shift in my use pattern. So I'm wondering if the issue was ever truly resolved or if I avoided it until now. Specifically, I was playing in an older version of Minecraft and with different shaders than I've been playing regularly. I had no issues doing this the first time I played this older version (couple hours of play?) but during the recent/second time, it occurred fast. So I'm not sure if this is just something the older version of that game (at least with shaders) is doing that the drivers really don't like, or if the issue was still there on the replacement, but much less prone to occurring and I never tripped it until now.

I did initially have TDR issues and other performance issues on the replacement, and those were in my mind at first, but the former was limited to one game and only on drivers 23.12.1 (the latest at the time). Going back to 23.11.1 resolved that and the other performance issues, so I chalked those up to possible driver issues. I have since updated to 24.3.1. Adrenalin now takes 15 seconds to open (the first time only) and is slower to change between pages and tabs. It seems more sluggish.

Supposedly, the "screen goes Black but sound continues until it restarts" behavior is suspected to be a VRAM issue. I'm not sure if this is (always) true.

Someone else here has the issue with an RX 7800 XT but also had it with a previous RTX 4080.

I'm a bit lost now. The behavior clearly seems linked to the GPU (or drivers) to some degree since it showed up with the new graphics card, and a mere sample change severely minimized it... but two bad ones? Seriously!? What kind of luck is this? If it's not bad luck, what else is wrong with my system that is causing this? Or what am I doing wrong?

Do I reach back out to Sapphire? Or should replacing my PSU, despite my doubts that it is the cause, be my next step? Then do the same with my motherboard and RAM (and thus CPU), since those are the only other two parts I haven't changed? This is becoming costly to imagine, and absurd, all to accommodate a simple GPU upgrade that could still be the source of the problems. Why am I having to consider changing my entire system to accommodate something? I already spent over $600 for the GPU ($560 plus tax) and another $70 to send it back to Sapphire insured. Now I need to do that again and/or buy other replacements? Uh...
 

Xaamoh

New Member
Joined
Apr 27, 2024
Messages
2 (0.15/day)
I had the same issue: random black screens over months, especially while gaming. Yesterday I figured out it was my power cable from the GPU to the power supply. I replaced the cable with the stock one I got with the GPU. Since then, no more problems.
 

cvargas343

New Member
Joined
Dec 15, 2023
Messages
2 (0.01/day)
I actually was able to get to a solution. About 2 months ago I was playing Kingdom Come Deliverance and Laika: Aged through Blood when all of a sudden the PC black screened and rebooted while I was Alt-Tabbing to Chrome, triggering WHEA event 18. As reported previously, these happened randomly while using Chrome or at idle, but the second time, the WiFi drivers stopped working, causing Windows to be really slow at startup. Before this, I had been testing my CPU with CoreCycler, using the y-cruncher test recommended for Zen 3 processors. It crashed while testing cores 0 and 7 with PBO or stock settings, so I modified the Curve Optimizer settings to +10 on each core to be "stable".

I wanted to check if there was any physical damage to the WiFi module on the motherboard, so I decided to take my PC apart and remove the IO cover to check the module connection. The module looked fine to my eyes, so I reconnected it and reassembled everything. I noticed some thermal paste that had made it into some sections of the socket, so I cleaned it and the CPU pins with isopropyl alcohol.

The PC rebooted as normal, with Windows acting normal and recognizing the WiFi module again. Interestingly, my CPU was behaving differently. After hours of tests with y-cruncher, I found that the CPU was stable with PBO enabled at +150 MHz and CO at -25 on all cores. I tuned PPT, EDC and TDC to reduce temperatures while keeping the same performance. For the first time, my PC was stable with y-cruncher. But once again, my PC black screened and rebooted while using Chrome and going through 3 tabs.

That is when I suspected that my GPU OC could be the guilty one. I have a Sapphire Nitro+ RX 7900 XTX, and I had thought I could push this card a little further by overclocking it to 3000 MHz with PL +15 and undervolting it to 1010 mV. The VRAM was also overclocked to 2700 MHz. With these settings, the card was getting good scores in 3DMark and Unigine Superposition, but maybe it was causing these WHEA events all along, so I decided to keep the GPU clock, UV and Power Limits, but reduce the VRAM OC to 2650 MHz.

After doing so, I stopped experiencing these black screen issues. I have been enjoying a stable system for more than a month now, with no more WHEA events. The VRAM OC theory was backed up by an Ancient Gameplays tutorial video on overclocking the 7900 XT, which explained that these issues can be caused by an unstable VRAM overclock.
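The step-down approach described above (keep everything else fixed, lower only the VRAM clock until the crashes stop) can be sketched in Python like this. Everything here is illustrative: `find_stable_clock`, the 50 MHz step, the 2500 MHz floor and the `pretend_test` stand-in are assumptions and not part of any real tuning tool. In practice "is it stable" means days or weeks of normal use, not a function call.

```python
from typing import Callable, Optional


def find_stable_clock(
    start_mhz: int,
    floor_mhz: int,
    step_mhz: int,
    is_stable: Callable[[int], bool],
) -> Optional[int]:
    """Walk down from start_mhz in step_mhz decrements.

    Returns the first clock that passes is_stable, or None if nothing
    at or above floor_mhz passes.
    """
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    clock = start_mhz
    while clock >= floor_mhz:
        if is_stable(clock):
            return clock
        clock -= step_mhz
    return None


def pretend_test(clock_mhz: int) -> bool:
    # Hypothetical stand-in for real-world testing. It only mirrors the
    # outcome described in the post (2700 MHz unstable, 2650 MHz stable).
    return clock_mhz <= 2650


print(find_stable_clock(2700, 2500, 50, pretend_test))
```

Changing one setting at a time like this is what makes the result meaningful: if the core clock, undervolt and power limit had been changed along with the VRAM clock, there would be no way to tell which one was behind the WHEA events.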

This worked for me, and I will update if these events come back. Hopefully this can provide some help.
 
Joined
Jan 1, 2012
Messages
115 (0.03/day)
Thank you for the follow up! And yes it's helpful. This has been a bit difficult to follow, but if I'm remembering right, that makes four (?) of us, including me, with seemingly the same(ish) issue. Out of the four of us, one never followed up and I presume they just gave up on the 7800 XT, but the rest of us saw changes in severity (if not outright resolution of the issue) by either changing the graphics card, or changing the graphics card frequency/voltage. So either way, this seems to imply the issue stems from the graphics card side?

My own video card RMA definitely saw a drastic drop in how often the problem occurs, but I've found at least one use-case where it still occurs. It's limited to Minecraft and specifically to 1.12. Current versions work fine. I'm not sure if this is the same issue as on the first one and is a hardware fault... or something with the drivers specific to that version of the game. I do know that AMD's drivers newer than 23.11.1 have this really strange performance behavior when I use v-sync in Minecraft (the card seems to try to lock itself to ~80% utilization and adjust clock speed to keep it there, which drastically impacts performance?), and all drivers as of the last year and a half since late 2022 or whenever AMD rewrote their OpenGL stuff also have a VRAM leak with even older versions of Minecraft (1.7 and prior I think but I don't play those so I'm not sure exactly what versions). So I don't know if this one is a hardware fault like the previous occurrences seemingly were (but then why only one specific version of a game causing it?) or if it might be drivers this time (but then why a machine check exception?). It's like... either way it makes little sense?

But I'm well beyond the point of being tired of spending money and time on it, so even though I hate having something with any known issue at all, my solution has been to avoid that particular use-case because it's stable otherwise (thus far...) and because... I don't know what else to do at this point.
 

Xaamoh

New Member
Joined
Apr 27, 2024
Messages
2 (0.15/day)
I am wondering if you ever tried replacing the power cable from the GPU to the power supply? My defective cable worked for months, but I always got some flicker on my monitor. I thought my screen was just not synced. Switching the screen off and on helped. But apparently the cable was not able to deliver stable power. Now with the new cable: no flickering, no black screens, no issues whatsoever.
 
Joined
Jan 1, 2012
Messages
115 (0.03/day)
The monitor itself isn't flickering. It's just that in one game in particular (Minecraft Java), one version of that game in particular (1.12), and perhaps even only with mods, the game world itself will on rare occasions flicker for a single frame, showing either just the sky box with no world geometry or a full White frame (I'm not entirely sure which, but it's a single-frame flicker either way).

I've disconnected and reconnected the two PCI Express power cables a couple of times, but I haven't swapped them (and speaking of these cables, someone on another forum having Black screen restarts posted a follow up and showed melted connectors on both ends of the cable with an RTX 3080; I figure that was a case of pig-tailing it and drawing too much power through one cable, but now it has me panicking and wanting to recheck mine). But since the issue was eliminated (other than this) with the graphics card RMA, it's led me to believe the issue was indeed the graphics card.

Given the other issues AMD's drivers have in Minecraft, I've sort of figured this might be another example of it and whatever modded 1.12 is doing simply isn't being tolerated by the drivers since nothing else crashes like that (yet). My only concern with that conclusion is... it's not just an application or drivers crash, but the same old machine check exception complete PC crash. Seems unlikely to me a single game can cause that unless it's doing something the hardware can't handle (but should) to begin with and thus it's still a hardware fault? But I don't know.
 