
PC still keeps rebooting with the error "Reported by component: Processor Core, Error Source: Machine Check Exception, Error Type: Cache Hierarchy Error"

Joined
Feb 14, 2024
Messages
39 (0.17/day)
I've had this issue since I built my PC in February. I've made multiple posts across different forums, subreddits, Discord servers etc., and it still keeps happening. My monitor goes black, the sound either hangs, cuts out entirely or just skips, and it reboots. Sometimes I'm just stuck with a black screen for like 20 seconds with nothing happening, and sometimes it reboots immediately. It has happened while gaming, once or twice while just idle, and two times while downloading an update through the Black Desert Online launcher specifically. Sometimes it runs perfectly fine for a day. Some games (SkaterXL) only crash on one specific modded map, and only in certain spots, and otherwise not at all. Meanwhile No Man's Sky crashes after about 5-10 minutes every time. Something like Apex Legends can run fine for days or crash every few minutes; it's completely random. Temps are all perfectly fine. Event Viewer always says the exact same thing, with differing Processor APIC IDs. At this point I've tried everything, and I'm literally clueless and frustrated. I bought a new motherboard and installed it, hoping it'd work; nothing. Same shit keeps happening with the exact same errors. I've replaced or RMA'd literally every single component except my SSD, and it still happens. I don't know what to do anymore. I feel like I wasted 700 bucks because this damn box of metal refuses to work correctly, and I just can't figure out why.
Specs:
GPU: Radeon 6650XT
CPU: Ryzen 5 5600
Motherboard: MSI B550M Pro-VDH (old one: ASRock B450M Pro4 R2.0)
SSD: 1TB Lexar NM620 M.2
RAM: 16GB (2x8GB) Kingston FURY Beast
PSU: 700 Watt be quiet! System Power 9
OS: Windows 10 Pro
Here's a Dropbox link with information, including app dumps
 
Joined
Sep 3, 2019
Messages
3,425 (1.84/day)
Location
Thessaloniki, Greece
System Name PC on since Aug 2019, 1st CPU R5 3600 + ASUS ROG RX580 8GB >> MSI Gaming X RX5700XT (Jan 2020)
Processor Ryzen 9 5900X (July 2022), 220W PPT limit, 80C temp limit, CO -8~12
Motherboard Gigabyte X570 Aorus Pro (Rev1.0), BIOS F39b, AGESA V2 1.2.0.C
Cooling Arctic Liquid Freezer II 420mm Rev7 (Jan 2024) with off-center mount for Ryzen, TIM: Kryonaut
Memory 2x16GB G.Skill Trident Z Neo GTZN (July 2022) 3600MT/s 1.38V CL16-16-16-16-32-48 1T, tRFC:280, B-die
Video Card(s) Sapphire Nitro+ RX 7900XTX (Dec 2023) 314~467W (375W current) PowerLimit, 1060mV, Adrenalin v24.8.1
Storage Samsung NVMe: 980Pro 1TB(OS 2022), 970Pro 512GB(2019) / SATA-III: 850Pro 1TB(2015) 860Evo 1TB(2020)
Display(s) Dell Alienware AW3423DW 34" QD-OLED curved (1800R), 3440x1440 144Hz (max 175Hz) HDR400/1000, VRR on
Case None... naked on desk
Audio Device(s) Astro A50 headset
Power Supply Corsair HX750i, ATX v2.4, 80+ Platinum, 93% (250~700W), modular, single/dual rail (switch)
Mouse Logitech MX Master (Gen1)
Keyboard Logitech G15 (Gen2) w/ LCDSirReal applet
Software Windows 11 Home 64bit (v24H2, OSBuild 26100.1882), upgraded from Win10 to Win11 on Feb 2024
Board BIOS update?
Board chipset driver update? (from AMD's website only)

What’s the speed of DRAM, and did you try to lower it just for testing?
What is the speed of FCLK:UCLK?

Is DRAM on the QVL of the board?

Can you test a completely different DRAM?

Cache hierarchy errors are usually related to Infinity Fabric (FCLK) and sometimes to memory in general.
They could also, on rare occasions, mean a faulty CPU, but you said you RMA'd everything…

Can we get a screenshot of ZenTimings app?
 
Joined
Apr 21, 2009
Messages
112 (0.02/day)
System Name littlet
Processor Ryzen 5900X
Motherboard TUF GAMING B550M-PLUS (WI-FI) ZAKU II EDITION
Cooling EK AIO Basic 240
Memory G.Skill Trident Z RGB 4X16GB 3200MHz
Video Card(s) RTX™ A2000 12GB
Storage C: WD_BLACK SN850 1TB
Display(s) ViewSonic VX3276-2K-mhd
Case ASUS Prime AP201
Audio Device(s) Asus Xonar Essence STX II
Power Supply Corsair SF 600
Software Windows
Replace the CPU; the memory controller is unstable. Lowering the DDR speed may buy you some time, but the issue is terminal.
 
Joined
Sep 3, 2019
Messages
3,425 (1.84/day)
Location
Thessaloniki, Greece
Wow… hold your horses

OP said that everything was RMA-ed except SSD.
Have you heard about CPU/board/DRAM incompatibility on AM4 that causes instability? Most of it is ironed out with newer BIOS versions but still some may exist.

Isn’t it wiser to try to pinpoint what exactly the culprit is, if there is one?
Isn’t it worth first trying to update everything before getting into the process of replacing something?
 
Joined
Feb 14, 2024
Messages
39 (0.17/day)
BIOS and chipset are all fine; I just installed the board today and updated everything. Actually no, scratch that, I think there's a new BIOS version available, but I lost my only USB stick that I can format, so I'll have to get a new one. But I don't think it's that either, since it's still the same issues as with my other mobo, and on my old one the BIOS was up to date. The current version on this board is from 02/01/2024, so it's two versions behind, but not ancient.
DRAM max is 3200; I haven't enabled any XMP in the BIOS, so right now it's running at 2400. FCLK is on auto, so I'm assuming they're matched up correctly.
RAM is on the QVL list as well, and I already tried a different kit on my last Mobo, but that didn't make a difference. I sent it back afterwards, since it didn't change anything.

And yeah, I already sent the CPU back to AMD and got it replaced, and the exact same errors are still happening at the same times, so I'm assuming the CPU, at least in and of itself, isn't at fault.
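
For anyone following along, the DRAM and FCLK numbers above can be sanity-checked with a bit of arithmetic. This is only a sketch: it assumes the real DDR4 memory clock (MEMCLK) is half the MT/s rating, and that "Auto" runs FCLK and UCLK 1:1 with MEMCLK at these speeds, which is typical on AM4 but not confirmed for this board. The function name is made up for illustration.

```python
def ddr4_clocks(mt_per_s: int) -> dict:
    """Clocks implied by a DDR4 transfer rate, ASSUMING 1:1:1 (FCLK:UCLK:MEMCLK)."""
    memclk = mt_per_s / 2  # DDR means two transfers per memory clock
    # Assumption: FCLK and UCLK simply follow MEMCLK when left on Auto.
    return {"MEMCLK": memclk, "UCLK": memclk, "FCLK": memclk}


for speed in (2400, 3200):  # current speed and the kit's rated XMP speed
    clocks = ddr4_clocks(speed)
    print(f"DDR4-{speed}: MEMCLK {clocks['MEMCLK']:.0f} MHz, "
          f"FCLK {clocks['FCLK']:.0f} MHz (assumed 1:1)")
```

If ZenTimings reports FCLK and UCLK close to these values, they are matched; if FCLK differs from UCLK, they are running out of sync.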
 

Attachments: Screenshot 2024-08-24 172519.png (50 KB)

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
41,466 (6.58/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
And of course, no power supply listed.

Could be the game or memory at this rate

Could be a corrupt OS too
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,030 (2.37/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃X570 Impact
Cooling NH-U12A + T30┃AXP120-x67
Memory 64GB 6400CL32┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Case Caselabs S3┃Lazer3D HT5
Cache Hierarchy usually isn't anything IF or memory related.

So you did a second RMA with AMD this time, got what you verified to be a new CPU, and it's still happening?

I remember the old thread and I gotta say, this is a weird one. Only thing I can think of is maybe a clean install of windows if you haven't already?

And how confident are you in the health of this SSD?

I can't remember if you already tried disabling Global C-states.

ReBAR on or off?
 
Joined
Jan 1, 2012
Messages
270 (0.06/day)
I had a similar issue after I changed a part late last year. I made a pretty long and detailed thread about the whole situation here if you want to research further to see if it gives you any ideas. If you want a summary of the important bits...

1. In my case, the issue started when I changed my graphics card, and going back to the old graphics card stopped the issue. I did a lot of troubleshooting to rule out possible causes before sending back a "maybe good graphics card", but I ran out of options and ended up doing an RMA on the new graphics card. The returned part appeared to be a new one instead of a used/refurbished one. Anyway, it mostly resolved it. I say mostly because I found one scenario that still causes it (which doesn't sit right with me), but it's a specific version of a specific game, and a modded version at that, so I just avoid it and the issue was otherwise resolved. Time will tell if it returns.

2. Oddly, I found the issue got worse if I disabled XMP or removed half the RAM, which is the opposite of what I'd expect since my memory configuration is rather heavy. It also got worse if I undervolted the CPU. That almost suggested to me a platform-side instability, but... the results were hard to argue with when the system was stable before the graphics card was changed (but I never tried undervolting the CPU before that point) and also stable after an RMA of the graphics card. I also tried my previous CPU and it was doing the same thing. I played with c-states, PCI Express options, a whole slew of other BIOS options, Windows power options, SAM (AMD's equivalent of Resizable BAR), you name it, and none of it helped.

That's not to say it's your graphics card for sure. Event ID 18 can be multiple things. It is as it sounds: a condition in which the system encountered some scenario it considers a machine check exception (Windows terms it an uncorrectable hardware error, as opposed to Event ID 19, which is a sort of soft version of it and is a corrected hardware error). Machine check exceptions can be "caught" by either the CPU or RAM (it seems to be caught more often by the CPU, possibly because you need ECC RAM to catch it there), so the fact that it's saying it's the CPU doesn't mean the CPU is at fault. If it's a different APIC ID each time, then that at least suggests it's not one particular core that is bad. The type seems to often be either "Cache Hierarchy Error" or "Bus/Interconnect Error", at least on AMD platforms, and maybe there are others, but between the two, the former seems more common and is less specific about what is causing it, so it's a bit of trial and error as to where the issue is.

I think my graphics card may have had bad VRAM or some bad power delivery regulation, but I'm purely speculating. It only happened in scenarios where the graphics card was under medium or higher load, but it was never happening consistently at very high load, and it even happened under very low load with XMP off. The system would restart and never power off in the interim, so I wasn't really suspecting the PSU. You'll have to use those sorts of clues like I did to try and figure out what part to rule out. But you already sound like you're pretty far down that path! If you actually did an RMA on everything except the SSD and the issue remains (!) then I'm both impressed and shocked. I'd have been crying long before that point. I guess reinstalling Windows and/or doing an RMA on the SSD would be the only thing left to try.
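
One way to check the APIC ID point above is to export the Event ID 18 entries from Event Viewer as text and count how often each Processor APIC ID shows up. Below is a minimal Python sketch of that idea. The sample text is invented, the function name is hypothetical, and it assumes each exported message contains a line of the form "Processor APIC ID: <number>", so adjust the pattern if your export looks different.

```python
import re
from collections import Counter

# Assumed message layout: one "Processor APIC ID: N" line per logged event.
APIC_RE = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(event_text: str) -> Counter:
    """Count how many logged errors mention each APIC ID."""
    return Counter(int(apic_id) for apic_id in APIC_RE.findall(event_text))


# Made-up sample standing in for an exported event log.
sample = """
Error Type: Cache Hierarchy Error
Processor APIC ID: 4

Error Type: Cache Hierarchy Error
Processor APIC ID: 10

Error Type: Cache Hierarchy Error
Processor APIC ID: 4
"""

for apic_id, count in sorted(tally_apic_ids(sample).items()):
    print(f"APIC ID {apic_id}: {count} error(s)")
```

If nearly every error lands on the same ID, a single core looks more suspect; if the errors are spread across many IDs, as the OP describes, that points away from one bad core.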
 

Borkster

New Member
Joined
Aug 24, 2024
Messages
2 (0.05/day)
I registered on the site just to post this! Not sure what it's worth, but I'd hate to see someone struggling with something that might be an easy fix. I had almost the same issue with an AMD CPU (3700X and then a 7900) and GPU (6700 XT) last year / the year before: random resets, screen blackouts. Sometimes after a reboot the graphics device would be disabled, running on default drivers, and would require a re-enable and then a reboot. But not always. It would seem that the fault was in the GPU somewhere, and Windows at times was disabling the hardware/driver because it was faulting.
I tried all sorts with very limited success (undervolting, changing profiles, refresh rates) until, suspecting on a hunch that it was something to do with graphics card power spike management, I uninstalled everything to do with it and reinstalled just the driver, with none of the add-ons. At that point the issue went away. Without going into it too much, I suspect the added faff to control power and profiles was underpowering the GPU, and at critical points when performance ramped up even slightly, the GPU ended up starved of power and died. Or something like that. I think the added faff just tries to be too clever and ends up making the hardware crash.
Since I've been running without any of the additional AMD software the system hasn't crashed.
If you've not tried this, I would give it a go, as it's an easy thing to do, and, given that it seems you have RMA'd a lot of your setup, which probably means it isn't something inherently broken in your system, it could just be down to bad / buggy drivers or software.
Also FWIW, I had similar blackout issues on an AMD-powered laptop. *That* one bizarrely turned out to be the SSD. It really didn't present as an SSD issue at all: random crashes and bug-outs with memory and all sorts, everything but the SSD. After having it serviced and a few things replaced with the problem still occurring, I poked around and discovered the SSD was running far too hot. Not the temperature I'd expect an SSD to be. On a hunch I junked the SSD and put a new one in; no more issues. If you haven't tried that, you could also give that a go.
 
Joined
Aug 15, 2022
Messages
399 (0.51/day)
Location
Some Where On Earth
System Name Spam
Processor i9-12900K PL1=125 TA=56 PL2=288
Motherboard MSI MAG B660M Mortar WiFi DDR4
Cooling Scythe Kaze Flex 120mm ARGB Fans x1 / Alphacool Eisbaer 360
Memory Mushkin Red Line DDR4 4000 16Gb x2 18-22-22-42 1T
Video Card(s) Sapphire Pulse RX 7900 XT
Storage Team Group MP33 512Mb / 1Tb
Display(s) SAMSUNG Odyssey G50A (LS27AG500PNXZA) (2560x1440)
Case Lan-Li A3
Audio Device(s) Real Tek on Board Audio
Power Supply EVGA SuperNOVA 850 GM
Mouse M910-K
Keyboard K636CLO
Software WIN 11 Pro
but I lost my only usb that I can format
A bad usb device could cause the issue you are having, so you might want to try different usb devices and see if it fixes the issue. Also, you might want to try turning off "Fast Startup" and see if that makes a difference.

Not the temperature I'd expect an SSD to be. On a hunch I junked the SSD put a new one in - no more issues. If you haven't tried that, you could also give that a go.
This would be my next guess since it is the only thing you have not replaced.
 
Joined
Feb 14, 2024
Messages
39 (0.17/day)
Yep, did a second RMA, but with AMD directly, and made sure to take pictures of the CPU. Of course, they could probably just change the cover and call it a day, but I doubt AMD themselves would do that. And yep, still happening.

Just reinstalled Windows, but through the regular settings. Not sure if that's the same thing as a full reinstall, like completely wiping the PC and reinstalling from a USB stick?

Pretty confident I guess? Crystaldiskinfo says it's fine, so I'm basing it off of that.

I'm pretty sure I've tried disabling global c states on the old Mobo, not sure about ReBAR. I think I've tried turning it off though (or on, can't remember, if it's on by default, I turned it off), but also only on the old Mobo. Haven't tried on the new one yet.

I've also just decided to send my GPU back and upgrade. It's probably not the GPU, but I might as well try a whole different GPU. I was gonna upgrade soon anyway to an actual 1440p card instead of my 6650XT. Maybe that'll work, who knows. Eventually something has to work. And if I get rid of every single physical device and replace them, and it keeps happening, I'll at least know it's not the devices themselves but something else.

Thank you for the long and detailed reply, I'll definitely have to check out your thread, thank you.

I've decided to just get a new GPU now. And for the crying part, let's just say I've more than once thought of just selling the whole thing for cheap. It's a mind fuck when you can't even remember what you've tried to change and mess with on the software side cause you've tried so much stuff for half a year haha.

Might be a stupid question, but how did you install just the driver? I'm assuming you didn't use AMD's Adrenalin software or the auto installer on their website?
I think I'll throw in a new SSD as well; I've decided to get a new GPU now. My new mobo has a cooling pad over the SSD (I forgot if they're called cooling pads), so I think temps should be better on this board compared to the old one, but it's still happening.

Disabled Fast Startup almost immediately when I saw it as a possible fix; it didn't change anything though. I don't think I ever turned it back on. A bad USB device could be it, but I've already tried unplugging every USB device and just having my mouse and keyboard connected, and it still happened, sadly.
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,030 (2.37/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃X570 Impact
Cooling NH-U12A + T30┃AXP120-x67
Memory 64GB 6400CL32┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Case Caselabs S3┃Lazer3D HT5
Yep, did a second RMA, but with AMD directly and made sure to take pictures if the CPU, of course, they can probably jsut change the cover and call it a day, but I doubt AMD themselves would do that. And yep, still happening.

Just reinstalled windows, but through the regular settings. Not sure if that's the same thing as a full reinstall? Like completely wiping the PC and reinstalling it from a usb.

Pretty confident I guess? Crystaldiskinfo says it's fine, so I'm basing it off of that.

I'm pretty sure I've tried disabling global c states on the old Mobo, not sure about ReBAR. I think I've tried turning it off though (or on, can't remember, if it's on by default, I turned it off), but also only on the old Mobo. Haven't tried on the new one yet.

I've also just decided to send my GPU back and upgrade. It's probably not the GPU, but I might as well try a whole different GPU. I was gonna upgrade soon anyways to an actual 1440p card, instead of my 6650XT. Maybe that'll work, who knows. Eventually something has to work. And if I get rid of every single physical device and replace them, and it keep happening, I'll at least know it's not the devices themselves but something else.


Thank you for the long and detailed reply, I'll definitely have to check out your thread, thank you.

I've decided to just get a new GPU now. And for the crying part, let's just say I've more than once thought of just selling the whole thing for cheap. It's a mind fuck when you can't even remember what you've tried to change and mess with on the software side cause you've tried so much stuff for half a year haha.


Might be a stupid question, but how did you install just the driver? I'm assuming you didn't use AMD's Adrenalin Software, or the auto installer on their website?
I think I'll throw in a new SSD as well; I've decided to get a new GPU now. My new mobo has a cooling pad over the SSD (I forget if they're called cooling pads), so I think temps should be better on this board compared to the old one, but it's still happening.


Disabled fast start-up almost immediately when I saw it as a possible fix, didn't change anything though. I don't think I ever turned it back on. A bad USB device could be it, but I've already tried uninstalling every USB device and just having my mouse and keyboard connected, and it still happened, sadly.

In theory, reinstalling Windows from within Windows should accomplish the same, but I don't trust Microsoft's shenanigans enough for that. If you chose to keep your files when prompted, then it's not really a clean install.

CrystalDiskInfo doesn't really provide useful info on SSD health in terms of its life estimate. First-party SSD software might or might not do better.
 

Borkster

New Member
Joined
Aug 24, 2024
Messages
2 (0.05/day)
Might be a stupid question, but how did you install just the driver? I'm assuming you didn't use AMD's Adrenalin Software, or the auto installer on their website?
I think I'll throw in a new SSD as well; I've decided to get a new GPU now. My new mobo has a cooling pad over the SSD (I forget if they're called cooling pads), so I think temps should be better on this board compared to the old one, but it's still happening.
Been a hot minute since I've done this, but I believe there is a driver-only option when you install the package. Make sure you uninstall it all first if it's already installed, then reinstall with the driver-only option. You won't have any of the fancy dashboard or controls, just the GPU drivers.
 
Joined
Nov 16, 2023
Messages
1,079 (3.36/day)
Location
Nowhere
System Name I don't name my rig
Processor 14700K
Motherboard Asus TUF Z790
Cooling Air/water/DryIce
Memory DDR5 G.Skill Z5 RGB 6000mhz C36
Video Card(s) RTX 4070 Super
Storage 980 Pro
Display(s) Some LED 1080P TV
Case Open bench
Audio Device(s) Some Old Sherwood stereo and old cabinet speakers
Power Supply Corsair 1050w HX series
Mouse Razor Mamba Tournament Edition
Keyboard Logitech G910
VR HMD Quest 2
Software Windows
Benchmark Scores Max Freq 13700K 6.7ghz DryIce Max Freq 14700K 7.0ghz DryIce Max all time Freq FX-8300 7685mhz LN2
The only actual fix for this blue screen is to RMA the CPU.

 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,030 (2.37/day)
Location
Western Canada
The only actual fix for this blue screen is to RMA the CPU.


Might be a little late for the conventional wisdom since they've already RMA'd it. Seems something else is going on.
 
Joined
Nov 16, 2023
Messages
1,079 (3.36/day)
Location
Nowhere
Yep, did a second RMA, but with AMD directly, and made sure to take pictures of the CPU. Of course, they could probably just change the cover and call it a day, but I doubt AMD themselves would do that. And yep, still happening.
Sorry, I replied before I squeezed your quote here in.

Did you receive a brand-new, in-box, unopened CPU with cooler?

RMA means they test and send back the old one if they find no problems.

Otherwise, RMA again. You know the saying, third time's the charm.

All this of course testing with any other cpu.

AMD no longer offers the boot kit for B450, or I'd suggest that. They usually send a 220GE or a 200GE processor.

Good luck

Might be a little late for the conventional wisdom since they've already RMA'd it. Seems something else is going on.
Yes, I'm not good at this on my phone. Thank you for being attentive.

Also, it's the only actual solution since Zenever...
 
Joined
Jan 1, 2012
Messages
270 (0.06/day)
Thank you for the long and detailed reply, I'll definitely have to check out your thread, thank you.

I've decided to just get a new GPU now. And for the crying part, let's just say I've more than once thought of just selling the whole thing for cheap. It's a mind fuck when you can't even remember what you've tried to change and mess with on the software side cause you've tried so much stuff for half a year haha.
For clarity, you said you did an RMA on everything except the SSD, but I'm wondering about something. When doing an RMA, you don't always get a new part back. Now, a different used/refurbished one should be fine, but sometimes they send the same part back and determine it wasn't faulty. This is important because if you just got the same one back, it doesn't necessarily rule it out as a possible cause, even if you did an RMA on it and they didn't find an issue on their end. Maybe they missed it. Like maybe it's some combination of factors causing it on your end but not theirs (when I was going through my issues, I dreaded this possibility).

If you're trying a different graphics card entirely, that should help rule that out.

Also, I forgot to mention this before, but there are possibly logs being created for this, even if there are no BSODs.

Check these directories...

Windows/LiveKernelReports/WHEA
Windows/LiveKernelReports/WATCHDOG


If it's anything like mine, you may find generic 0x124 errors in the WHEA directory (which basically means an "uncorrectable hardware error occurred", just like what event ID 18 means in Event Viewer). If there are logs in the second directory too, those may be more specific.

You can use WinDbg to analyze logs.
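
As a quick way to see whether anything has been written to those two folders before opening WinDbg, here is a minimal Python sketch. The folder names are the ones listed above; the `C:/Windows` root, the `.dmp` extension filter, and the function name `list_live_kernel_reports` are assumptions for illustration, so adjust them if your install differs. Reading these folders may require an elevated prompt.

```python
from pathlib import Path

# Folder names are the ones mentioned in the post; everything else is assumed.
REPORT_DIRS = ("WHEA", "WATCHDOG")


def list_live_kernel_reports(windows_root="C:/Windows"):
    """Return {folder: [(file_name, size_bytes), ...]} for dump files, newest first.

    windows_root is an assumption (default install location).
    """
    base = Path(windows_root) / "LiveKernelReports"
    found = {}
    for folder in REPORT_DIRS:
        directory = base / folder
        if not directory.is_dir():
            # Folder missing: nothing has been logged there (or the root is wrong).
            found[folder] = []
            continue
        dumps = sorted(
            directory.glob("*.dmp"),
            key=lambda p: p.stat().st_mtime,
            reverse=True,
        )
        found[folder] = [(p.name, p.stat().st_size) for p in dumps]
    return found


if __name__ == "__main__":
    for folder, dumps in list_live_kernel_reports().items():
        print(f"{folder}: {len(dumps)} dump file(s)")
        for name, size in dumps:
            print(f"  {name} ({size} bytes)")
```

Any file it lists can then be opened in WinDbg for the actual analysis; the script only tells you whether there is something to look at.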
 
Joined
Feb 14, 2024
Messages
39 (0.17/day)
Sorry for the late reply. Yep, I RMA'd everything except the SSD; however, the GPU was sent back because they couldn't find anything wrong with it. Twice now, I got it back yesterday. I also chatted with the tech support from the online retailer I bought everything from, and they told me that apparently this is a well-known AMD issue, which is caused by hardware acceleration, and that I should turn hardware acceleration off for every app that has it. I've now just been going through seemingly every app and trying to find if I can disable hardware acceleration on it. Maybe this'll work, I don't know. I'm probably still gonna get a new GPU, but if it's actually driver related, I at least want to try to fix it before getting a new one.

Also, when I got the GPU back, I got a driver-related crash, and lo and behold, the exact same Cache Hierarchy Error popped up in Event Viewer. So I'm praying that it's just software. Or maybe I'm praying that it's actually hardware, cause that'd be easier to fix, idk.
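
Since the Event Viewer entries are identical apart from the Processor APIC ID, it may be worth tallying which IDs show up and how often. Below is a rough Python sketch that assumes the event messages have been copied into one block of text and that each contains a line of the form `Processor APIC ID: N`; the function name `tally_apic_ids` and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Assumed line format inside each copied event message.
APIC_LINE = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(event_text):
    """Count how many times each Processor APIC ID appears in the pasted text."""
    return Counter(int(apic_id) for apic_id in APIC_LINE.findall(event_text))


if __name__ == "__main__":
    # Made-up sample; replace with text copied out of Event Viewer.
    sample = """
    Reported by component: Processor Core
    Error Source: Machine Check Exception
    Error Type: Cache Hierarchy Error
    Processor APIC ID: 4

    Reported by component: Processor Core
    Error Source: Machine Check Exception
    Error Type: Cache Hierarchy Error
    Processor APIC ID: 9
    """
    for apic_id, count in sorted(tally_apic_ids(sample).items()):
        print(f"APIC ID {apic_id}: {count} error(s)")
```

The count does not diagnose anything on its own, but it makes it easier to say whether the errors keep landing on the same ID or are spread around.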
 
Joined
Jan 1, 2012
Messages
270 (0.06/day)
Hardware acceleration is pretty much "the graphics card is being used" though, isn't it? So in a roundabout way, if that's the problem, isn't that like saying the video card is still the problem?
 
Joined
Feb 14, 2024
Messages
39 (0.17/day)
Been a hot minute since I've done this, but I believe there is a driver-only option when you install the package. Make sure you uninstall it all first if it's already installed, then reinstall with the driver-only option. You won't have any of the fancy dashboard or controls, just the GPU drivers.
I'll have to check that, thank you. I'm sure there'll be guides somewhere out there if it's not as simple as clicking an option

Sorry, I replied before I squeezed your quote here in.

Did you receive a brand-new, in-box, unopened CPU with cooler?

RMA means they test and send back the old one if they find no problems.

Otherwise, RMA again. You know the saying, third time's the charm.

All this of course testing with any other cpu.

AMD no longer offers the boot kit for B450, or I'd suggest that. They usually send a 220GE or a 200GE processor.

Good luck


Yes, I'm not good at this on my phone. Thank you for being attentive.

Also, it's the only actual solution since Zenever...
Yup, got a new in-box CPU with cooler, and made sure to photograph the top of the CPU as well to make sure they're not the same.

Hardware acceleration is pretty much "the graphics card is being used" though, isn't it? So in a roundabout way, if that's the problem, isn't that like saying the video card is still the problem?
I read it wrong at first, it says to disable hardware acceleration for any app that can play videos, so discord, browsers etc. Not hardware accel in general
 
Joined
Jan 1, 2012
Messages
270 (0.06/day)
That's what hardware acceleration in such programs does; it uses the graphics card to help accelerate it instead of only the CPU.

If a graphics card has issues with stability with that enabled, doesn't that suggest maybe the graphics card itself might not be entirely stable to begin with? That seems like it would be like saying you have to switch to software rendering in a game to make a graphics card stable. By the very nature of that phrase, that seems to mean the graphics card (or drivers) might not be entirely stable to begin with.

I guess I'm questioning the claim that this should be acceptable to need disabled on AMD hardware to be stable. I haven't had to do that on mine, at least, but I suppose I'm a sample size of one compared to what I presume is a larger sample size that tech is basing that claim on.
 
Joined
Mar 4, 2016
Messages
677 (0.22/day)
Location
Zagreb, Croatia
System Name D30 w.2x E5-2680; T5500 w.2x X5675;2x P35 w.X3360; 2x Q33 w.Q9550S/Q9400S & laptops.
Did you update the BIOS? The BIOS download usually also comes with a text file listing the changes in each version. Maybe they have fixed that for your motherboard?

Also, did you use a bootable memory-checking program, like MemTest or similar? This would test the RAM and CPU cache, to make sure that is not the issue.
 
Joined
Oct 16, 2019
Messages
23 (0.01/day)
In the BIOS, find the option named Power Supply Idle Control and set it to Typical, then find Global C-state Control and disable it.
Many PSUs do not work correctly when idle draw drops to single-digit watts, or with spikes between high demand and idle (e.g. loading levels in some games). I've had this problem with a brand-new 2022 Corsair PSU.

Another user similar experience (freezing or BSODs):
 
Joined
Feb 14, 2024
Messages
39 (0.17/day)
That's what hardware acceleration in such programs does; it uses the graphics card to help accelerate it instead of only the CPU.

If a graphics card has issues with stability with that enabled, doesn't that suggest maybe the graphics card itself might not be entirely stable to begin with? That seems like it would be like saying you have to switch to software rendering in a game to make a graphics card stable. By the very nature of that phrase, that seems to mean the graphics card (or drivers) might not be entirely stable to begin with.

I guess I'm questioning the claim that this should be acceptable to need disabled on AMD hardware to be stable. I haven't had to do that on mine, at least, but I suppose I'm a sample size of one compared to what I presume is a larger sample size that tech is basing that claim on.
Sorry that I keep taking so long to reply. Small update: I might've fixed it, not 100% sure yet, but all it might've taken was switching my M.2 from the top slot to the bottom one. Since the top one is directly controlled by the CPU, it would explain why the errors in Event Viewer kept pointing to my CPU.
 