
Trying to sort out instabilities after my GPU upgrade.

Joined
Jun 25, 2020
Messages
152 (0.09/day)
System Name The New, Improved, Vicious, Stable, Silent Gaming Space Heater
Processor Ryzen 7 5800X3D
Motherboard MSI B450 Tomahawk Max
Cooling be quiet! DRP4 (w/ added SilentWings3), 4x Noctua A14x25G2 (3 @ front, 1 @ back)
Memory Teamgroup DDR4 3600 16GBx2 @18-22-22-22-42 -> 18-20-20-20-40
Video Card(s) PowerColor RX7900XTX HellHound
Storage ADATA SX8200Pro 1TB, Crucial P3+ 4TB (w/riser, @Gen2x4), Seagate 3+1TB HDD, Micron 5300 7.68TB SATA
Display(s) Gigabyte M27U @4K150Hz, AOC 24G2 @1080p100Hz(Max144Hz) vertical, ASUS VP228H@1080p60Hz vertical
Case Phanteks P600S
Audio Device(s) Creative Katana V2X gaming soundbar
Power Supply Seasonic Vertex GX-1200 (ATX3.0 compliant)
Mouse Razer Deathadder V3 wired
Keyboard Keychron Q6Max
System spec as in profile. Please move to appropriate subforum if needed.

I recently upgraded my GPU from a 3070 to the current 7900XTX. There were a few instabilities here and there, but at the time I only played Forza Horizon 5 and Forza Motorsport 8, and FM8 is known to hate the 7900XTX in its day one version, so I brushed it off as a game-specific thing. Around the same time, The Crew Motorfest also threw some driver timeout crashes, but they were infrequent enough to not catch my attention.

More recently, as winter is coming, I started folding@home again, which, depending on the work units, will either be completely fine or instantly cause a reboot, sometimes repeatedly. So I tried a few things to make the system stable, but to no avail, and I potentially made things worse.
Every time it reboots, there are no BSODs, and no Event Viewer information other than the generic "System unexpectedly shutdown" stuff. (If there are other specific codes / keywords I need to search, tell me!)
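
For anyone hunting for the same thing, here is a minimal sketch of pulling the likely entries out of the System log with the built-in wevtutil tool. The event IDs are a starting guess, not a complete list: 41 (Kernel-Power) and 6008 (EventLog) are the usual unexpected-shutdown entries, and 18 / 19 are WHEA-Logger IDs. Event IDs are not unique across sources, so check the source of whatever it returns. The function names are made up for this example.

```python
import subprocess
import sys

# Starting guess for event IDs worth searching in the System log.
# 41 and 6008: unexpected shutdown; 18 and 19: WHEA-Logger hardware errors.
EVENT_IDS = (41, 6008, 18, 19)


def build_xpath(event_ids):
    """Build an XPath filter that matches any of the given event IDs."""
    clause = " or ".join(f"EventID={eid}" for eid in event_ids)
    return f"*[System[({clause})]]"


def build_command(event_ids, count=20):
    """Return the wevtutil arguments: newest events first, plain-text output."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_xpath(event_ids)}",
        f"/c:{count}", "/rd:true", "/f:text",
    ]


if __name__ == "__main__":
    cmd = build_command(EVENT_IDS)
    if sys.platform == "win32":
        # wevtutil only exists on Windows.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        print("Would run:", " ".join(cmd))
```

If nothing besides the generic unexpected-shutdown entries shows up around the crash times, that in itself is a data point: the machine lost power or reset before Windows could log anything.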

Things I tried:
- Ensure power cords are properly plugged.
- DDU -> Clean install latest drivers. Tried "driver only" and "full install".
- optimized defaults on motherboard BIOS.
-- BIOS config only has the Kombo thing (which should mean CO -30), a quick and dirty RAM profile, fan curve related stuff, and PCIe ASPM settings (default [Disabled], tried [Auto] and [L0s and L1 states]. Currently [Auto].)
- Running whatever controllable fans at full blast.
- Uninstalled Afterburner and RTSS.
-- I have heard that it helps with crashes in FH5. It didn't help in my case. I will reinstall Afterburner if you guys say so.
- Flipping VBIOS switch on the GPU, and then DDU -> clean install drivers. Currently on Quiet VBIOS. RGBs on card is always off.
- Flipping the PCIe link state settings on Windows.
-- Switching from Moderately power saving to no power savings worsened the situation a lot. Almost all driver timeouts on the "Things that crashed" section can be read as instant reboot.
-- Switching to Maximum power savings doesn't help the situation. It also induces some frame rate instability.
- Forcing a 59.94Hz or 60Hz on all three monitors.
- Power limit -10% in AMD Adrenalin. It can only go that low.
- Disconnecting both 1080p monitors.
- Plugging the PC into a standalone socket instead of a power...strip? tower? thingy. (The power tower thingy is from a reputable local/regional brand and has lots of sockets. It has the PC, three monitors, a PS4 Pro and an Xbox 360 plugged in. Both consoles are off, and the monitors aren't that hungry to begin with, but it was probably worth a try. It didn't work.)

Things I don't plan to try
- A previous driver version (23.11.1). Not sure how it fares on Folding. But it crashed on FH5 though.

Assuming the PCIe link state thing is set to Moderate power savings...
Things that currently do not crash the system
- General light, non-gaming usage
- "Lightweight" games (I tried WWE 2K23, TBP ~160W, nothing happens)
- Superposition benchmark
- Port Royal and Speedway stress test alone
- Port Royal and Speedway stress test + a CPU folding workload
-- Note that it finished the test. Of course it will not pass in such state. Also, Port Royal + CPU workload with no power savings may cause a reboot.
EDIT: Folding by GPU only won't crash. Or, it survived for 12hrs so it should be fine.
Furmark + CPU folding is also probably fine (survived 30mins)

Things that currently crash the system
- Forza Motorsport 8 (4K, everything Ultra w/RT, not frame limited. It should be 30~100FPS depending on the track.)
-- The lighting will sometimes glitch out to neon-like, which greatly increases the chance of a driver timeout. Sometimes the lighting glitch resolves itself if I keep racing as if nothing happened. Sometimes it does not. Crashes are rare, but will occur from outta nowhere anyway.
-- Lowering details and limiting to 75FPS helps, but does not completely resolve the situation.
- Forza Horizon 5 (4K, everything Ultra, 150FPS limited). Normally crashes 5~30mins after game launch.
-- Lowering details and further limiting FPS to 75 helps, but does not completely resolve the situation. The lighting glitch and "environment disappears" glitch from FM8 also occur here.
- The Crew Motorfest (4K, everything Ultra, 60FPS limited by game engine). Normally crashes 15min~2hr after game launch.
-- I can't recall it ever causing a system reboot. But it will eventually freeze and throw a driver timeout error + a "GPU lost!" error message.
- CPU + GPU folding workloads. Sometimes it works fine. Sometimes it's crash city. Sometimes it reboots whenever the GPU tries to start a WU. Probably depends on WUs.

Other basic information:
- Temperatures are all fine. ~85C for CPU. ~92C for GPU hotspot and VRAM. Disks and Motherboard are <60C.
- Outdoor temperature was 6C at the lowest, and it was pretty dry. It is also cold AF in my room, but this level of coldness by itself shouldn't cause any issues. Static might, but I'm a potato in regards to such knowledge.
- BIOS is the latest version at the time of writing.
- GPU driver is the latest (23.12.1). I cannot be sure about other drivers, which probably means they aren't the latest.
- I have not touched any GPU settings other than power limit. No overclock.
- I haven't properly tried underclocking or undervolting. With such instability I can't be sure what settings are good.
- You may remember me worrying about idle power draw and fiddling with Custom Resolution Utility (CRU). DDU should remove all CRU settings (I need to re-enter the settings on CRU after DDU), and crashes occur without CRU settings.
- I've heard there are many versions of the Corsair RM850. Judging from the fonts, it should be the 2019 one, but I vaguely remember that it existed in my household before it was purchased in March 2019. I will try to find out exactly what it is.
- The GPU in question uses two 8-pin cables and has a power limit of 350W in quiet VBIOS. I have used two separate cables, using the middle (EDIT: end; middle plug is not usable due to card overhang / cable length) plug of the pigtail cables. They are plugged into the bottom right sockets on the PSU side. I might have missed the "correct" sockets to use in this situation.
- I won't have any way to test the 7900XTX on other computer. EDIT: My brother agreed to test with his PC, but he got a slightly more potato but kinda newer PSU. See post #5.

I have thoughts on what the issue is, but before I jump to conclusions, I would like to hear what your thoughts are, and whether I have missed some other things which might be helpful.

Thanks in advance.

As I have a habit of really making the system quiet, my first thought was "something not in the sensors is too hot". Back when winter wasn't here, it threw out some rather unpleasant hot air...but nothing horribly bad. As stated before, pushing all fans to 100% doesn't help.

When I still had the 3070, every time it rebooted frequently it was almost certainly due to a cable not being plugged in properly. That drew my attention to the PSU. With the 3070, it only rebooted when there was no workload and I wasn't using the computer. No, it is not Windows Update, but I don't know what it is. It normally happened once every few weeks, so whatever.
The situation resembles the dreaded T-word...I mean transient spikes on 3090s / other 7900XTXs. And then I looked harder at HWINFO64.
There's a number, GPU Power Maximum, which I have seen to go as high as 580W. I know this is not a number that can be used in review (it does not state the time period of such maximum; and if such number is usable it devalues a lot of professional reviews), but this number does make me worry about the transients.
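
To put rough numbers on that worry, here is a back-of-the-envelope sketch. The 850W rating, the 350W power limit, the -10% slider and the 580W reading come from this post. The CPU figure (142W, the stock package power limit of a 5800X3D) and the 60W allowance for everything else are assumptions, and the 580W reading is taken at face value even though it may not reflect real draw. The class and field names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class PowerBudget:
    psu_rated_w: float   # label rating of the PSU (RM850 -> 850)
    gpu_limit_w: float   # GPU power limit (350 in quiet VBIOS)
    gpu_peak_w: float    # "GPU Power Maximum" reading, taken at face value
    cpu_w: float         # ASSUMPTION: CPU package power under load
    rest_w: float        # ASSUMPTION: board, RAM, drives, fans

    def sustained_w(self):
        """Estimated steady draw with the GPU sitting at its power limit."""
        return self.gpu_limit_w + self.cpu_w + self.rest_w

    def peak_w(self):
        """Estimated draw if the GPU really hit the peak reading."""
        return self.gpu_peak_w + self.cpu_w + self.rest_w

    def headroom_w(self, load_w):
        """Watts left below the PSU label rating for a given load."""
        return self.psu_rated_w - load_w


def scaled_limit(limit_w, percent):
    """Apply a power limit slider such as -10 or +15 percent."""
    return limit_w * (100 + percent) / 100


if __name__ == "__main__":
    budget = PowerBudget(psu_rated_w=850, gpu_limit_w=350, gpu_peak_w=580,
                         cpu_w=142, rest_w=60)
    print("sustained:", budget.sustained_w(), "W,",
          "headroom:", budget.headroom_w(budget.sustained_w()), "W")
    print("peak:", budget.peak_w(), "W,",
          "headroom:", budget.headroom_w(budget.peak_w()), "W")
    print("GPU limit at -10%:", scaled_limit(350, -10), "W")
```

Under these assumptions the sustained estimate is 552W (298W of headroom) and the face-value peak is 782W (68W of headroom). That only says the label rating is not obviously exceeded; it says nothing about how an aged unit copes with fast load changes.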

I know the RM850 is supposed to be a super high quality unit, but I am pretty sure at this point that the PSU ate too much dust / degraded / was not good enough to start with. My attempt to open up the PSU to clean it only led to the destruction of a "DANGER HIGH VOLTAGE" sticker, which probably also functions as a "Warranty void if broken" sticker. I failed to get inside it, which may be a good thing. I curse my stupidity.

Other possibilities I can think of include "It is actually a bad card" and "LOL AMD drivers".

If it turns into a "which PSU should I buy" thread, just to be ultra safe I will need your advice anyway.
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,161 (2.36/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃X570 Impact
Cooling NH-U12A + T30┃AXP120-x67
Memory 64GB 6400CL32┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Case Caselabs S3┃Lazer3D HT5
I think the only RM unit I recall was actually regarded super favourably was various iterations of RMx over the years. Regular RM would have been "decent" and nothing more. But don't quote me on that. 2019 would make the PSU at least 4 years old by now - when people say the 7900XT/XTX is fine on power they mean that if you have a high quality 750W/850W unit you should be fine. Regular RM isn't exactly what comes to mind.

I had a younger but still ~2 years old SF750 (the indisputable king of SFX for a long, long time) that was not up to the task of the 7900XT. Zero issues with crashing later on with a new HX1000. I don't think "transients" (a la RTX 30 and RX6000) tells the whole story, and in any case Navi31 doesn't have ridiculously bad transients. Better just to err on the safe side, for a midrange PSU that's got a few years under its belt.

How much of a delta between edge temp and hotspot on GPU? What is HWInfo regularly reporting for all GPU temps?

I'm not aware of HWInfo's "GPU Power Maximum" actually meaning anything useful. It regularly shows pretty wacky numbers on Navi31 that don't exactly correspond with reality. AMD's GPU telemetry is incredibly complex and in all likelihood it doesn't mean anything relevant to the end user.
 
Joined
Nov 27, 2023
Messages
2,411 (6.45/day)
System Name The Workhorse
Processor AMD Ryzen R9 5900X
Motherboard Gigabyte Aorus B550 Pro
Cooling CPU - Noctua NH-D15S Case - 3 Noctua NF-A14 PWM at the bottom, 2 Fractal Design 180mm at the front
Memory GSkill Trident Z 3200CL14
Video Card(s) NVidia GTX 1070 MSI QuickSilver
Storage Adata SX8200Pro
Display(s) LG 32GK850G
Case Fractal Design Torrent (Solid)
Audio Device(s) FiiO E-10K DAC/Amp, Samson Meteorite USB Microphone
Power Supply Corsair RMx850 (2018)
Mouse Razer Viper (Original) on a X-Raypad Equate Plus V2
Keyboard Cooler Master QuickFire Rapid TKL keyboard (Cherry MX Black)
Software Windows 11 Pro (24H2)
I will echo the above about the PSU. There is a fairly important distinction between RM and RMx. RM were pretty decent units, but not really something that I saw recommended often by respectable sources like Aris. The RMx (2018) and RMx (2021) were/are regarded as some of the best units in their category though. If you can borrow or somehow temporarily acquire a better PSU for some testing it might be the play to go. I agree it's not transients (probably) and it's not like the rest of your system, judging by specs, is a particular power hog, but erring on the safer side here is prudent. The 7900XTX is nowhere near as bad as the 6000 series or Ampere, but it can still have nasty spikes.
If after the testing with a different PSU the behaviour continues, then, I am afraid, the "GPU is defective" option might be on the table.
 
Joined
Feb 24, 2023
Messages
3,080 (4.74/day)
Location
Russian Wild West
System Name DLSS / YOLO-PC / FULLRETARD
Processor i5-12400F / 10600KF / C2D E6750
Motherboard Gigabyte B760M DS3H / Z490 Vision D / P5GC-MX/1333
Cooling Laminar RM1 / Gammaxx 400 / 775 Box cooler
Memory 32 GB DDR4-3200 / 16 GB DDR4-3333 / 3 GB DDR2-700
Video Card(s) RX 6700 XT / R9 380 2 GB / 9600 GT
Storage A couple SSDs, m.2 NVMe included / 240 GB CX1 / 500 GB HDD
Display(s) Compit HA2704 / MSi G2712 / non-existent
Case Matrexx 55 / Junkyard special / non-existent
Audio Device(s) Want loud, use headphones. Want quiet, use satellites.
Power Supply Thermaltake 1000 W / Corsair CX650M / non-existent
Mouse Don't disturb, cheese eating in progress...
Keyboard Makes some noise. Probably onto something.
VR HMD I live in real reality and don't need a virtual one.
Software Windows 11 / 10 / 8
If that's an option, go grab any overkill PSU to test with and return it under a 2-week no-questions-asked money-back policy (idk if that's a thing where you live tho).

If testing shows clear signs of running stable, you know what to do.
If you still get the same symptoms, you're most likely having a failing GPU.
 
Joined
Jun 25, 2020
Messages
152 (0.09/day)
I think the only RM unit I recall was actually regarded super favourably was various iterations of RMx over the years. Regular RM would have been "decent" and nothing more. But don't quote me on that. 2019 would make the PSU at least 4 years old by now - when people say the 7900XT/XTX is fine on power they mean that if you have a high quality 750W/850W unit you should be fine. Regular RM isn't exactly what comes to mind.

I had a younger but still ~2 years old SF750 (the indisputable king of SFX for a long, long time) that was not up to the task of the 7900XT. Zero issues with crashing later on with a new HX1000. I don't think "transients" (a la RTX 30 and RX6000) tells the whole story, and in any case Navi31 doesn't have ridiculously bad transients. Better just to err on the safe side, for a midrange PSU that's got a few years under its belt.
I'm sure when my brother made the decision to buy the PSUs, it was something in the line of "Look! A-tier in a bargain price! BUYBUYBUYBUY".

Age might have been a thing for my PSU. It has been running near 24/7 since March 2019.

How much of a delta between edge temp and hotspot on GPU? What is HWInfo regularly reporting for all GPU temps?
Not sure what you mean by all GPU temps, so here are two screencaps of HWINFO, one for folding, one just after crashing in FH5.
Since FH5 crashed in <10mins, I will grab one for a longer session (likely in FM8) if you say so.

fold.png
FH5.png



If that's an option go grab any overkill PSU in order to test and return it back using 2-week no-question-asked moneyback policy (idk if that's a thing where you live tho).
I don't think that's a thing here. I haven't really asked around, but I won't count on that.



Kinda update: My brother agreed to test my 7900XTX this weekend. He is using a very similar system (5800X3D, TUF X570, 3070Ti, one less HDD and one more SSD than mine, a few fewer fans than mine, and... Corsair RM750). He thought it was an RM750x, and we both LOL'd for some 2 mins when we found out it was a "regular" RM750.
Both my RM850 and his RM750 were purchased in March 2019. The RM750 was not used for at least a few months, and he doesn't run it 24/7. He also doesn't run it overnight unless he needs a space heater. So his unit should be much "younger" than mine.

Considering the models are similar enough, it might be worth bringing my RM850 to his place / using his RM750 to test in my system. He hasn't agreed to test/swap the PSU for now, but the test itself shouldn't be too hard, right...?

I drafted an action plan:
1, Test 7900XTX on brother's system.
1a) if crash then think next step (buy PSU, or potentially find some shop to test)
1b) if not crash then...
2, Test RM850 on brother's system.
2a) if crash then buy PSU
2b) if not crash then...
3, Test RM750 on my system
3a) if crash then (something else is wrong on my side)
3b) if not crash then (keep RM750 or buy PSU)

If there's something I've missed, don't hesitate to point out.
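
The plan above, sketched as a small function so the branches are unambiguous. This is only an illustration of the steps as drafted; the function and argument names are made up, and None stands for "not tested yet".

```python
from typing import Optional


def next_step(gpu_crashes_on_brothers_pc: Optional[bool] = None,
              rm850_crashes_on_brothers_pc: Optional[bool] = None,
              rm750_crashes_on_my_pc: Optional[bool] = None) -> str:
    """Return the next action of the drafted plan given the results so far."""
    # Step 1: move the 7900XTX into the brother's system.
    if gpu_crashes_on_brothers_pc is None:
        return "1: test 7900XTX on brother's system"
    if gpu_crashes_on_brothers_pc:
        return "1a: think about next step (buy PSU, or find a shop to test)"

    # Step 2: card looked fine there, so move the RM850 over as well.
    if rm850_crashes_on_brothers_pc is None:
        return "2: test RM850 on brother's system"
    if rm850_crashes_on_brothers_pc:
        return "2a: buy PSU"

    # Step 3: card and RM850 both looked fine there, so try the RM750 here.
    if rm750_crashes_on_my_pc is None:
        return "3: test RM750 on my system"
    if rm750_crashes_on_my_pc:
        return "3a: something else is wrong on my side"
    return "3b: keep RM750 or buy PSU"


if __name__ == "__main__":
    print(next_step())
    print(next_step(False))
    print(next_step(False, False, True))
```

One thing the sketch makes visible: because the crashes are intermittent, every "not crash" answer is only as good as the length of the test behind it.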
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,161 (2.36/day)
Location
Western Canada
Yeah, see, 90C+ is not hotspot, it's mem temp. 15 degrees between GPU edge and hotspot is pretty normal to see on any GPU. Mem temp of 90 is also reasonable, 20Gbps GDDR6 runs hot.

Borrowing the RM750 is something, but considering that it's an even lower wattage rating I'm not sure it's worth testing even if it's been used less. It's the same fare and same age. His system spec doesn't mean a lot, I also used to run 5900X/5800X3D and 3070 Ti on the SF750 and I'm pretty sure that setup could have continued to run for a decade. As soon as the 7900XT came into the picture, shit hit the fan.

If you can make use of a return policy anywhere, or if a shop can lend you/test themselves with a higher capacity unit, just test with a new PSU.
 
Joined
Sep 3, 2019
Messages
3,529 (1.84/day)
Location
Thessaloniki, Greece
System Name PC on since Aug 2019, 1st CPU R5 3600 + ASUS ROG RX580 8GB >> MSI Gaming X RX5700XT (Jan 2020)
Processor Ryzen 9 5900X (July 2022), 220W PPT limit, 80C temp limit, CO -6-14, +50MHz (up to 5.0GHz)
Motherboard Gigabyte X570 Aorus Pro (Rev1.0), BIOS F39b, AGESA V2 1.2.0.C
Cooling Arctic Liquid Freezer II 420mm Rev7 (Jan 2024) with off-center mount for Ryzen, TIM: Kryonaut
Memory 2x16GB G.Skill Trident Z Neo GTZN (July 2022) 3667MT/s 1.42V CL16-16-16-16-32-48 1T, tRFC:280, B-die
Video Card(s) Sapphire Nitro+ RX 7900XTX (Dec 2023) 314~467W (375W current) PowerLimit, 1060mV, Adrenalin v24.10.1
Storage Samsung NVMe: 980Pro 1TB(OS 2022), 970Pro 512GB(2019) / SATA-III: 850Pro 1TB(2015) 860Evo 1TB(2020)
Display(s) Dell Alienware AW3423DW 34" QD-OLED curved (1800R), 3440x1440 144Hz (max 175Hz) HDR400/1000, VRR on
Case None... naked on desk
Audio Device(s) Astro A50 headset
Power Supply Corsair HX750i, ATX v2.4, 80+ Platinum, 93% (250~700W), modular, single/dual rail (switch)
Mouse Logitech MX Master (Gen1)
Keyboard Logitech G15 (Gen2) w/ LCDSirReal applet
Software Windows 11 Home 64bit (v24H2, OSBuild 26100.2161), upgraded from Win10 to Win11 on Jan 2024
I'm also new to the RX7000 space. Recently (few days ago) got a Nitro+ 7900XTX and my system can be viewed on my specs.
I have a Corsair HX750i (80+ Platinum) PSU that has been running PCs 24/7 since 2018.

Adrenalin settings
GPU TBP is set to +15%, 401.5W (quiet BIOS) according to HWiNFO64 (v7.68) (GPU PPT Short 401.5W/ Sustained 334.6W)
Max Clock to 3000MHz and core voltage to 1060mV (1150mV default)
I have the GPU fan custom curve rather aggressive, goes up to 2400~2500rpm (65~67%) with avg around 1900~2000rpm, and keeping temps low enough
GPU Temp: <60C
GPU Hotspot: <80C
Mem Junction: <=80C
Memory Clock to 2564MHz (2500MHz default) and HWiNFO reports a max of 2551MHz
(Ambient airflow temp to the card is around 22~23C)

I haven't really tested it thoroughly enough or on many games, but on a couple so far + some regular benches (no furry things!!) it's doing pretty well.
Max GPU core clocks are 3000+MHz (Max Effective<2900MHz), max FrontEnd clock 3000~3100+MHz
TBP is going up to 398~399W (HWiNFO) and max GPU PowerMaximum is between 430~490W
Max GPU core voltage (VDDCR_GFX) is up to 1010mV

Thankfully the PSU is a digital (i) one (connects to mainboard with a small cable on internal usb header) and I can see some info on HWiNFO.
Max PSU Power out is below 600W (560~590W depending the load/game/bench) with an avg between 400~500W depending again on type of load.
The card has 3x8pin connectors and I'm using 2xPCI-E cables from the PSU (each cable has 2x8pin)

I will keep testing it, try different settings and games and also keep an eye on this thread.
I will report back at some point.
 
Joined
Jan 1, 2012
Messages
345 (0.07/day)
This seems sort of similar to the Black screen to restart issues I started having after I upgraded to a 7800 XT, but I'll be cautious to say it's the same thing as there's genuinely a lot of things that can cause this. I can't help but notice this seems more common on Radeon GPUs lately though (especially more recent Navi 2 and Navi 3 ones), but it's certainly not exclusive to them.

You mention there's no BSODs and nothing in Event Viewer so I'm mostly posting to ask this in case it helps. I had a similar result (but I did have Event ID 18 in Event Viewer, which was the CPU throwing a machine check exception but not being specific about it). Is anything showing up in the Windows/LiveKernelReports/WHEA or Windows/LiveKernelReports/WATCHDOG folders? They will be logs and WinDbg will be able to view them. It was these logs that aided me in my own issues. The WHEA logs basically had 0x124 entries which were rather generalized like the Event ID 18 logs, but the latter (the Watch Dog entries) were more specific and pointed to the GPU or drivers every time. And the behavior started after changing the video card, so I went that route (after first exhausting almost everything else).

I RMA'd my GPU and my own episode is still ongoing, depending on whether that ends up resolving it or not. But as someone else who went down this "Black screen to reboot" issue that seems common on modern AMD GPUs for whatever reason: skip the "I'll rule everything else out first" mistake that I made in order to be thorough and proper, and trust your gut and save your time. It's nice to be thorough but when something changes, and a behavior comes with it, the first suspect should be the thing that changed. It might be something else, but logically it makes sense to start with the most likely in my opinion. I just wanted to rule out everything else (because I like making things harder on myself) to ensure there weren't other issues with my system before RMAing a possibly good 7800 XT. Time will tell if the replacement solves it but the fact that Sapphire replaced it indicates they found an issue with it?

This almost sounds like a GPU/power issue like mine though.
 
Joined
Oct 15, 2011
Messages
2,443 (0.51/day)
Location
Springfield, Vermont
System Name KHR-1
Processor Ryzen 9 5900X
Motherboard ASRock B550 PG Velocita (UEFI-BIOS P3.40)
Memory 32 GB G.Skill RipJawsV F4-3200C16D-32GVR
Video Card(s) Sapphire Nitro+ Radeon RX 6750 XT
Storage Western Digital Black SN850 1 TB NVMe SSD
Display(s) Alienware AW3423DWF OLED-ASRock PG27Q15R2A (backup)
Case Corsair 275R
Audio Device(s) Technics SA-EX140 receiver with Polk VT60 speakers
Power Supply eVGA Supernova G3 750W
Mouse Logitech G Pro (Hero)
Software Windows 11 Pro x64 23H2
Looks like the PSU has voltage drops! (if no error message logged other than that there was no normal shutdown) Looks like PSU replacement time!
 
Joined
Jun 25, 2020
Messages
152 (0.09/day)
Yeah, see, 90C+ is not hotspot, it's mem temp. 15 degrees between GPU edge and hotspot is pretty normal to see on any GPU. Mem temp of 90 is also reasonable, 20Gbps GDDR6 runs hot.

Borrowing the RM750 is something, but considering that it's an even lower wattage rating I'm not sure it's worth testing even if it's been used less. It's the same fare and same age. His system spec doesn't mean a lot, I also used to run 5900X/5800X3D and 3070 Ti on the SF750 and I'm pretty sure that setup could have continued to run for a decade. As soon as the 7900XT came into the picture, shit hit the fan.

If you can make use of a return policy anywhere, or if a shop can lend you/test themselves with a higher capacity unit, just test with a new PSU.
*whispers* well hotspot was max 94C in the first screenshot. On a hotter day both memory and hotspot will get close to 100C, though we know they are fine temp wise...
Well at least my brother's PC is kinda better than nothing.
I wanted to ask a specific local shop (just because it is owned by a university classmate), but their working time matches too well / badly with mine, and I'm not in a rush to fix the issues. Will update on the "find a local shop to test" route later.

Looks like the PSU has voltage drops! (if no error message logged other than that there was no normal shutdown) Looks like PSU replacement time!
HWINFO didn't record a voltage drop after a driver timeout crash (minimum on 12V is 12V flat on motherboard side, see second screenshot on post #5), though I'm guessing in that direction.

Over these two days I have found a few more cases over the Internet, some pointing to potato PSU (even big PSU), some bad cards. Some RM750s are fine, some 1200W go bad. This is going to be a fun, long, drawn out thread...

This seems sort of similar to the Black screen to restart issues I started having after I upgraded to a 7800 XT, but I'll be cautious to say it's the same thing as there's genuinely a lot of things that can cause this. I can't help but notice this seems more common on Radeon GPUs lately though (especially more recent Navi 2 and Navi 3 ones), but it's certainly not exclusive to them.

You mention there's no BSODs and nothing in Event Viewer so I'm mostly posting to ask this in case it helps. I had a similar result (but I did have Event ID 18 in Event Viewer, which was the CPU throwing a machine check exception but not being specific about it). Is anything showing up in the Windows/LiveKernelReports/WHEA or Windows/LiveKernelReports/WATCHDOG folders? They will be logs and WinDbg will be able to view them. It was these logs that aided me in my own issues. The WHEA logs basically had 0x124 entries which were rather generalized like the Event ID 18 logs, but the latter (the Watch Dog entries) were more specific and pointed to the GPU or drivers every time. And the behavior started after changing the video card, so I went that route (after first exhausting almost everything else).

I RMA'd my GPU and my own episode is still ongoing, depending on whether that ends up resolving it or not. But as someone else who went down this "Black screen to reboot" issue that seems common on modern AMD GPUs for whatever reason: skip the "I'll rule everything else out first" mistake that I made in order to be thorough and proper, and trust your gut and save your time. It's nice to be thorough but when something changes, and a behavior comes with it, the first suspect should be the thing that changed. It might be something else, but logically it makes sense to start with the most likely in my opinion. I just wanted to rule out everything else (because I like making things harder on myself) to ensure there weren't other issues with my system before RMAing a possibly good 7800 XT. Time will tell if the replacement solves it but the fact that Sapphire replaced it indicates they found an issue with it?

This almost sounds like a GPU/power issue like mine though.
That thread is on my watch list. I read that again and found a few things to add/ forgot to mention...

- The weather has gone warm now and the crashes are seemingly less frequent. Still happening though.

- If I have touched any GPU settings in Adrenalin and a driver timeout / crash reboot occurs, they reset to default. It crashed frequently enough that I didn't bother to change anything after crashes. It crashes on default settings anyway.

- If CPU+GPU folding doesn't instantly cause a reboot, CPU+GPU folding + WWE2K23 (a lightweight game but still an actual load) has a much higher chance of causing a crash (both types are possible). A GPU-accelerated Chromium running a browser game may also cause very frequent driver timeouts. I thought this was an unrealistic workload so I left it out of the initial post. Crashing folding@home doesn't feel good...

- My reboot / driver timeout issue is a bit different from yours. To be more specific:
-- If it is a driver timeout that doesn't cause a reboot, graphics on all monitors freeze, audio doesn't. After ~10~20 seconds the game executable ends (The Crew Motorfest doesn't end there; it will throw a "GPU lost" error message before completely closing). Other non-GPU-accelerated applications do not crash.
-- If it is a reboot, it happens instantly. No freeze. My speakers always make noises on every boot, and the noises are emitted instantly after I push the power button / the system starts its normal reboot procedure, so I know that the moment the screens go black, it is rebooting. Don't worry, the noise is not disgusting to start with. And it's kind of a known issue listed in the manual (apparently common for USB speakers), but for this case I see this as a feature.

- No event ID 18 or 19, at least for the most recent driver timeout and reboot crashes.

- There is only 1 file in Windows/LiveKernelReports/WHEA, and it does not fall in the timeframe of any of the crashes I encountered.

- There are quite a few .dmp files in Windows/LiveKernelReports/WATCHDOG. Their times match the times I encountered driver timeout crashes. It's normal working hours here, so the best I can do for now is check that the files exist. I will see if anything comes out of the crash dumps later.
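For anyone wanting to do the same "does a dump line up with a crash" check without eyeballing the folder, here is a minimal Python sketch. It assumes the dump files are named like WATCHDOG-20231228-2322.dmp (the name that shows up in the WinDbg output later in this thread) and that the folder sits on C:. The function names, the 10-minute window, and the example crash time are made up for illustration.

```python
import os
import re
from datetime import datetime, timedelta

# Assumed file-name pattern: WATCHDOG-YYYYMMDD-HHMM.dmp
DUMP_NAME = re.compile(r"^WATCHDOG-(\d{8})-(\d{4})\.dmp$", re.IGNORECASE)


def dump_time(file_name):
    """Return the timestamp encoded in a dump file name, or None if it doesn't match."""
    match = DUMP_NAME.match(file_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M")


def dumps_near(file_names, crash_time, window_minutes=10):
    """List the dump names whose encoded time is within the window around crash_time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for name in file_names:
        stamp = dump_time(name)
        if stamp is not None and abs(stamp - crash_time) <= window:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    folder = r"C:\Windows\LiveKernelReports\WATCHDOG"  # assumed location
    names = os.listdir(folder) if os.path.isdir(folder) else []
    crash = datetime(2023, 12, 28, 23, 20)  # hypothetical crash time, noted by hand
    print(dumps_near(names, crash))
```

A match only says a live dump was written around that time. What the dump actually contains still needs WinDbg.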

By the way, how's your RMA going? Is it not yet delivered?
 
Joined
Sep 25, 2023
Messages
413 (0.94/day)
Location
Detroit, Michigan
System Name Desktop/HTPC
Processor Ryzen 7 5800X3D
Motherboard Gigabyte X570 I Aorus Pro WiFi ITX
Cooling ID Cooling DashFlow 240mm AIO
Memory 64GB (2x32) G.Skill Trident Z RGB DDR4-3600 CL18
Video Card(s) Sapphire Pulse Radeon RX 7900 XTX 24GB
Storage 4TB WD SN850X NVMe, 2TB WD SN850X NVMe
Display(s) LG OLED65B9PUA 65" 4K OLED TV
Case Lian Li x Dan A4-H20
Audio Device(s) USB to MiniDSP DDRC24 DAC, RCA to SUMO Andromeda amp, wired to Wharfedale SP88 speakers
Power Supply Silverstone SX1000R SFX-L Platinum ATX 3.0
Mouse Corsair M65 RGB Elite
Keyboard Corsair K65 RGB Mini
VR HMD Oculus Rift / PSVR / PSVR2 / Lenovo Explorer
Software Windows 11 Pro
I'll bet you just about anything it's the PSU - older non ATX 3.0 units don't seem to be able to handle the transient power spikes of the 7900XTX. I went from an 850w (only about a year old, good unit) that had zero issues with my system running an RTX 2080 to a 1000w ATX 3.0 unit and all the stability issues I was having with Forza Motorsport 5 disappeared.
 
Joined
Sep 3, 2019
Messages
3,529 (1.84/day)
Location
Thessaloniki, Greece
System Name PC on since Aug 2019, 1st CPU R5 3600 + ASUS ROG RX580 8GB >> MSI Gaming X RX5700XT (Jan 2020)
Processor Ryzen 9 5900X (July 2022), 220W PPT limit, 80C temp limit, CO -6-14, +50MHz (up to 5.0GHz)
Motherboard Gigabyte X570 Aorus Pro (Rev1.0), BIOS F39b, AGESA V2 1.2.0.C
Cooling Arctic Liquid Freezer II 420mm Rev7 (Jan 2024) with off-center mount for Ryzen, TIM: Kryonaut
Memory 2x16GB G.Skill Trident Z Neo GTZN (July 2022) 3667MT/s 1.42V CL16-16-16-16-32-48 1T, tRFC:280, B-die
Video Card(s) Sapphire Nitro+ RX 7900XTX (Dec 2023) 314~467W (375W current) PowerLimit, 1060mV, Adrenalin v24.10.1
Storage Samsung NVMe: 980Pro 1TB(OS 2022), 970Pro 512GB(2019) / SATA-III: 850Pro 1TB(2015) 860Evo 1TB(2020)
Display(s) Dell Alienware AW3423DW 34" QD-OLED curved (1800R), 3440x1440 144Hz (max 175Hz) HDR400/1000, VRR on
Case None... naked on desk
Audio Device(s) Astro A50 headset
Power Supply Corsair HX750i, ATX v2.4, 80+ Platinum, 93% (250~700W), modular, single/dual rail (switch)
Mouse Logitech MX Master (Gen1)
Keyboard Logitech G15 (Gen2) w/ LCDSirReal applet
Software Windows 11 Home 64bit (v24H2, OSBuild 26100.2161), upgraded from Win10 to Win11 on Jan 2024
HWiNFO (sensors status window) can show whether there are any WHEA errors, as long as the PC doesn't restart on the crash.
So you don't have to go into Event Viewer every time.

- If I have touched any GPU settings in Adrenalin and a driver timeout / crash reboot occurred, it will reset to default settings. It crashed frequently enough that I didn't bother to change anything after crashes. It crashes on default settings anyway.
Did you try to undervolt at all, or put a limit on the core clock? I'm not saying to run like this if you find stability. It's just to narrow down the culprit, maybe by spotting something in the GPU parameters in HWiNFO.
Maybe start with a 3000MHz limit or even lower, and then proceed to cut the voltage by 20~30mV.
If the issue even becomes less frequent, then it's most likely a power feed issue.

I'll bet you just about anything it's the PSU - older non ATX 3.0 units don't seem to be able to handle the transient power spikes of the 7900XTX. I went from an 850w (only about a year old, good unit) that had zero issues with my system running an RTX 2080 to a 1000w ATX 3.0 unit and all the stability issues I was having with Forza Motorsport 5 disappeared.
Maybe you have a point there, but my 750W PSU is ATX v2.4, already has 5.5+ years of 24/7 operation behind it, and is doing pretty well with the Nitro+ 7900XTX at a 401W limit, so far in a few games/benchmarks.
The thing is, though, that I've read many times over the years about problems with Corsair RM units, which are supposed to be Tier A. Not all of them are bad, as many people have them and they work just fine.
 
Joined
Jun 25, 2020
Messages
152 (0.09/day)
Here's the WinDbg output for my last driver timeout crash.
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

VIDEO_ENGINE_TIMEOUT_DETECTED (141)
One of the display engines failed to respond in timely fashion.
(This code can never be used for a real BugCheck; it is used to identify live dumps.)
Arguments:
Arg1: ffff948196c43050, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8074891b210, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, The secondary driver specific bucketing key.
Arg4: 0000000000004120, Optional internal context dependent data.

Debugging Details:
------------------

Unable to load image amdkmdag.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for amdkmdag.sys

KEY_VALUES_STRING: 1

Key : Analysis.CPU.mSec
Value: 2233

Key : Analysis.Elapsed.mSec
Value: 5864

Key : Analysis.IO.Other.Mb
Value: 3

Key : Analysis.IO.Read.Mb
Value: 13

Key : Analysis.IO.Write.Mb
Value: 41

Key : Analysis.Init.CPU.mSec
Value: 515

Key : Analysis.Init.Elapsed.mSec
Value: 63077

Key : Analysis.Memory.CommitPeak.Mb
Value: 97

Key : Bugcheck.Code.LegacyAPI
Value: 0x141

Key : Dump.Attributes.AsUlong
Value: 18

Key : Dump.Attributes.KernelGeneratedTriageDump
Value: 1

Key : Failure.Bucket
Value: LKD_0x141_IMAGE_amdkmdag.sys

Key : Failure.Hash
Value: {48b738dd-5a92-7ff8-63d0-f075fc680fe0}


BUGCHECK_CODE: 141

BUGCHECK_P1: ffff948196c43050

BUGCHECK_P2: fffff8074891b210

BUGCHECK_P3: 0

BUGCHECK_P4: 4120

FILE_IN_CAB: WATCHDOG-20231228-2322.dmp

DUMP_FILE_ATTRIBUTES: 0x18
Kernel Generated Triage Dump
Live Generated Dump

TAG_NOT_DEFINED_202b: *** Unknown TAG in analysis list 202b


VIDEO_TDR_CONTEXT: dt dxgkrnl!_TDR_RECOVERY_CONTEXT ffff948196c43050
Symbol dxgkrnl!_TDR_RECOVERY_CONTEXT not found.

PROCESS_OBJECT: 0000000000004120

PROCESS_NAME: System

STACK_TEXT:
fffff487`7bb5f5d0 fffff807`31490ea0 : ffff9481`96c43050 fffff487`7bb5f840 ffff9481`96c43050 ffff9481`81e8a760 : watchdog!WdpDbgCaptureTriageDump+0x64a
fffff487`7bb5f680 fffff807`313360d9 : 00000000`00000003 00000000`00000000 ffff9481`7cee5000 ffff9481`00000001 : watchdog!WdDbgReportRecreate+0xd0
fffff487`7bb5f6e0 fffff807`335a1179 : ffff9481`00000000 ffff9481`851da500 ffff9481`96c43050 ffff9481`7cfa1000 : dxgkrnl!TdrUpdateDbgReport+0x119
fffff487`7bb5f740 fffff807`33640e35 : ffff9481`7cfa1000 00000000`00000000 ffff9481`7cf13000 ffff9481`7cfa1001 : dxgmms2!VidSchiResetEngine+0x709
fffff487`7bb5f8f0 fffff807`336159fd : ffff9481`7cf13000 00000000`00000001 00000000`00000000 00000000`00000000 : dxgmms2!VidSchiResetEngines+0xb1
fffff487`7bb5f940 fffff807`335ef9f2 : fffff487`7bb5fa01 00000000`0008a3f1 00000000`00989680 00000000`00000001 : dxgmms2!VidSchiCheckHwProgress+0x25fdd
fffff487`7bb5f9b0 fffff807`3357b70a : ffff9481`87002bf0 ffff9481`7cf13000 fffff487`7bb5fad9 00000000`00989680 : dxgmms2!VidSchiWaitForSchedulerEvents+0x372
fffff487`7bb5fa80 fffff807`335fde85 : ffff9481`91279000 ffff9481`7cf13000 ffff9481`91279010 ffff9481`80990010 : dxgmms2!VidSchiScheduleCommandToRun+0x2ca
fffff487`7bb5fb40 fffff807`335fde3a : ffff9481`7cf13400 fffff807`335fdd70 ffff9481`7cf13000 ffffd880`eb8c0100 : dxgmms2!VidSchiRun_PriorityTable+0x35
fffff487`7bb5fb90 fffff807`163078e5 : ffff9481`7a643500 fffff807`00000001 ffff9481`7cf13000 000fe067`bcbbbdff : dxgmms2!VidSchiWorkerThread+0xca
fffff487`7bb5fbd0 fffff807`164064b8 : ffffd880`eb8c0180 ffff9481`7a643500 fffff807`16307890 00000000`00000000 : nt!PspSystemThreadStartup+0x55
fffff487`7bb5fc20 00000000`00000000 : fffff487`7bb60000 fffff487`7bb59000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28


SYMBOL_NAME: amdkmdag+fb210

MODULE_NAME: amdkmdag

IMAGE_NAME: amdkmdag.sys

STACK_COMMAND: .cxr; .ecxr ; kb

FAILURE_BUCKET_ID: LKD_0x141_IMAGE_amdkmdag.sys

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {48b738dd-5a92-7ff8-63d0-f075fc680fe0}

Followup: MachineOwner
---------
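One small cross-check on the dump above: Arg2 is described as a pointer into the responsible driver module, and SYMBOL_NAME reports amdkmdag+fb210. Subtracting that offset from the pointer should give the address the module was loaded at. A minimal Python sketch of the arithmetic (the function name is made up for illustration):

```python
def module_base(pointer_hex, symbol_name):
    """Subtract the '+offset' part of a 'module+offset' symbol from a pointer."""
    module, offset_hex = symbol_name.split("+")
    base = int(pointer_hex, 16) - int(offset_hex, 16)
    return module, base


# Values copied from the bugcheck output above (Arg2 and SYMBOL_NAME).
module, base = module_base("fffff8074891b210", "amdkmdag+fb210")
print(module, f"{base:016x}")  # amdkmdag fffff80748820000
```

The result is a round, aligned address, which is consistent with Arg2 pointing inside amdkmdag.sys. That only shows which driver the timeout was attributed to, not why the engine stopped responding.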


Did you try to undervolt at all, or put a limit on the core clock? I'm not saying to run like this if you find stability. It's just to narrow down the culprit, maybe by spotting something in the GPU parameters in HWiNFO.
Maybe start with a 3000MHz limit or even lower, and then proceed to cut the voltage by 20~30mV.
If the issue even becomes less frequent, then it's most likely a power feed issue.
My previous attempt to undervolt didn't end well. I was copying some settings from YouTube (the Ancient Gaming channel), but my card seemingly really didn't like the undervolt.
My card in its current state (quiet VBIOS) has a default max clock of 3035MHz.
I tried max clock 2750MHz with core voltage 1125mV. It didn't crash, but the FH5 glitch where the environment disappears showed up in record time.
With core voltage back at stock it behaved well for 10 minutes, but then the environment disappeared again.
I would assume this is not a known bug, and both the 7900XTX and FH5 are old enough that the glitch should point to some instability.

Update:
I will take the card to my brother's place in a few hours to test it there.
The shop agreed to check my PC tomorrow (I can postpone to next Saturday if I want to). Because of the nature of the issue, it should take a few days. They also offered an MSI MPG A1000G PCIE5 if they think the PSU is the culprit. (They would probably test it first, but that is only my guess.)

Update2: On my brother’s system, we only managed to crash it once (FH5, 1440p Extreme preset, no frame limit, driver timeout). We tried hard to crash it a second time and failed, but one crash is enough here.
He is fascinated by the performance of the card, but is also super annoyed by its coil whine(?). (I didn’t notice the whine when I first got my hands on the card. I can certainly notice it now, but it is very tolerable to my ears. He is super sensitive to that kind of noise though, and he let me have a PowerColor 5700XT Red Dragon a few years ago just because he couldn’t stand the coil whine.)

He thinks that sending the PC to a shop is a waste of time: normal stress tests like Port Royal and Furmark are not a ‘reliable’ way to force a crash, so they may just ¯\_(ツ)_/¯ and take my money. If a new PSU is needed anyway, we should just buy it. He may also need a new PSU for a planned purchase of a 4070TiSuper, but that’s another story. I also saw some reviews claiming the MSI A1000G is a bit noisy, and that would probably have annoyed me more.

With that out of the way, I will ask around other local shops to see which 1000-1200W PSU models are available. I will post an Update 3 in about 14 hours, which will be a “what PSU should I buy” thing. Price is probably not a big issue here, because safety margins. Sunk cost fallacy
 
Joined
Jan 1, 2012
Messages
345 (0.07/day)
Yeah, your issue is definitely a bit different than mine since you're having driver crashes and not machine check exception restarts. That's why I was cautious to say it was the "same thing" but I wanted to see if any of those steps I took still proved useful and namely if you were getting logs where you weren't aware of them.

If it's restarting without any logs, such as no BSOD, no event viewer logs (besides Event ID 41/6008 which are symptoms of the restart and not causes), no memory dumps, and no other logs, I'd think power related. This would almost always be PSU but it could be motherboard. I had a very awkward sudden power loss issue a year or two back and it matched up to the letter with a faulty PSU... except for the fact that it started after I updated the BIOS, and went away when I reverted to the old BIOS. Still never conclusively figured that one out as newer BIOS didn't do it and the PSU and motherboard both ended up RMA'd for separate issues, but the motherboard also had issues POSTing on cold boots later. Rare exceptions aside, I'd presume PSU 90% of the time but maybe motherboard or GPU (GPU has power managing stuff on its end too and you are getting driver crashes so that's at least a sign).
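To see only those restart markers without scrolling through Event Viewer, something like the following Python sketch could work. It builds a query for Event IDs 41 and 6008 in the System log and hands it to Windows' built-in wevtutil tool. The helper names and the default count of 20 are made up for illustration, and the script only calls wevtutil when run on Windows.

```python
import subprocess
import sys


def restart_event_query(event_ids=(41, 6008)):
    """Build an event-log XPath filter that matches any of the given event IDs."""
    clause = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    return f"*[System[({clause})]]"


def wevtutil_command(count=20):
    """Command line listing the newest matching System-log events as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{restart_event_query()}",
        f"/c:{count}",   # how many events to return
        "/rd:true",      # newest first
        "/f:text",       # plain-text output
    ]


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(wevtutil_command(), capture_output=True, text=True)
    print(result.stdout)
```

As said above, these two IDs only mark that an unexpected restart happened. If they are the only entries around each crash, that fits the "power related" reading rather than proving it.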

That bug check is one of the four I would get.

Coil whine is annoying! I would agree with your brother, haha. I had to give up on the EVGA GTX 970 back in the day over it. My current RX 7800 XT (both of the two) has zero coil whine so I'm thrilled.

And my RMA came in right before Christmas but my time to test things has been sort of low until recently so there hasn't been much to conclusively add yet (some small things on their own but nothing pertaining to if the prime issue is resolved or not). I have had two instances of formal driver crashes and recovering without restarts, which I never had before, so far limited to one game (one that way back in the day was said by other players to be problematic on AMD video hardware I guess?). Things have been mixed for me with this so far I admit. Only maybe important formal thing to mention there is Sapphire sent a replacement so I take that as a sign they found something wrong with what I sent in? They didn't say what they found; only that it was a replacement being returned. Or they found nothing wrong and were just as lost as me and gave me a new one anyway as a "let's see if this fixes it anyway" attempt. Not sure. So far things are going well but it's only been about a week and I've sort of been treating it timidly (avoiding the things I knew before might increase the crash chances), and it took a while for the issue to happen regularly before so this is one I'm going to need a couple/few months of no issues to feel confident in saying "the problem was that particular 7800 XT and RMA fixed it". For now, fingers crossed. Good luck with yours, as well. I'd be thinking PSU or GPU (in no particular order) in your case if you've exhausted stuff like different drivers, Windows reinstall, and all that.
 
Joined
Jun 25, 2020
Messages
152 (0.09/day)
Things that I forgot :
- Here’s the crash message on my brother’s system, with a driver timeout at the same time:
The instruction at 0x00007FF76696FEC referenced memory at 0x0000000000000000. The memory could not be written.
- My brother didn’t let me swap PSUs, but that is probably enough testing. He runs a relatively fresh Win11 Pro install.
- He thinks his RM750 was bought slightly later than my RM850, and that my RM850 is probably not the 2019 version. I still think it is the 2019 version, but whatever.
My Windows installation is pretty old, but looking again at my brother’s system, I think I have done enough to rule out everything other than the PSU and GPU.

You should probably push your system hard to try to force a crash, since after all you went through the whole process of troubleshooting the crashes. But hey, real life first. Take your time and good luck!



Here’s the ‘which should I buy’ portion. There’s no ‘X day return window’ policy here.
All models should be 80 Plus Gold and fully modular, unless specified otherwise or I forgot to mark it here. ATX 3.0 support is a bit murky, so I didn't mark it.
All prices are in local currency. Roughly $8 here is USD$1.

Shop A
Corsair RM1000e $1250
RM1000x $1380 (SHIFT version is same price)
ROG Thor 1000W (Platinum) $2600
(I didn’t jot down the prices of RM1200x, it should be ~$2000.)

Shop S
CoolerMaster V1300 (Platinum) 30th Anniversary $1999
CoolerMaster GX1250 $1699
*They said they are doing a promotion for these models. They do not have 1000W models, partly because the promotions are priced too close to high end 1000W models.

Shop D
Gigabyte P1000GM $1399
Gigabyte UD1300 $1699
TUF 1000W $1550
TUF 1200W $1750
ROG Loki 1000W (Platinum) $1959
NZXT C1000 $1299
NZXT C1200 $1699
Seasonic Prime PX1000 (Platinum) $1999
Seasonic Focus GX1000 $1499
Seasonic Vertex GX1000 $1699
Seasonic Vertex GX1200 $1999
Silverstone Decathlon DA1000 (semi-modular) $899

Also there is the MSI MPG A1000G, but that’s probably too noisy.

Personally, no Gigabyte (because hand grenades; good job GN and sorry GB), and otherwise no brand preferences. I would have leaned heavily toward a be quiet! Dark Power if one were available, for fanboy reasons, but no luck.
Price is not an issue (just a ‘if both are same performance…’ thing. Also sunk cost fallacy safety margins.)
PSU must be good enough to not be the issue here.
Noise performance is also important, although Zero RPM is not needed as long as it is quiet enough.
No RGB needed because solid panel. If there is RGB, there must be option to disable it.

So…out of these choices, which would you choose?

Edit: After looking around for reviews, now favoring Seasonic Prime PX1000 (because Seasonic Prime).
CoolerMaster V1300 would have been the choice if it wasn’t so biiig. (Length 192mm, clearance to HDD bays 195mm.)

There may be Seasonic Titanium units, but price is getting out of hand.

Decision made. Picked Seasonic Vertex GX1200.
 
Joined
Jan 1, 2012
Messages
345 (0.07/day)
- Here’s the crash message on my brother’s system, with a driver timeout at the same time:
The instruction at 0x00007FF76696FEC referenced memory at 0x0000000000000000. The memory could not be written.
That's possibly an unrelated and separate issue. I think that particular type of message usually indicates a software issue more than a hardware one, because it means something tried to access memory it didn't have permission to access (and it may have been doing that due to software issues), but some software issues do occur precisely as a cascaded result of hardware issues, so it's hard to say here.

If you're not also seeing these errors on your PC then I'd sort of presume that one might be something else. You might be complicating your own already complicated issues with something entirely different.

I wish I could help you on the PSU but I've been a bit out of the loop on those myself for a long time, and I even preemptively made a thread asking for advice on it in case my video card RMA doesn't prove successful in removing the issue. I think I'd be looking at ATX 3.0 at this point, and the 850W versions of the Seasonic Vertex and Be Quiet Dark Power 13 both had my eye (both are a bit expensive and probably overkill versus what I'd need but I was wanting to rule out all doubt). That's not a recommendation for either of those two mind you; just what I was thinking in my own situation. Ask faye what they went to since it seemed to resolve it so if you have an issue that is down to the PSU, then it should have a good chance of working for you.
You should probably push your system hard to try to force a crash, since after all you went through the whole process of troubleshooting the crashes. But hey, real life first. Take your time and good luck!
Logically, yes, I should be trying to encourage it to happen to see if it's gone, but I admit I've not been doing that lately since I'm so exhausted from it all so I've "just been using it normally" for now. Still, that too should eventually lead to the behavior occurring if it is present, so one way or another I'll find out sooner rather than later. If I end up encountering it again, I'll be trying a different PSU myself next.
 
Joined
Jun 25, 2020
Messages
152 (0.09/day)
That's possibly an unrelated and separate issue. I think that particular type of message usually indicates a software issue more than a hardware one, because it means something tried to access memory it didn't have permission to access (and it may have been doing that due to software issues), but some software issues do occur precisely as a cascaded result of hardware issues, so it's hard to say here.

If you're not also seeing these errors on your PC then I'd sort of presume that one might be something else. You might be complicating your own already complicated issues with something entirely different.
There are times when my side throws a similar error, but normally it is only the driver timeout message.
But his PSU is more potato than mine, and he is on Win11, so who knows...

I'm in the same boat on the PSU side, but starting at 1000W. Aiming for utter overkill at a reasonable price, if there is such a thing.

I got overloaded by reviews and found that hwbusters.com sometimes replies to comments.
So I posted a very simplified version there and got a reply from Aris. He picked the Seasonic Vertex GX1200, so I'm gonna buy that. Fingers crossed.
That thing is 'borderline Platinum', so that's a plus.
Would love to have a Seasonic Prime, but there is no 1000W or 1200W model that can do ATX3.x.
 

dgianstefani

TPU Proofreader
Staff member
Joined
Dec 29, 2017
Messages
5,045 (1.99/day)
Location
Swansea, Wales
System Name Silent
Processor Ryzen 7800X3D @ 5.15ghz BCLK OC, TG AM5 High Performance Heatspreader
Motherboard ASUS ROG Strix X670E-I, chipset fans replaced with Noctua A14x25 G2
Cooling Optimus Block, HWLabs Copper 240/40 + 240/30, D5/Res, 4x Noctua A12x25, 1x A14G2, Mayhems Ultra Pure
Memory 32 GB Dominator Platinum 6150 MT 26-36-36-48, 56.6ns AIDA, 2050 FCLK, 160 ns tRFC, active cooled
Video Card(s) RTX 3080 Ti Founders Edition, Conductonaut Extreme, 18 W/mK MinusPad Extreme, Corsair XG7 Waterblock
Storage Intel Optane DC P1600X 118 GB, Samsung 990 Pro 2 TB
Display(s) 32" 240 Hz 1440p Samsung G7, 31.5" 165 Hz 1440p LG NanoIPS Ultragear, MX900 dual gas VESA mount
Case Sliger SM570 CNC Aluminium 13-Litre, 3D printed feet, custom front, LINKUP Ultra PCIe 4.0 x16 white
Audio Device(s) Audeze Maxwell Ultraviolet w/upgrade pads & LCD headband, Galaxy Buds 3 Pro, Razer Nommo Pro
Power Supply SF750 Plat, full transparent custom cables, Sentinel Pro 1500 Online Double Conversion UPS w/Noctua
Mouse Razer Viper Pro V2 8 KHz Mercury White w/Tiger Ice Skates & Pulsar Supergrip tape
Keyboard Wooting 60HE+ module, TOFU-R CNC Alu/Brass, SS Prismcaps W+Jellykey, LekkerV2 mod, TLabs Leath/Suede
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores Legendary
Go for that Meg PSU, if it doesn't resolve the issues, I'd recommend returning the card and getting a Super. PCIE 5.0/ATX 3.0 PSU is good to have anyway.

As others said DDU, clean install and all that if you want to pursue the software side of things.
 
Joined
Nov 25, 2019
Messages
8 (0.00/day)
Location
Southern California
Here are my thoughts on this:

For reference the ATX spec uses PS_On, the Nuvoton chip PSON#. I will use PSON# for this post.

This is somewhat of a longshot but this *could* be a PSON# incompatibility between the power supply and motherboard. The symptom is the computer re-boots without a blue screen. If there is an error log it will be kernel power event ID 41.

Per the HWInfo pngs you posted your motherboard uses the Nuvoton NCT6797D LPC/eSPI SI/O chip. I could not find a spec for that chip on-line so I will use the datasheet for the NCT6796D.

The Nuvoton chip is specified to be able to pull PSON# to <= 0.40 volts while sinking 12mA. Per the ATX Power Supply spec, the PSON# signal should be less than 0.8 V to turn on. The maximum current at 0.4 volts is -1.6 mA (the - sign meaning the current is sourced from the power supply). So, it seems all should be good.

However, there could be some other issues:

The NCT6797D could have a resistor on the motherboard to protect it from the outside world.

The power supply may not meet the ATX Power Supply spec. I have tested several that don’t.

There is also a “ground loop” involved. The NCT6797D chip is pulling to DC common on the motherboard, while the power supply supervisor chip is connected to DC common in the power supply, and there is a voltage drop in the wires between the two.
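To put rough numbers on the points above, here is a small Python sketch of the PSON# margin. It uses only the figures quoted in this post (0.40 V at 12 mA for the Super I/O, 0.8 V and 1.6 mA from the ATX spec). It assumes the voltage drop in the DC common wiring simply adds to the level the power supply sees, which is a simplification, and the function name is made up for illustration.

```python
SIO_VOL_MAX_V = 0.40       # Super I/O output-low voltage quoted above
SIO_SINK_MA = 12.0         # ...while sinking this much current
ATX_ON_THRESHOLD_V = 0.80  # PSU treats PSON# below this as "on"
ATX_SOURCE_MAX_MA = 1.6    # most current the PSU may source at 0.4 V


def pson_margin(common_drop_v=0.0):
    """Return (voltage margin in volts, sink-current headroom ratio).

    common_drop_v is the assumed drop in the DC common wiring between the
    motherboard and the PSU, added on top of the chip's output-low level.
    """
    seen_by_psu_v = SIO_VOL_MAX_V + common_drop_v
    voltage_margin_v = ATX_ON_THRESHOLD_V - seen_by_psu_v
    current_headroom = SIO_SINK_MA / ATX_SOURCE_MAX_MA
    return voltage_margin_v, current_headroom


for drop in (0.0, 0.1, 0.4):  # hypothetical drops, for illustration only
    margin, headroom = pson_margin(drop)
    print(f"drop {drop:.2f} V: margin {margin:.2f} V, sink headroom {headroom:.1f}x")
```

On paper the chip can sink several times what the spec lets the PSU source, so current is not the weak point. The voltage margin is the part that a series resistor, a PSU that is out of spec, or a large drop in DC common can eat into.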

I would try to minimize the resistance in the DC common leads between the power supply and the motherboard. You have already checked the connectors to ensure they are plugged in all the way. You may want to look closely at the contacts and what you can see of the crimps. Try to keep this to a minimum as some contacts have a durability rating of less than 100 cycles. (The true Molex Mini Fit Jr contacts will last way more than specified). There is a picture online of damage to the contacts on the 24-pin connector due to testing with a paper clip. (It was fixed by the user by bending the contacts).

If you happen to have a voltmeter and/or an oscilloscope, you might want to measure the voltage drop in the DC common between your power supply and your motherboard (close to the NCT6797D chip). Also measure the voltage on the PSON# pin of the 24-pin connector. I would recommend putting the negative lead of your meter on an unloaded connector from your power supply, such as an unused peripheral connector, and connecting it while the PSU is off. Keep in mind the 5Vsb can take several minutes to go to zero after the power supply is de-energized.

Here are some more tests you may want to try with your power supply disconnected from your system:

If you happen to have a 249-ohm resistor (1% standard value) you could test the PSON# of your power supply. For example, the spec calls for <= 1.6mA at 0.4 volts. This comes out to a resistance of 250 ohms between DC common and PSON# at the end of the 24-pin connector (0.40/250 = 1.6mA), so the voltage should be <= 0.40 volts under this condition. If you have a resistor that is close, you can scale the voltage and current.

If you have access to a 1K pot: connect it in place of the resistor while set to maximum and adjust down until the power supply turns on. The voltage should be >= 0.80 volts. Then adjust the pot until the voltage is 0.40 volts. Then disconnect the pot and measure its resistance. It should be 250 ohms or more. You could also test for a hysteresis of 0.3 volts (see spec).
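The arithmetic behind the two bench tests above can be sketched in a few lines of Python. The only inputs are the 0.40 V and 1.6 mA figures from this post. The linear scaling for a resistor that is "close" is one reading of the advice above and is only reasonable for values near 250 ohms, and the function names are made up for illustration.

```python
SPEC_V = 0.40    # volts, the test point from the post
SPEC_I_MA = 1.6  # mA, the most the PSU may source at SPEC_V


def limit_resistance_ohms():
    """Resistance that draws exactly the spec current at the spec voltage."""
    return SPEC_V / (SPEC_I_MA / 1000.0)


def source_current_ma(measured_v, resistor_ohms):
    """Current through the test resistor, from Ohm's law."""
    return measured_v / resistor_ohms * 1000.0


def scaled_voltage_limit(resistor_ohms):
    """Voltage at which a nearby resistor value carries the spec current."""
    return SPEC_I_MA / 1000.0 * resistor_ohms


def pot_reading_ok(pot_ohms):
    """Pot test: resistance measured at 0.40 V should be at least the limit."""
    return pot_ohms >= limit_resistance_ohms()


print(f"limit resistance: {limit_resistance_ohms():.0f} ohms")
print(f"voltage limit with a 249-ohm resistor: {scaled_voltage_limit(249):.4f} V")
print(f"current at 0.35 V across 249 ohms: {source_current_ma(0.35, 249):.2f} mA")
```

The 0.35 V reading in the last line is a hypothetical measurement, not a value from this thread.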

If you have any additional questions, post on this thread. Keep in mind I am on California time.

Good luck,

Timothy P. McGrath

Former Senior Engineer

PC Power & Cooling, Inc.
 
Joined
Sep 3, 2019
Messages
3,529 (1.84/day)
Location
Thessaloniki, Greece
Timothy P. McGrath

Former Senior Engineer

PC Power & Cooling, Inc.
Had a "PC Power & Cooling" 400W (or maybe 450W, I think) PSU many years ago. Excellent unit, and reliable through well over 5 years of serious use.
Only replaced it with a higher-wattage one at some point.

It has been many years since I've seen them available in my local market.
 

Toothless

Tech, Games, and TPU!
Supporter
Joined
Mar 26, 2014
Messages
9,615 (2.46/day)
Location
Washington, USA
System Name Veral
Processor 7800x3D
Motherboard x670e Asus Crosshair Hero
Cooling Corsair H150i RGB Elite
Memory 2x24 Klevv Cras V RGB
Video Card(s) Powercolor 7900XTX Red Devil
Storage Crucial P5 Plus 1TB, Samsung 980 1TB, Teamgroup MP34 4TB
Display(s) Acer Nitro XZ342CK Pbmiiphx, 2x AOC 2425W, AOC I1601FWUX
Case Fractal Design Meshify Lite 2
Audio Device(s) Blue Yeti + SteelSeries Arctis 5 / Samsung HW-T550
Power Supply Corsair HX850
Mouse Corsair Nightsword
Keyboard Corsair K55
VR HMD HP Reverb G2
Software Windows 11 Professional
Benchmark Scores PEBCAK
It's most likely the PSU. Having a quality unit that can handle these hungry cards helps a ton. Had the same issue with my 6800XT at the time between an EVGA and a Corsair power supply.

Don't get why you were told to return the card and get an NVIDIA card. Screams fanboy.
 
Joined
Jun 25, 2020
Messages
152 (0.09/day)
System Name The New, Improved, Vicious, Stable, Silent Gaming Space Heater
Processor Ryzen 7 5800X3D
Motherboard MSI B450 Tomahawk Max
Cooling be quiet! DRP4 (w/ added SilentWings3), 4x Noctua A14x25G2 (3 @ front, 1 @ back)
Memory Teamgroup DDR4 3600 16GBx2 @18-22-22-22-42 -> 18-20-20-20-40
Video Card(s) PowerColor RX7900XTX HellHound
Storage ADATA SX8200Pro 1TB, Crucial P3+ 4TB (w/riser, @Gen2x4), Seagate 3+1TB HDD, Micron 5300 7.68TB SATA
Display(s) Gigabyte M27U @4K150Hz, AOC 24G2 @1080p100Hz(Max144Hz) vertical, ASUS VP228H@1080p60Hz vertical
Case Phanteks P600S
Audio Device(s) Creative Katana V2X gaming soundbar
Power Supply Seasonic Vertex GX-1200 (ATX3.0 compliant)
Mouse Razer Deathadder V3 wired
Keyboard Keychron Q6Max
Go for that Meg PSU, if it doesn't resolve the issues, I'd recommend returning the card and getting a Super. PCIE 5.0/ATX 3.0 PSU is good to have anyway.

As others said DDU, clean install and all that if you want to pursue the software side of things.
I think I have done enough DDU -> clean install cycles to rule out the software side of things.
And I was already on my way to pick up the Seasonic unit. (The MSI MAG is kinda noisy, so no.)
As for the Super… without getting too deep into flame wars, let's just say I'm firmly on the AMD side for this gen.
Refund is a very rare thing on my side of the planet. Return + refund is not gonna happen. RMA is doable. Resell is too, but I would like to try everything to make this work first.



Now I have done the PSU swap. The Seasonic Vertex GX1200 is up and running.
It’s only an hour into testing, and it looks promising…

I will take a long hard look at TimatPSUTest's comment later. I have no equipment to test with and not enough knowledge to fully understand it, but it is very interesting.
 
Joined
Feb 18, 2005
Messages
5,847 (0.81/day)
Location
Ikenai borderline!
System Name Firelance.
Processor Threadripper 3960X
Motherboard ROG Strix TRX40-E Gaming
Cooling IceGem 360 + 6x Arctic Cooling P12
Memory 8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s) MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage 2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s) 3x AOC Q32E2N (32" 2560x1440 75Hz)
Case Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply Fractal Design Ion+ 2 Platinum 760W
Mouse Logitech G602
Keyboard Razer Pro Type Ultra
Software Windows 10 Professional x64
Don't get why you were told to return the card and get an NVIDIA card. Screams fanboy.
Maybe because an NVIDIA card doesn't require you to additionally shell out for an entire new PSU?
 
Joined
Nov 27, 2023
Messages
2,411 (6.45/day)
System Name The Workhorse
Processor AMD Ryzen R9 5900X
Motherboard Gigabyte Aorus B550 Pro
Cooling CPU - Noctua NH-D15S Case - 3 Noctua NF-A14 PWM at the bottom, 2 Fractal Design 180mm at the front
Memory GSkill Trident Z 3200CL14
Video Card(s) NVidia GTX 1070 MSI QuickSilver
Storage Adata SX8200Pro
Display(s) LG 32GK850G
Case Fractal Design Torrent (Solid)
Audio Device(s) FiiO E-10K DAC/Amp, Samson Meteorite USB Microphone
Power Supply Corsair RMx850 (2018)
Mouse Razer Viper (Original) on a X-Raypad Equate Plus V2
Keyboard Cooler Master QuickFire Rapid TKL keyboard (Cherry MX Black)
Software Windows 11 Pro (24H2)
Maybe because an NVIDIA card doesn't require you to additionally shell out for an entire new PSU?
True nowadays, but Ampere was wonky and I still have PTSD from first gen Fermi. I distinctly remember replacing the decent, but woefully inadequate PSU in my friend’s rig to run his brand shiny new 480. A whole whopping 600 watts one was bought! Wild for the time.
Funny how those things were considered power hogs back then, and now cards pull like three times as much in some cases and we just go on with our lives. To be fair, thermals were more of an issue with Fermi than power.
 

Toothless

Maybe because an NVIDIA card doesn't require you to additionally shell out for an entire new PSU?
Actually, that's the case for the 3080 in the other rig. If the unit isn't good enough quality, isn't high enough wattage, or lacks the needed connectors, etc., either brand can require a new PSU. This isn't an AMD-only problem.
 