- Joined: Jun 25, 2020
- Messages: 152 (0.09/day)
System Name | The New, Improved, Vicious, Stable, Silent Gaming Space Heater |
---|---|
Processor | Ryzen 7 5800X3D |
Motherboard | MSI B450 Tomahawk Max |
Cooling | be quiet! DRP4 (w/ added SilentWings3), 4x Noctua A14x25G2 (3 @ front, 1 @ back) |
Memory | Teamgroup DDR4 3600 16GBx2 @18-22-22-22-42 -> 18-20-20-20-40 |
Video Card(s) | PowerColor RX7900XTX HellHound |
Storage | ADATA SX8200Pro 1TB, Crucial P3+ 4TB (w/riser, @Gen2x4), Seagate 3+1TB HDD, Micron 5300 7.68TB SATA |
Display(s) | Gigabyte M27U @4K150Hz, AOC 24G2 @1080p100Hz(Max144Hz) vertical, ASUS VP228H@1080p60Hz vertical |
Case | Phanteks P600S |
Audio Device(s) | Creative Katana V2X gaming soundbar |
Power Supply | Seasonic Vertex GX-1200 (ATX3.0 compliant) |
Mouse | Razer Deathadder V3 wired |
Keyboard | Keychron Q6Max |
I can see the NEW PSU!! argument, and maybe a 4080 would be kinder to my PSUs, but it is what it is...
Update: I flipped the PCIe link state setting to "No Power Savings" to put more stress on the PSU. Situations that would otherwise cause an instant reboot are now fine.
It is ridiculously hot here for December (I guess ambient is 20C~22C), and in specific scenes VRAM has reached 100C even after I bumped the GPU fan up from ~30% to ~55%. Probably still a fine temperature, but take this into account for the bad results below.
Also, due to messy cable management, I fear that I bent the 8-pin cables too hard. (To be clear, they are not "folded", just mashed quite a bit.)
CPU+GPU folding and CPU folding + Port Royal stress test are fine now. Previously these could cause an instant reboot.
FM8 hit the environment-disappearing glitch twice in quick succession after ~30 mins. It is a known issue, so I will let it go. Then there was a lighting glitch after ~30 mins of idling on the race preparation menu (the one with the track already loaded and rendered). This is probably not acknowledged yet, but I have seen rare occurrences from other users on NVIDIA cards (IIRC it was a 4060), so I will also let it go.
In general, glitches and crashes appear much less frequently.
Here is the hard part: FH5 is the only game where I have encountered any driver timeout. Apparently, after all these years, FH5 still doesn't like a lot of things, among them Afterburner + RTSS (removed before I started this thread), potentially Steam's FPS counter, and more. I had some browser games running in the background, and the known-issues page says the game prefers a fresh boot, that there are numerous places where it can crash, and that the Steam version may have a memory leak. So although those issues were nearly nonexistent when I used the 3070, the following bad results can be excused as "AMD drivers LOL" or "Come on devs".
*These tests were done earlier, so the link state setting was at "Moderate Power Savings" for this session. I will retest FH5 in a few days, and everything bad that happened in FH5 can be excused.
Not a fresh boot: crashed after ~45 mins, and then a bigger crash in <10 mins (one with a "memory cannot be read" error, which also broke explorer.exe). At this point I noticed one 8-pin was not fully plugged in on the GPU side.
Fresh boot: idled in free roam for 1 hr, monitor turned off (so I will let this one go), then the environment-disappearing glitch. Restarted the game, and it survived ~30 mins before a driver timeout.
* Again, for now, I'm going to excuse these bad results as "AMD drivers LOL" or "Come on devs". Either way, crashes are already much less frequent than before.
It was at this moment that I realized both FM8 and FH5, the two games I play the most, are not the best examples for testing GPU stability. Especially FH5.
If there are other signs that the card is actually bad (most likely another, otherwise stable game that I can crash nearly consistently with no excuses), I will try to RMA the card.
I had a feeling that this part is slightly off topic and ruins the readability of this thread, so I put it in a spoilerbox.
Here are my thoughts on this:
For reference, the ATX spec uses PS_ON and the Nuvoton chip uses PSON#. I will use PSON# for this post.
This is somewhat of a long shot, but this *could* be a PSON# incompatibility between the power supply and the motherboard. The symptom is that the computer reboots without a blue screen. If there is an error log entry, it will be Kernel-Power event ID 41.
There were lots of crashes that didn't cause a reboot, but the new PSU largely (if not completely) fixes both the crashes and the reboots. There is enough evidence that my old PSU was not good enough.
I can see the explanations are about instant reboots. There are Kernel-Power event ID 41 entries here for the reboots I had.
I had to look up what PS_ON actually means. Here is what Wikipedia says: PS-ON Signal is a pin on a 20-pin or 24-pin ATX-specified power connector used to turn on/off a personal computer power supply unit. It turns on the power supply when it is switched from high to low and turns off the power supply when switched from low to high, or open-circuited.
To my untrained eyes, the 24-pin ATX cable on the old PSU looks completely fine (no harsh bends, no exposed metal on the cable side), but I once loosened it by accident, which caused a very steep voltage drop and all sorts of BSODs.
However, there could be some other issues:
The NCT6797D could have a resistor on the motherboard to protect it from the outside world.
The power supply may not meet the ATX Power Supply spec. I have tested several that don’t.
There also is a “ground loop” involved. The NCT6797D chip is pulling to DC common on the motherboard: The power supply supervisor chip is connected to DC common in the power supply and there is a voltage drop in the wires between the two.
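To make the ground-loop idea a bit more concrete, here is a rough sketch of the arithmetic. Every number in it (the chip's low output level, the return current, the lead resistance) is a made-up value for illustration, not a measurement from this system, and the function names are hypothetical. The 0.80 V limit is the logic-low figure used later in this post.

```python
# Sketch only: how a voltage drop in the DC common leads shifts the
# PSON# level as seen from the PSU side. All inputs are hypothetical.

LOGIC_LOW_MAX_V = 0.80  # PSU should still treat PSON# as "on" at or below this


def pson_seen_by_psu(v_pson_at_board, return_current_a, common_resistance_ohm):
    """PSON# voltage relative to the PSU's own DC common.

    The chip pulls PSON# down to the motherboard's DC common, which sits
    above the PSU's DC common by I * R of the return path.
    """
    ground_drop_v = return_current_a * common_resistance_ohm
    return v_pson_at_board + ground_drop_v


def reads_as_low(v_pson):
    """True if the PSU should still see PSON# as asserted (PSU stays on)."""
    return v_pson <= LOGIC_LOW_MAX_V


if __name__ == "__main__":
    # Hypothetical healthy leads (5 milliohm) vs. a poor contact (20 milliohm)
    for r_ohm in (0.005, 0.020):
        v = pson_seen_by_psu(0.1, 40.0, r_ohm)
        print(f"R = {r_ohm * 1000:.0f} mOhm -> PSON# = {v:.2f} V, low: {reads_as_low(v)}")
```

The point of the sketch is only that the offset grows with both the transient current and the resistance of the common leads, which is why minimizing that resistance matters.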
I would try to minimize the resistance in the DC common leads between the power supply and the motherboard. You have already checked the connectors to ensure they are plugged in all the way. You may want to look closely at the contacts and what you can see of the crimps. Try to keep unplugging and re-plugging to a minimum, as some contacts have a durability rating of less than 100 cycles. (The true Molex Mini Fit Jr contacts will last way more than specified.) There is a picture online of damage to the contacts on the 24-pin connector due to testing with a paper clip. (The user fixed it by bending the contacts.)
If you happen to have a voltmeter and/or an oscilloscope, you might want to measure the voltage drop in the DC common between your power supply and your motherboard (close to the NCT6797D chip). Also measure the voltage on the PSON# pin of the 24-pin connector. I would recommend putting the negative lead of your meter on an unloaded connector from your power supply, such as an unused peripheral connector, and connecting it while the PSU is off. Keep in mind the 5Vsb can take several minutes to go to zero after the power supply is de-energized.
Here are some more tests you may want to try with your power supply disconnected from your system:
If you happen to have a 249-ohm resistor (1% standard value), you could test the PSON# of your power supply. For example, the spec calls for <= 1.6 mA at 0.4 volts. This comes out to a resistance of 250 ohms between DC common and PSON# at the end of the 24-pin connector (0.40 V / 250 ohms = 1.6 mA), so the voltage should be <= 0.40 volts under this condition. If you have a resistor that is close, you can scale the voltage and current.
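The scaling for a resistor that is only close to 250 ohms could be worked out like this. It is a rough sketch that assumes the PSON# pin sources at most 1.6 mA regardless of the resistor, which is a simplification; the function names are hypothetical.

```python
# Sketch only: scale the 0.40 V / 1.6 mA test point to a nearby resistor.

SPEC_VOLTAGE_V = 0.40    # test voltage from the post above
SPEC_CURRENT_A = 1.6e-3  # maximum current at that voltage


def scaled_voltage_limit(resistor_ohm):
    """Highest PSON# voltage expected across the resistor if the pin
    sources no more than the spec current (simplifying assumption)."""
    return SPEC_CURRENT_A * resistor_ohm


def measured_current_ma(measured_v, resistor_ohm):
    """Current actually flowing through the resistor, in mA (Ohm's law)."""
    return measured_v / resistor_ohm * 1000.0


if __name__ == "__main__":
    for r in (220.0, 249.0, 270.0):
        print(f"{r:.0f} ohm -> expect <= {scaled_voltage_limit(r):.3f} V")
```

For example, with a 220-ohm resistor the reading should stay at or below roughly 0.35 V, and `measured_current_ma` converts whatever the meter shows back into a current to compare against 1.6 mA.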
If you have access to a 1K pot: connect it in place of the resistor while set to maximum, then adjust down until the power supply turns on. The voltage at that point should be >= 0.80 volts. Then adjust the pot until the voltage is 0.40 volts, disconnect the pot, and measure its resistance. It should be 250 ohms or more. You could also test for a hysteresis of 0.3 volts (see spec).
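The readings from the pot test could be checked with something like the sketch below. The limits are the ones quoted in the post above; the class and function names are hypothetical, and the hysteresis check is left out because the post does not spell out how to evaluate it.

```python
# Sketch only: pass/fail check for the two pot-test readings.
from dataclasses import dataclass

TURN_ON_MIN_V = 0.80        # PSON# voltage at turn-on should be at least this
TEST_VOLTAGE_V = 0.40       # voltage the pot is dialed down to afterwards
MIN_RESISTANCE_OHM = 250.0  # pot resistance at 0.40 V should be at least this


@dataclass
class PotReadings:
    turn_on_v: float     # PSON# voltage when the PSU turned on
    ohms_at_0v40: float  # pot resistance measured after reaching 0.40 V


def check_pot_test(readings):
    """Return the implied current and a pass/fail flag for each reading."""
    current_ma = TEST_VOLTAGE_V / readings.ohms_at_0v40 * 1000.0
    return {
        "turn_on_ok": readings.turn_on_v >= TURN_ON_MIN_V,
        "current_ma": current_ma,
        "current_ok": readings.ohms_at_0v40 >= MIN_RESISTANCE_OHM,
    }


if __name__ == "__main__":
    # Hypothetical readings, not measurements from this thread
    print(check_pot_test(PotReadings(turn_on_v=0.95, ohms_at_0v40=300.0)))
```

A resistance below 250 ohms at 0.40 V means the pin is sourcing more than 1.6 mA, which is the case the spec limit is meant to rule out.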
Now my English reading comprehension and high-school-level physics and electronics knowledge are failing me. I'm trying to understand what you are saying here. Guessing also from the tests you suggested, my summary is that there are a few possibilities for what actually happened (or should I say, what you think actually happened) in an instant reboot event. (I'm also guessing these were supposed to be in point form.)
- The NCT6797D could have a resistor... -> completely unsure what is happening here; guessing some weirdness in voltages.
- The power supply may not meet the ATX Power Supply spec -> The voltage on the PSON# pin from the PSU side is not correct.
- Ground loop -> In a transient event, the voltage drop from the ground loop leaves PSON# at a higher voltage than it should be, which leads to a power-off event. PSON# then goes back to normal, which leads to a power-on.
I don't have the equipment to test this, my stupid hands would mess up and potentially damage stuff if I tried, and my brother plans to sell the PSU with my 3070 as a bundle offer to close friends, so the truth will be forever hidden. Or whatever the expression should be.
But these are very plausible and interesting explanations, and good exercises for learning more about PSUs. Thank you very, very much for taking the time to look into my case.