Weird memory timings with Ryzen 5 5600X + freezing issues?

The King · Nov 24, 2023

cst1992 said:
What if there are no dumps?

In a sense these are the cleanest crashes I have seen - they leave no droppings.

Is there nothing in the Windows Event Viewer as well? Any critical errors logged there?

cst1992 · Nov 24, 2023

There are, but they don't seem related. Some are related to BitLocker, some to DCOM, and some to other things.

I hope I don't jinx this thing yet again, but disabling the PBO Fmax Enhancer setting definitely did something. There have been no crashes all day now, not even in the worst scenarios - in the middle of a gaming session, while waking up from sleep mode, or when left idle for a few hours.

The new kit also seems to have taken care of the stuttering issues, but I'll have to use the build for a month or so to really bring out the stuttering issue - it gets worse over time.

For now I'm seeing a ray of hope.

VuurVOS · Nov 24, 2023

cst1992 said:
I hope I don't jinx this thing yet again, but the disabling of the PBO Fmax Enhancer setting definitely did something. There have been no crashes all day now, especially in the worst areas - in the middle of a gaming session, or while waking up from sleep mode, or if left idle for a few hours.

Did you enable PBO Fmax Enhancer or is it enabled by default? If it was enabled by default, then corecycler should have reported issues with core stability.

cst1992 · Nov 25, 2023

VuurVOS said:
Did you enable PBO Fmax Enhancer or is it enabled by default? If it was enabled by default, then corecycler should have reported issues with core stability.

Asus enabled it by default with Auto :banghead:

I agree about the report, but didn't you say to run it for 72 hours? It crashed at around the 14-hour mark, I think.

An0maly_76 · Nov 25, 2023

cst1992 said:
Thank you everyone for your support.

The MX500 is barely used. It was in a drawer for a year (as I bought it in a sale) then when I built the system I installed it as a high-speed storage drive.
Well, to be more accurate, it wasn't completely unused: I used it to store some of the excess files from my laptop once or twice in that year.
I am not getting any values in SMART that concern me. If either of the two SSDs is bad, how do I tell? Heat is not an issue, as both NVMe slots on this motherboard come with a heatsink.
I could try creating a partition on the MX500 and installing Windows on that. I've tried so much so far, what's one more? But in that case, should I just remove the NVMe drive from the board?

But here you're clarifying that it has seen very little use and you know its history. With that out of the way...

cst1992 said:
The individual Corsair modules booted fine, except if used at 3600MHz in the B1 slot.

So at least I shouldn't be getting a RAM error when booting with the new kit, even if I tried to use individual modules. Still no guarantees about the B1 slot (why try it if it is not recommended?), but a normal boot should always happen. Yes, now using a new kit. DOCP timings are fine - you can check the ZenTimings screenshots above.

Keep in mind that Ryzens use an integrated memory controller, so if one channel flakes out, it may not matter what RAM is installed in the slot for that channel; it will still have problems. That's why I suggested trying B1 instead of A1.

I would test this by using only one DIMM of the QVL'd RAM in slot A1. If that solves your problem, then that would point to the CPU's memory controller. If it does not, try the same DIMM in slot B1. If that solves your problem, that would also point to the CPU's memory controller. If neither slot solves the problem with that DIMM, repeat this with the other DIMM. You could still have one flaky DIMM, whether the CPU has a memory controller issue or not.
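The single-DIMM test order described above can be written down as a small decision helper. This is a hypothetical sketch: the `next_step` name, the "DIMM1"/"DIMM2" labels and the results format are made up for illustration, and each "stable" result stands for however long you run the machine in that configuration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical record of single-DIMM test outcomes:
# (dimm_label, slot) -> True if the system stayed stable with only that DIMM in that slot.
Results = Dict[Tuple[str, str], bool]


def next_step(results: Results,
              dimms: Tuple[str, ...] = ("DIMM1", "DIMM2"),
              slots: Tuple[str, ...] = ("A1", "B1")) -> str:
    """Follow the order A1 then B1, one DIMM at a time, and say what the results suggest."""
    failed_everywhere = []  # DIMMs that were unstable in every slot tried so far
    for dimm in dimms:
        for slot in slots:
            outcome: Optional[bool] = results.get((dimm, slot))
            if outcome is None:
                return f"next: test {dimm} alone in slot {slot}"
            if outcome:
                if failed_everywhere:
                    # An earlier DIMM failed in every slot while this one is fine.
                    return (f"{dimm} is stable in {slot} but {', '.join(failed_everywhere)} "
                            "was not in any slot: suspect a flaky DIMM")
                return f"{dimm} is stable alone in {slot}: points toward the CPU's memory controller"
        failed_everywhere.append(dimm)
    return "unstable with every DIMM in every slot: inconclusive from this test alone"


# Example: DIMM1 froze in A1 but ran clean in B1.
print(next_step({("DIMM1", "A1"): False, ("DIMM1", "B1"): True}))
```

As the post says, a single flaky DIMM can still be in play either way, so a "memory controller" result is a pointer rather than a verdict.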

cst1992 said:
I'm pretty sure as I bought a sealed box. As of Install 2 I am using version 4802 of the BIOS, so in theory it should support the 5600X3D as well.

While chances are good it was indeed brand-new in a sealed box, it's not necessarily guaranteed. There have been scams run, some through major retailers, some even by employees with an accomplice.

Scam 1 - Someone buys new hardware. They also score a cheaper one elsewhere, on eBay etc. They then find a way to swap the hardware without breaking the seal on the box, then return the cheaper one. The next person who buys it is getting a used unit with a serial number already registered, which will void any warranty. The scammer gets a brand-new unit for the eBay price with a full warranty, as they would have registered it before returning the other product in its box.

Scam 2 - An accomplice buys new hardware from a specific employee. They, too, find a way to swap the hardware without breaking the seal on the box; then, depending on how creative they get, they might be returning a dead product or even an incorrect product. The employee is in on it, and it might not be caught until months later. Here's a video on how this happened to someone in your situation, who bought a Ryzen from overseas through a family member. So you may want to verify in the BIOS messages / CPU identification that it is in fact a 5600X and not some other AM4 CPU.

cst1992 said:
Yes, now using a new kit. No. New RAM is in the QVL.

It's not returnable. https://www.amazon.in/gp/product/B088KSRW4S
In any case I was thinking of keeping this as it's a kit (unlike the Corsairs) and on the QVL.

I would. At this stage, it's the one thing that has a guarantee of not being the problem.

cst1992 said:
ZenTimings doesn't show a voltage value for the new kit either. This is technically a BIOS issue, but I am on the latest BIOS as of the start of this month (new version is for fixing the Inception vulnerability). Could this instead indicate a motherboard issue?

It's not out of the realm of possibility. However, the reason I mentioned BIOS is that there have been a few 'oopsies' with beta BIOS (that some motherboards have shipped with). One of them removed Zen 2 support, resulting in a system not booting with a brand-new motherboard.

cst1992 said:
If the issue is nothing else, could it be the CPU? I ran corecycler as recommended by @VuurVOS but it froze before it could go past the 12-hour mark. It didn't report any issues in said 12 hours.

As mentioned earlier, it's certainly possible if there's an issue with the IMC. Also, even if PBO was enabled by default with Auto, you should be able to disable that.

cst1992 said:
Also, if anyone has any idea what HDD Power Saving is and how to disable it, let me know.

Power saving features for HDD and CPU seem to be the root of many issues with these new motherboards.

It's possible... Search Power Options from the Windows search... from there...

cst1992 · Nov 25, 2023

If there is a way of swapping the chip without breaking the outer seal, I am not aware of it. I certainly couldn't do it.

I bought a brand-new unit directly from a retailer I've known for years, and they bought it from the distributor.

An0maly_76 · Nov 25, 2023

cst1992 said:
If there is a way of swapping the chip without breaking the outer seal, I am not aware of it. I certainly couldn't do it.

I bought a brand-new unit directly from a retailer I've known for years, and they bought it from the distributor.

As I said, it's remote, but still a possibility. Likely via the bottom of the box, or someone got extremely creative in other ways. I'm not saying the retailer is culpable; it could be an unscrupulous previous customer. At this stage in the game, either way, I'm more inclined to think the CPU may have a memory controller issue. Also remote, but possible given the circumstances.

Also, I forgot to mention in the power settings, you want to select 'Change plan settings' to the right of the active plan.
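The "Turn off hard disk after" option (Change plan settings > Change advanced power settings > Hard disk) can also be set from the command line with Windows' built-in `powercfg` tool, where 0 means never. Whether this is the same thing as the "HDD Power Saving" mentioned earlier is an assumption. A minimal Python wrapper that builds the commands and only runs them on Windows:

```python
import platform
import subprocess
from typing import List


def disk_timeout_commands(minutes: int = 0) -> List[List[str]]:
    """Build powercfg commands for the active plan's 'Turn off hard disk after' timeout.

    0 minutes means 'Never'. 'ac' is the plugged-in value, 'dc' the on-battery value.
    """
    return [["powercfg", "/change", f"disk-timeout-{source}", str(minutes)]
            for source in ("ac", "dc")]


def apply(minutes: int = 0) -> None:
    """Run the commands; powercfg only exists on Windows."""
    if platform.system() != "Windows":
        raise RuntimeError("powercfg is a Windows tool")
    for cmd in disk_timeout_commands(minutes):
        subprocess.run(cmd, check=True)


print(disk_timeout_commands(0))
```

On a desktop only the `ac` value matters; the `dc` one is there for completeness.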

oobymach · Nov 25, 2023

Random crashes without BSOD could be power-related. You may or may not have covered this already; this thread is 7 pages long, so I'm not going through it all, just adding my 2 cents.

Test under high power draw to try to trigger it: run a demanding CPU and GPU test at the same time, e.g. the Cinebench CPU test and the Superposition GPU benchmark. If this triggers a restart, your culprit is likely the PSU.

Disable PBO before running the test. Cinebench is going to stress the CPU hard, so leave it on Auto with PBO disabled to make sure it's not the CPU causing the crash.
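To judge whether a combined Cinebench + Superposition run is actually close to the PSU's limit, the peak readings can be added up and compared with the rating. A rough sketch: the wattages are hypothetical placeholders, not measurements from this system, and 650W is simply the rating discussed in this thread. Software sensor readings leave out conversion losses and some components, so treat the result as a lower bound.

```python
from typing import Dict


def psu_load_percent(peak_watts: Dict[str, float], psu_rating_watts: float) -> float:
    """Sum peak component draw and express it as a percentage of the PSU rating."""
    return sum(peak_watts.values()) / psu_rating_watts * 100


# Hypothetical peaks during a simultaneous CPU + GPU stress run; substitute real readings.
peaks = {"cpu_package": 120.0, "gpu_board": 300.0, "everything_else": 75.0}
load = psu_load_percent(peaks, 650)
print(f"Estimated load: {load:.0f}% of a 650 W PSU")  # Estimated load: 76% of a 650 W PSU
```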

VuurVOS · Nov 25, 2023

oobymach said:
Random crashes without BSOD could be power-related. You may or may not have covered this already; this thread is 7 pages long, so I'm not going through it all, just adding my 2 cents.

It happens at idle, and I had this problem too. It's 99% software-related; the question is which software is causing it...

cst1992 · Nov 25, 2023

I never had PBO on with the crashing happening in the first place.
Resetting a PBO-related setting to a hard "off" only led to the system not freezing, that's all.

One thing though - are high-frequency crashes a thing? When running all-core Cinebench, my CPU used to reach 4650MHz on all cores, assuming the power limit allowed it. Now, with that setting disabled, it doesn't boost above 4100MHz. That's only in all-core, though - single core still boosts fine.
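For scale, the all-core clock drop described here works out as follows. This is plain arithmetic on the two figures from the post; a clock drop does not necessarily translate one-to-one into a performance drop.

```python
def percent_drop(before_mhz: float, after_mhz: float) -> float:
    """Percentage decrease from before_mhz to after_mhz."""
    return (before_mhz - after_mhz) / before_mhz * 100


drop = percent_drop(4650, 4100)  # all-core clocks with and without the Fmax Enhancer setting
print(f"All-core clock drop: {drop:.1f}%")  # All-core clock drop: 11.8%
```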

oobymach · Nov 25, 2023

cst1992 said:
I never had PBO on with the crashing happening in the first place.
Resetting a PBO-related setting to a hard "off" only led to the system not freezing, that's all.

One thing though - are high-frequency crashes a thing? When running all-core Cinebench, my CPU used to reach 4650MHz on all cores, assuming the power limit allowed it. Now, with that setting disabled, it doesn't boost above 4100MHz. That's only in all-core, though - single core still boosts fine.

If turning PBO off stops your system from freezing, then that's your issue, if I'm understanding correctly that turning it off led to stability.

I had this issue with GTA V (or some game) causing reboots with PBO enabled, but it would run fine on a lower curve (4.55GHz instead of 4.65GHz with the 5600X I was using at the time). Only one game triggered it; everything else was fine, and reducing the curve led to stability in that game.

Not all chips are capable of 4.65GHz all the time (the voltage curve can get you close, though). PBO just boosts a couple of cores to that frequency, but that was what caused the issue with my chip, so reducing it may help with yours.

Cinebench won't crash a CPU with PBO (or at least it shouldn't); it's just a very stressful test, and some people use that awful heatsink that comes with the CPU. I was just suggesting it as a test to run in parallel with a GPU test to try to trigger a reset.

Also, I've had issues running Firefox in conjunction with Steam and Paint: my system will hard freeze and not respond to keyboard or Windows commands. So if it's software triggering it, it could be a combination of a + b + c = freeze.

VuurVOS · Nov 25, 2023

oobymach said:
If turning PBO off stops your system from freezing, then that's your issue, if I'm understanding correctly that turning it off led to stability.

If you have doubts about your PBO config, use corecycler to locate the unstable core. If a core is really unstable, the same core will always crash within 24 hours.
Then you can do a couple of things to make the core more stable, such as increasing the LLC setting, adjusting the curve, or decreasing the max CPU Boost Clock Override.
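The "same core keeps failing" check can be sketched by tallying which core each CoreCycler error landed on. The records here are hypothetical, typed in by hand as (iteration, core) pairs; no particular CoreCycler log format is assumed.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def most_suspect_core(failures: Iterable[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Return (core, failure_count) for the core that failed most often, or None if nothing failed."""
    counts = Counter(core for _iteration, core in failures)
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical failures: (iteration number, core index) copied from the test logs.
failures = [(3, 4), (9, 4), (15, 1), (21, 4)]
print(most_suspect_core(failures))  # (4, 3)
```

A core that keeps coming back is the candidate for the LLC, curve or boost-override adjustments mentioned above.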

cst1992 · Nov 25, 2023

I think people are still misunderstanding what I said - I don't use PBO at all. My CPU configuration is completely stock - at least as far as the motherboard is concerned.

Sometimes I use Ryzen Master to raise the PPT, EDC and TDC values a bit to get more juice out of the CPU when I'm doing video conversion or similar. I can afford to do so because I use a Noctua NH-U12S chromax.black cooler, which is rated for 140W.

But this has absolutely nothing to do with my freezes, as it was off the whole time the freezes were happening.

All I did was go into Asus' PBO section in the motherboard settings and turn PBO Fmax Enhancer off, because I was told that it causes issues with Zen 3 CPUs.

Ketxxx · Nov 25, 2023

This one's easy: your problem is that you bought Corsair memory - it's garbage. All of their kits, with the exception of the most expensive (and even those I have doubts about), are so tightly binned they barely make spec. On top of that, Corsair are infamous for constantly switching memory ICs and shoving them out the door under the same model number. Look at the sticker on the back of the module heatsink and you'll see a revision number; a different revision means a 99% chance each kit is using different ICs. Buy literally any other brand of memory and you'll have a lot fewer headaches. Manufacturers I tend to stick with are Klevv, Mushkin, G.Skill, Teamgroup or Patriot. To a lesser extent Kingston and Crucial, but I've had a number of issues with them as well, though still nowhere near as many as with Corsair.

Bear in mind that manufacturers also support different modules on different banks. This isn't the old days when you could shove your modules in bank A or B and they would work the same; often modules now have specific tuning done for them on a specific bank pair. Whether that is pair A or B is down to the user to find out, as the manufacturer seldom includes a table with that information in the manual or even on the product page.

Since CPUs have had IMCs, I've not known them to behave in the manner seen here (I've skimmed, mind, so I might have missed something). Typically, if the problem is the IMC, almost any kind of load will make a system hard lock or BSOD. The fact this isn't the case for you suggests memory as the problem. I have, however, known things like this to happen if TIM or other debris has somehow snuck its way into the CPU socket, so if you've taken the CPU out recently, inspecting the socket wouldn't hurt.

cst1992 · Nov 25, 2023

You are WAYYYY behind the times.

AleXXX666 · Nov 25, 2023

cst1992 said:
I downloaded 3 versions of the new BIOS - 4408, 4602 and 4802.

Normally I'd just update to 4802 and be done with it, but I have heard that some versions (4602 or 4802, I'm not sure) have issues. Something about the V2PI version 1.2.0.A not working right or something.

This is my first Ryzen system, so I'm just a bit apprehensive. Plus if I brick the board, I won't get a replacement as the chipset is now going out of stock everywhere.

You won't brick a board as long as you don't turn off the PC during flashing and there isn't a power failure, lol.

cst1992 · Nov 25, 2023

It froze again, but a bit weirdly this time.

Some background info: I have my laptop and desktop on the same table, so I don't have space for an additional monitor. I use my laptop screen as a monitor via an HDMI-to-USB capture card.

When it froze this time, a message popped up in my laptop's Steam client saying the desktop was no longer available, meaning the freeze at least crashed some processes running in memory, including Steam. My Bluetooth keyboard and mouse stopped working immediately, but the wired keyboard I keep plugged in continued to work until I unplugged it and plugged it back in about 10 minutes later. So the system was actually still accessible. When I tried to connect a wired mouse, it wouldn't work.

Some errors that popped up in Event Viewer:
Timestamps:
1:08 AM: the time the Bluetooth devices stopped working. I even made this post at 1:08: https://www.techpowerup.com/forums/threads/vendor-specific-issues-are-the-worst.316080/post-5147146
1:21 AM: I hit the Reset button
1:22 AM: I logged in again. There was no internet connection for a few minutes, then it connected, but Bluetooth still didn't come on. Device Manager shows the device is not working correctly.

Why would it think the system shut down at 12:57?

I was also running corecycler at the time. The attached log file shows corruption after the 1:10 AM mark, while running iteration 24.

Reinstalling the drivers for the Bluetooth adapter didn't work. I got another instance of the same message:

Looks like the adapter has actually failed this time and this is not just a driver issue.

Should I just get a new motherboard?

Ketxxx · Nov 25, 2023

cst1992 said:
You are WAYYYY behind the times.

Well, you could always condense everything that's been tried into a single post to save me, like, an hour's read.

VuurVOS · Nov 25, 2023

cst1992 said:
I was also running corecycler at the time. The attached log file shows corruption after the 1:10 AM mark, while running iteration 24.

I don't see a crashed core during the tests. Did you already do a clean Windows install with the bare minimum of software installed, as suggested on Thursday?

cst1992 · Nov 25, 2023

Haven't done that yet, since the change in the Fmax Enhancer setting has led to more stability for a while.

I'll do a reinstall tomorrow.

Ketxxx · Nov 26, 2023

Right, well, I went ahead and read all 7-odd pages, meticulously absorbing everything. Boy, is this thread all over the place, but I’ll articulate my thoughts and suggestions as best I can. I’ll cover everything for completeness and as an easy jump-in point. If you've recently done something, just skip to the next one; I’m mainly doing things this way for anyone else who has this kind of issue and is googling for solutions.

1. Disconnect all system drives except the OS boot drive. If the drive is a 2.5” SSD, ensure it is connected to the SATA 1 port. It’s also worth using a different SATA cable to rule out issues with the cable itself. While you’re at it, remove the SATA power cable, make sure there’s no dust\debris in or on the connector, and reseat it. If the drive is already on SATA 1, move it to a different port such as SATA 3.

Regardless of whether the drive is a 2.5” SATA or M.2 SATA\NVMe drive, download and run CrystalDiskInfo. What is the temperature and health status of the drive? If health is below 60%, I’d be replacing that drive. Many drives also have tools available from the manufacturer, which are worth installing to see if there is an updated firmware available for the drive. Older SSDs in particular might need a firmware update on newer systems.

2. Asus have a new firmware, 5003. It updates the AGESA, which usually brings improvements and fixes for memory, so while in your case it likely won’t help, it can’t hurt to update either. There's no need to leave the system for 30 mins when clearing the CMOS: just remove the power cord from the PSU and the CR2032 battery, hit the power button 3-4 times to discharge the capacitors, leave it a few more seconds if you want, then power up.

3. Enter the mainboard firmware utility and enable the memory XMP profile. There should be 2, one for 3000/3200 and the other rated 3600. It depends on the memory manufacturer here: some just program the same SPD values to both profiles, and really lazy manufacturers don’t even bother programming the second profile at all, so you’ll only have one. Also manually set DRAM voltage to 1.375v\1.38v, or whatever is closest to that. This allows for some voltage fluctuation while making sure you’re always getting a minimum of 1.35v.

4. Make sure any feature Asus has for adjusting settings automatically is disabled. You just want the XMP profile active here, with manually set DRAM voltage.

I also see you are running a 1T command rate; this could be the root of your problems. Enable GDM (Gear Down Mode) instead of leaving it at “Auto”. Alternatively, for testing purposes, you can just run a flat 2T command rate by setting it to 2T manually. Every board I use places this setting slightly differently, so you might have to go digging in the AMD OC options if Asus haven’t put it in the AI Tweaker section. Usually, to force 2T you’ll also need to set GDM to “Disabled”.

5. 650W, even for a good PSU, is pretty weak for a system with your spec. Have you used anything like HWiNFO64 to monitor how much power the system is drawing when it crashes\hard locks? I’d also take the opportunity to check temperatures and voltages with HWiNFO64 to make sure there aren’t any issues there. As for why your PSU, or any PSU, would start to develop issues: same as anything else. It's made from multiple parts that eventually fail through normal wear and tear, and since your PSU is getting up there in age, the capacitors in it could well be starting to bulge. Yes, even solid state caps bulge eventually.

6. Download and run HCI MemTest; if you have memory errors, HCI will find them in short order. Programs like Memtest86, Memtest64, or TestMem5 simply aren’t good enough. I have a saying here: “If you’re HCI Memtest and BOINC MilkyWay@Home stable, you’re golden.” I’ve literally never had stability problems with any system that has passed a 24hr test of HCI and BOINC.

I’d also forget about OCCT. IMO it's the most unreliable software out there for diagnosing issues; it has reported problems on completely stock systems that I know are absolutely rock-solid stable.

7. This one is kind of moot, as you now have a kit of Vipers, which is what I would have recommended switching to (well, the 18-20-20-40 kit anyway, as it’s the cheapest not-completely-crap kit you can get). Still, run each memory module individually across all of the DIMM banks while testing for memory errors with HCI Memtest. I’d run at least 3 hours in each bank; anything less than that is no test at all.

8. Short of removing the heatsinks, the Corsair modules, for what it’s worth, at least look like they are using the same ICs. However, Thaiphoon Burner can’t identify them properly, so there is no guarantee the ICs are absolutely identical. Corsair could have switched to a newer revision from one batch to the next, something that might actually be indicated by the minor serial number difference.

Also, don’t take a manufacturer’s QVL as gospel. The QVL is just a list of modules that were tested with the board on a specific firmware revision during development; there's still no guarantee that a newer firmware revision won’t introduce problems with a kit previously verified “OK” on older firmware. Nor does the QVL take into account board revisions\hardware changes before or during production, and it usually doesn't take into account a memory manufacturer having to change ICs because the ones they were using became unavailable, obsolete, too expensive, or were simply superseded by a newer revision. It’s more reliable to look at the types of memory ICs a board has been tested with and pick a module\kit that you know uses one of those ICs. I’ve done that for years and have never had a problem since.

9. Lower FCLK, if stability improves it’s an indicator you could have a bum IMC.

10. The fact you initially didn’t have issues suggests (but doesn’t guarantee) that the fault has developed over time. That behaviour is typical of a memory stick going bad, IMC issues, or, as unlikely as it is, a bad mainboard trace.

11. Third party antivirus hasn’t been needed since Windows 8 really, but MS Defender didn’t really start holding its own until W10. Either way, forget about your 3rd party anti-virus. Not needed. These days it’ll hinder more than it helps.

12. Create a system restore point to roll back to if needed for a safety net, or download and use some backup software like Paragon Backup & Recovery or AOMEI Backupper.

13. Failing all of that, it’s time to wipe the disk clean, do a completely fresh Windows install, and grab the latest drivers directly from the manufacturers' websites (read: don’t rely on WU or the motherboard manufacturer’s website; go directly to the website of the manufacturer of each piece of hardware in the system, e.g. Nvidia, Realtek, AMD, etc.). Keep the installation barebones: no additional software except diagnostic tools and whatever WUs Windows wants to install. But only enable your net connection once you're done installing the latest hardware drivers, so Windows can't screw with them... at least not in the immediate future.

14. I’d also check all connections and components are seated correctly, better to be certain.
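Item 7's per-slot test adds up quickly. A minimal sketch of the test matrix, assuming a four-slot board labelled A1, A2, B1, B2 (the thread only mentions A1 and B1, so the slot list is an assumption, as are the module labels) and the suggested minimum of 3 hours per run:

```python
from itertools import product
from typing import List, Sequence, Tuple

HOURS_PER_RUN = 3  # minimum per bank suggested in item 7


def slot_test_plan(modules: Sequence[str], slots: Sequence[str],
                   hours_per_run: int = HOURS_PER_RUN) -> Tuple[List[Tuple[str, str]], int]:
    """List every (module, slot) run with one module installed at a time, plus total hours."""
    runs = list(product(modules, slots))
    return runs, len(runs) * hours_per_run


# Hypothetical labels: two Viper sticks and a four-slot board.
runs, total_hours = slot_test_plan(["Viper #1", "Viper #2"], ["A1", "A2", "B1", "B2"])
for module, slot in runs:
    print(f"{module} alone in {slot}: HCI MemTest for {HOURS_PER_RUN} h")
print(f"Total: {total_hours} h")  # Total: 24 h
```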

Let me know the results, I’m going to bet the Viper kit at the very least is a fair bit more compatible than the Corsair kit. You’ve got a really shitty kit of memory if the system refuses to POST at stock settings (usually 2666 default) in one set of DIMM banks. The only outlier is your wifi card, are you talking about what your board comes with or another one you have installed? If the latter, move it to a different PCI-e slot or better yet remove it and the driver for it completely and test.

There's nothing, and I mean nothing, IT-related that I've failed to get to the bottom of in 25+ years. I'll be damned if your problem defeats me.

cst1992 · Nov 26, 2023

Question: if a system is up for close to 24 hours, then fails with things like USB failure, Bluetooth failure, etc what is that indicative of? Because now that the system has had a "good night's sleep" all issues have resolved again. USB connects fine, Bluetooth adapter is no longer "failing in an undetermined manner". Even Event Viewer is not showing any problematic logs. It's like yesterday never happened!

Ketxxx said:
what is the temperature and health status of the drive? If health is below 60%, I’d be replacing that drive. Many drives also have tools available from the manufacturer, which are worth installing to see if there is an updated firmware available for the drive. Older SSDs in particular might need a firmware update on newer systems.

Done a firmware update already. Drive is healthy.

Ketxxx said:
2. Asus have a new firmware, 5003. It updates the AGESA, which usually brings improvements and fixes for memory, so while in your case it likely won’t help, it can’t hurt to update either. There's no need to leave the system for 30 mins when clearing the CMOS: just remove the power cord from the PSU and the CR2032 battery, hit the power button 3-4 times to discharge the capacitors, leave it a few more seconds if you want, then power up.

I'll do this after fixing other issues first, as it introduces a vulnerability fix which hurts performance by like 50%.

Ketxxx said:
3. Enter the mainboard firmware utility and enable the memory XMP profile. There should be 2, one for 3000/3200 and the other rated 3600. It depends on the memory manufacturer here: some just program the same SPD values to both profiles, and really lazy manufacturers don’t even bother programming the second profile at all, so you’ll only have one. Also manually set DRAM voltage to 1.375v\1.38v, or whatever is closest to that. This allows for some voltage fluctuation while making sure you’re always getting a minimum of 1.35v.

There's only the one - 3600. It has been established that the freezes happen even with 1.2V and 2666MHz, so could tweaking these settings really help us here?
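For reference, the idea behind Ketxxx's 1.375v\1.38v suggestion is a small droop margin above the 1.35v floor. A sketch of that margin, plus a check of logged DRAM voltage readings against the floor; the readings list is hypothetical (for example, values copied from an HWiNFO64 log).

```python
from typing import List, Sequence

RATED_V = 1.35  # minimum DRAM voltage the post wants to guarantee


def droop_headroom_mv(set_v: float, floor_v: float = RATED_V) -> float:
    """How far (in millivolts) the voltage can sag before it drops below floor_v."""
    return round((set_v - floor_v) * 1000, 1)


def below_floor(readings: Sequence[float], floor_v: float = RATED_V) -> List[float]:
    """Return any logged readings that dipped under floor_v."""
    return [v for v in readings if v < floor_v]


print(droop_headroom_mv(1.38))  # 30.0
print(below_floor([1.380, 1.368, 1.344, 1.376]))  # [1.344]
```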

Ketxxx said:
I also see you are running a 1T command rate; this could be the root of your problems. Enable GDM (Gear Down Mode) instead of leaving it at “Auto”. Alternatively, for testing purposes, you can just run a flat 2T command rate by setting it to 2T manually. Every board I use places this setting slightly differently, so you might have to go digging in the AMD OC options if Asus haven’t put it in the AI Tweaker section. Usually, to force 2T you’ll also need to set GDM to “Disabled”.

I think I'd rather set to 2T myself and disable GDM.

Ketxxx said:
5. 650W, even for a good PSU, is pretty weak for a system with your spec. Have you used anything like HWiNFO64 to monitor how much power the system is drawing when it crashes\hard locks? I’d also take the opportunity to check temperatures and voltages with HWiNFO64 to make sure there aren’t any issues there. As for why your PSU, or any PSU, would start to develop issues: same as anything else. It's made from multiple parts that eventually fail through normal wear and tear, and since your PSU is getting up there in age, the capacitors in it could well be starting to bulge. Yes, even solid state caps bulge eventually.

I'm running HWInfo64 with logging enabled. If the system crashes again, what should we be looking for in the logs?
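One way to go through the log after the next freeze is to look at the last few rows before it stopped, and at the extremes of the power, temperature and voltage columns. A minimal sketch, assuming the log is a CSV file with a header row; the column names and values in the sample are made up, so copy the exact column names from your own log's first line.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def summarize_log(csv_text: str, columns: Iterable[str],
                  tail: int = 5) -> Tuple[Dict[str, Tuple[float, float]], List[dict]]:
    """Return the (min, max) of each requested numeric column and the last `tail` rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    extremes: Dict[str, Tuple[float, float]] = {}
    for col in columns:
        values = []
        for row in rows:
            try:
                values.append(float(row[col]))
            except (KeyError, TypeError, ValueError):
                continue  # column missing, empty, or non-numeric (e.g. a repeated header row)
        if values:
            extremes[col] = (min(values), max(values))
    return extremes, rows[-tail:]


# Tiny made-up log; the column names are placeholders.
sample = (
    "Time,CPU Package Power [W],Vcore [V]\n"
    "10:00:00,45.1,1.30\n"
    "10:00:02,88.0,1.42\n"
    "10:00:04,12.5,1.10\n"
)
extremes, last_rows = summarize_log(sample, ["CPU Package Power [W]", "Vcore [V]"], tail=2)
print(extremes)
print([row["Time"] for row in last_rows])
```

To use it on a real log, read the file's text (for example with `open(path, newline="", errors="replace").read()`) and pass that in instead of `sample`.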

Ketxxx said:
6. Download and run HCI MemTest; if you have memory errors, HCI will find them in short order. Programs like Memtest86, Memtest64, or TestMem5 simply aren’t good enough. I have a saying here: “If you’re HCI Memtest and BOINC MilkyWay@Home stable, you’re golden.” I’ve literally never had stability problems with any system that has passed a 24hr test of HCI and BOINC.

I've already used MemTest, even on my Corsairs: zero errors found overnight at 650% coverage, with 13 instances running at 2500MB each.
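The instance arithmetic here: 13 × 2500MB is 32,500MB allocated in total. To size a run so it fits inside the RAM that is actually free (leaving Windows some room so it doesn't have to page), a small helper; the free-memory figure in the example is hypothetical, so read the real one from Task Manager before starting.

```python
from typing import Tuple


def memtest_plan(free_mb: int, per_instance_mb: int) -> Tuple[int, int]:
    """Return (instances, total_mb): as many instances as fit in free_mb without exceeding it."""
    instances = free_mb // per_instance_mb
    return instances, instances * per_instance_mb


print(13 * 2500)                  # 32500 MB allocated in the run described above
print(memtest_plan(28000, 2500))  # (11, 27500) with a hypothetical 28000 MB free
```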

Ketxxx said:
8. Short of removing the heatsinks, the Corsair modules, for what it’s worth, at least look like they are using the same ICs. However, Thaiphoon Burner can’t identify them properly, so there is no guarantee the ICs are absolutely identical. Corsair could have switched to a newer revision from one batch to the next, something that might actually be indicated by the minor serial number difference.

The version is the same for the Corsairs, and for the Vipers there's absolutely zero difference between the two sticks.

Ketxxx said:
9. Lower FCLK, if stability improves it’s an indicator you could have a bum IMC.

I've been trying to "improve stability" for the entire duration of this thread - the problem is I never know when I've finally done it. What I found (as I've said before) is that disabling PBO Fmax Enhancer costs me a 10% performance (and clock) hit on all-core, but it only increases the crash interval from a few hours to 24 hours.
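On item 9: on Zen 3 the memory clock (MCLK) is half the DDR transfer rate, and the usual target is FCLK = MCLK (1:1). So lowering FCLK while staying 1:1 also means lowering the memory speed. A sketch of that mapping; the function name is made up for illustration.

```python
def fclk_for_1to1(ddr_rate_mts: int) -> int:
    """FCLK in MHz that matches MCLK 1:1 for a DDR4 transfer rate in MT/s.

    DDR transfers data twice per clock, so MCLK = rate / 2.
    """
    return ddr_rate_mts // 2


for rate in (3600, 3200, 2666):
    print(f"DDR4-{rate} -> FCLK {fclk_for_1to1(rate)} MHz")
```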

Ketxxx said:
10. The fact you initially didn’t have issues suggests (but doesn’t guarantee) that the fault has developed over time. Behaviour indicative and typical of a memory stick going bad, IMC issues, or as unlikely as it is, bad mainboard trace.

I put in a request for a mainboard replacement with the vendor after the Bluetooth adapter died yesterday. Now that it has "un-died", I am not sure if it did even develop a fault or not.

Ketxxx said:
11. Third party antivirus hasn’t been needed since Windows 8 really, but MS Defender didn’t really start holding its own until W10. Either way, forget about your 3rd party anti-virus. Not needed. These days it’ll hinder more than it helps.

It's a paid antivirus so unless it's actually harming things I'd rather leave it in.

Ketxxx said:
12. Create a system restore point to roll back to if needed for a safety net, or download and use some backup software like Paragon Backup & Recovery or AOMEI Backupper.

I don't care about system wipes. There have been so many reinstalls that there's nothing on the system drive except utilities anyway.

Ketxxx said:
13. Failing all of that, it’s time to wipe the disk clean, do a completely fresh Windows install and grab the latest drivers directly from the manufacturers (read: don’t rely on WU or the motherboard manufacturer's website; go directly to the manufacturer's website for each piece of hardware in the system, e.g. NVIDIA, Realtek, AMD, etc.). Keep the installation barebones: no additional software except for diagnostic tools and whatever WUs Windows wants to install. Only enable your net connection once you are done installing the latest hardware drivers, so Windows can't screw with them... at least not in the immediate future.

Been doing this already.

Ketxxx said:
14. I’d also check all connections and components are seated correctly, better to be certain.

Everything is good in that area.

Ketxxx said:
Let me know the results. I’m going to bet the Viper kit at the very least is a fair bit more compatible than the Corsair kit. You’ve got a really shitty kit of memory if the system refuses to POST at stock settings (usually 2666 by default).

Boots fine at stock with both module types.

Ketxxx said:
your wifi card, are you talking about what your board comes with or another one you have installed?

Board default.

Ketxxx said:
If the latter, move it to a different PCI-e slot or better yet remove it and the driver for it completely and test.

Easier said than done. I'll have to remove some plastic thingy with RGB on it (check gallery photos of the back-panel components online) to get to that card; it's one whole assembly. If I'm to have any chance of getting a similar board without paying full price AGAIN, I'd better not mess with that.

Ketxxx said:
There's nothing, and I mean nothing IT-related, that I've failed to get to the bottom of in 25+ years. I'll be damned if your problem defeats me.

I so wish I were a rich noob. I'd just dump this whole thing and get a new system. Problem is I'm not a rich noob and built this baby myself so I have to fix it - for better or worse.

I got another crash, and the Bluetooth adapter error is back.

Also, I found this for whoever is interested: https://learn.microsoft.com/en-us/t...d-41-entry-or-lists-error-code-values-of-zero

In the kernel-related logs in Event Viewer, all the error codes are zero. Even if it were a driver-related reboot, I should have gotten some error codes, but I'm not getting any.

That points to a hardware-related reboot, but I'm not sure which part is at fault: is this going to be resolved by changing the motherboard or the power supply?
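For anyone who wants to triage these events quickly: the Microsoft article linked above separates roughly three Event ID 41 (Kernel-Power) cases using the BugcheckCode and PowerButtonTimestamp values shown in the event's Details tab. The Python sketch below is a simplified, unofficial reading of that article, assuming you have copied those two values into a dict by hand; the function name `classify_event41` and the returned labels are made up for illustration.

```python
from typing import Mapping


def classify_event41(fields: Mapping[str, int]) -> str:
    """Rough triage of a Kernel-Power Event ID 41 record (hypothetical helper).

    `fields` holds integer values copied from the event's Details tab;
    any field that is missing is treated as zero.
    """
    bugcheck = int(fields.get("BugcheckCode", 0))
    power_button = int(fields.get("PowerButtonTimestamp", 0))

    if bugcheck != 0:
        # Event 41 records the stop code in decimal; stop codes are usually quoted in hex.
        return f"stop error (bugcheck 0x{bugcheck:X})"
    if power_button != 0:
        return "power button held down"
    # All zeros: no bug check was recorded, which fits a hard hang or a loss of power.
    return "hard hang or power loss"


print(classify_event41({"BugcheckCode": 0, "PowerButtonTimestamp": 0}))  # hard hang or power loss
print(classify_event41({"BugcheckCode": 159}))  # stop error (bugcheck 0x9F)
```

All-zero records like the ones described here land in the last branch, which is why they point at hardware or power rather than at a specific driver.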

VuurVOS · Nov 26, 2023

cst1992 said:
Also, I found this for whoever is interested: https://learn.microsoft.com/en-us/t...d-41-entry-or-lists-error-code-values-of-zero

In the kernel-related logs in Event Viewer, all the error codes are zero. Even if it were a driver-related reboot, I should have gotten some error codes, but I'm not getting any.

That points to a hardware-related reboot, but I'm not sure which part is at fault: is this going to be resolved by changing the motherboard or the power supply?

I don't think it is your power supply.

If the motherboard isn't in warranty and you are not using Wi-Fi & Bluetooth:

You can disable the device in Device Manager.
You can disable Link State Power Management (Guide)
You can remove/replace the Wi-Fi adapter, since it is an expansion card which also contains Bluetooth (Guide for replacing the card)

If the motherboard is in warranty, you can consider RMAing it. Make sure you've tried a fresh Windows installation first.

cst1992 · Nov 26, 2023

I don't think it's the power supply either, but right now I'm trying out disabling C-state Control in AMD CBS and setting Power Supply Idle Control to Typical Current Idle, which some people say can fix the power supply briefly cutting out because of low idle current.

I do have a use for Wi-Fi and Bluetooth, as I intend to use this PC in a different room from the router, plus my keyboard and mouse are Bluetooth (with an additional pair for use in the BIOS).

Changing the adapter is another $40. I'd rather try to RMA the board. If a board fault caused the adapter to die, why should I bear the cost of it?

Board was bought in mid-August.

Ketxxx · Nov 26, 2023

cst1992 said:
Question: if a system is up for close to 24 hours, then fails with things like USB failure, Bluetooth failure, etc what is that indicative of? Because now that the system has had a "good night's sleep" all issues have resolved again. USB connects fine, Bluetooth adapter is no longer "failing in an undetermined manner". Even Event Viewer is not showing any problematic logs. It's like yesterday never happened!

This suggests a number of things. If the wifi card is removable on your board, my first thought would be that it's not seated correctly and/or could have some corrosion or dirt on the teeth. It wouldn't be the first time I've had an Asus board with shitty quality control like that: an Asus Strix X370-F I wrote a review for actually had mold or moss on one of the thermal pads. Still got the image of that too. If the wifi card is soldered, however, my thought would be a bad solder ball joint, either cracked or outright broken. Another possible cause is still memory-related: pretty much all boards today have a mind of their own, and it's not unusual to see some memory timings change from one boot to the next, which is why I never, ever leave any memory timings on "Auto".

Your drive temperatures look reasonable, but that 45°C on the Evo is a little on the high side; I'd expect it to be closer to 36-40°C at idle on average. One thing everyone should be aware of is that the M.2 heatsinks on boards these days aren't actually all that effective, because they don't apply enough pressure: the included stock thermal pads are never quite thick enough. So one thing I would do is measure the thickness of the thermal pad with a ruler or digital micrometer and get something thicker with a good thermal conductivity rating, e.g. if the stock pad is 1.5mm, get a 2mm pad rated at 6 W/mK. If you haven't already, swap SATA ports and the cable; that way you'll at least rule out a cable or SATA port fault.

cst1992 said:
I'll do this after fixing other issues first, as it introduces a vulnerability fix which hurts performance by like 50%.

That's only related to the TPM (Trusted Platform Module) AFAIK. You don't even need that for Win11: just download a standard W11 ISO and create a boot image with Rufus, but select the option to bypass the softlocks (and yes, they are just that, softlocks) where setup would usually check for a TPM and then whine about not finding one. Trivia tidbit: the TPM is actually an inherently insecure "safety" that was not designed for the way W11 uses it. That's verifiable if you look at the TPM's actual intended purpose, and the creators of the TPM have said as much as well. In short, you're safer without a TPM than with one if you're using W11.

cst1992 said:
I'm running HWInfo64 with logging enabled. If the system crashes again, what should we be looking for in the logs?

Anything that might have started drawing an unusual amount of current (not necessarily just watts), any unusually high temperatures, and even anything that might be using an unusually low amount of power. PSUs are at their least efficient when they are idle, so if your PSU is developing a fault, any one of these things could be happening. The most common problems I find with older PSUs are either capacitors going bad or solder joints needing a reflow, both of which manifest as unusual system behaviour such as random restarts, shutdowns, system freezes, etc.
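To act on this after the next crash, it usually helps to look only at the last readings HWiNFO64 wrote before the system went down. Here is a minimal Python sketch, assuming the log was saved as CSV with sensor names in the first row. The helper name `flag_last_rows`, the column names and the power limits in the example are hypothetical placeholders (real headers depend on your board's sensors and HWiNFO version); the 11.4-12.6 V window is the ±5% ATX tolerance for the +12V rail.

```python
import csv
import io
from typing import Dict, List, Tuple


def flag_last_rows(csv_text: str,
                   limits: Dict[str, Tuple[float, float]],
                   tail: int = 30) -> List[Tuple[int, str, float]]:
    """Return (row_index, column, value) for every reading outside its
    (low, high) window within the last `tail` numeric rows of a sensor log."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    numeric = []
    for i, row in enumerate(rows):
        try:
            numeric.append((i, {col: float(row[col]) for col in limits}))
        except (KeyError, TypeError, ValueError):
            # Skip rows that don't parse, e.g. a repeated header or footer line.
            continue
    recent = numeric[-tail:] if tail > 0 else []
    flagged = []
    for i, values in recent:
        for col, (low, high) in limits.items():
            value = values[col]
            if value < low or value > high:
                flagged.append((i, col, value))
    return flagged


# Hypothetical three-column log; real HWiNFO headers will differ.
log = ("Time,CPU Package Power [W],+12V [V]\n"
       "10:00,45.0,12.1\n"
       "10:01,46.0,11.2\n")
limits = {"CPU Package Power [W]": (5.0, 120.0), "+12V [V]": (11.4, 12.6)}
print(flag_last_rows(log, limits))  # [(1, '+12V [V]', 11.2)]
```

Keeping the window to the final rows is deliberate: a PSU or current problem that causes the crash should show up just before logging stops, not hours earlier.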

cst1992 said:
I've used Memtest already even on my Corsairs, zero errors found overnight at 650% coverage - 13 instances running at 2500MB each.

That sounds like either your memory is stable or the error is very intermittent. HCI MemTest is extremely reliable at catching memory errors, so my only recommendation here would be to run the test for a minimum of 12 hours and make sure the little box for "Low priority threads" is unchecked. I'd also stop and restart HCI after a couple of minutes before leaving it to run long-haul, because Windows has an irritating habit of automatically shutting down background system processes when it detects physical memory is running low, which tends to free up almost 1GB of RAM. You want to stress pretty much every available byte.
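The numbers from that overnight run can be sanity-checked with a bit of arithmetic: 13 instances × 2500 MB is about 99% of the whole 32GB kit, so it's likely that not all of that allocation could be resident in physical RAM alongside Windows itself at the same time. A rough Python sketch for sizing instances from the free memory Task Manager reports instead; the helper `per_instance_mb`, the 28,000 MB free figure and the 512 MB reserve are assumptions for illustration, not HCI recommendations.

```python
def per_instance_mb(free_mb: int, instances: int, reserve_mb: int = 512) -> int:
    """Split free memory evenly across MemTest instances (hypothetical helper).

    reserve_mb is an assumed safety margin left for Windows so the test
    memory is less likely to be paged out to disk.
    """
    if instances <= 0:
        raise ValueError("instances must be a positive integer")
    usable = max(free_mb - reserve_mb, 0)
    return usable // instances


installed_mb = 32 * 1024   # 2x16GB kit = 32,768 MB
tested_mb = 13 * 2500      # the overnight run: 32,500 MB
print(tested_mb, installed_mb, round(100 * tested_mb / installed_mb, 1))  # 32500 32768 99.2

# Example: 28,000 MB free (hypothetical), 12 instances for the 5600X's 12 threads.
print(per_instance_mb(28_000, 12))  # 2290
```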

cst1992 said:
I've been trying to "improve stability" for the entire duration of this thread; the problem is I never know when I've finally done it. What I found (like I've said before) is that I get a 10% performance hit (and clock hit) on all-core when I disable PBO Fmax Enhancer, but that only stretches the crash interval from a few hours to 24 hours.

This is telling you more than you think it is. Based on what you say here, there is a strong indication that CPU stability is an issue. It might not be the only issue, but it does look like one. Disable PBO and set CPU frequency and voltage manually. Use an all-core multiplier of 37x for your CPU's base frequency of 3.7GHz. Manually set the CPU voltage to 1.25v; this should be more than enough for your base frequency. We can focus on best performance once a stable baseline is established. I'm actually wondering if the Asus board might be cooking your CPU by throwing way too much voltage at it when PBO is enabled. My system actually does this as well: 1.5v+ for 4.4GHz, when in reality it only needs 1.3v. PBO is absolutely :kookoo:

heh.
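The numbers in that advice are easy to check. On Ryzen the core clock is the multiplier times the reference clock (nominally 100 MHz), so 37x gives 3.7GHz. And as a first-order approximation (dynamic CMOS power scales with voltage squared times frequency, ignoring leakage), running 1.5v instead of 1.3v at the same clock means roughly a third more dynamic power to get rid of as heat. A small Python sketch of both calculations; the helper names are hypothetical, the 100 MHz figure is the nominal default, and the power model is a textbook simplification, not a measurement.

```python
REF_CLOCK_MHZ = 100.0  # nominal default reference clock (BCLK)


def core_clock_mhz(multiplier: float, ref_clock_mhz: float = REF_CLOCK_MHZ) -> float:
    """Core frequency from the all-core multiplier."""
    return multiplier * ref_clock_mhz


def relative_dynamic_power(v_new: float, v_ref: float,
                           f_new: float = 1.0, f_ref: float = 1.0) -> float:
    """Ratio of dynamic power under the first-order model P ~ V^2 * f.

    Leakage power, which also rises with voltage and temperature, is ignored.
    """
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


print(core_clock_mhz(37))                          # 3700.0
print(round(relative_dynamic_power(1.5, 1.3), 2))  # 1.33
```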

cst1992 said:
I put in a request for a mainboard replacement with the vendor after the Bluetooth adapter died yesterday. Now that it has "un-died", I am not sure if it did even develop a fault or not.

You have undoubtedly got red herrings in your problem: one root issue is causing other system components to go awry, which is leading you down erroneous routes. It's unlikely that your motherboard has an actual fault, but it's not out of the realm of possibility. It's on the checklist, but toward the bottom. All that has to be done is to methodically check things one by one to isolate the issue. Easily done by someone like me, but an absolute nightmare for anyone inexperienced.

cst1992 said:
It's a paid antivirus so unless it's actually harming things I'd rather leave it in.

It could actually be part of the problem, so for now at least leave it uninstalled.

cst1992 said:
I so wish I were a rich noob. I'd just dump this whole thing and get a new system. Problem is I'm not a rich noob and built this baby myself so I have to fix it - for better or worse.

Do not worry in the slightest. I started in exactly the same position as you long ago, and you know what? You learn a hell of a lot more and become a hell of a lot more capable this way.

Processor	AMD R7 1700X @ 4100Mhz
Motherboard	MSI B450M MORTAR MAX (MS-7B89)
Cooling	Phanteks PH-TC14PE
Memory	Crucial Technology 16GB DR (DDR4-3600) - C9BLM:045M:E BL16G36C16U4W.M16FE1 X2 @ CL14
Video Card(s)	XFX RX480 GTR 8GB @ 1408Mhz (AMD Auto OC)
Storage	Samsung SSD 850 EVO 250GB
Display(s)	Acer KG271 1080p @ 81Hz
Power Supply	SuperFlower Leadex II 750W 80+ Gold
Keyboard	Redragon Devarajas RGB
Software	Microsoft Windows 10 (10.0) Professional 64-bit
Benchmark Scores	https://valid.x86.fr/mvvj3a

System Name	The Sparing-No-Expense Build
Processor	Ryzen 5 5600X
Motherboard	Asus ROG Strix X570-E Gaming Wifi II
Cooling	Noctua NH-U12S chromax.black
Memory	32GB: 2x16GB Patriot Viper Steel 3600MHz C18
Video Card(s)	NVIDIA RTX 3060Ti Founder's Edition
Storage	500GB 970 Evo Plus NVMe, 2TB Crucial MX500
Display(s)	AOC C24G1 144Hz 24" 1080p Monitor
Case	Lian Li O11 Dynamic EVO White
Power Supply	Seasonic X-650 Gold PSU (SS-650KM3)
Software	Windows 11 Home 64-bit

System Name	Ryzen7700
Processor	AMD Ryzen 7 7700
Motherboard	Asus ROG STRIX B650E-F GAMING WIFI
Cooling	NZXT Kraken X62
Memory	Patriot Viper Venom PVV532G700C32K (32GB @ 6000CL28)
Video Card(s)	AMD Radeon RX 6800XT Midnight Black

System Name	Every cuss word I can think of, and a few more I've made up
Processor	Dual System - Ryzen R9 5900X / Ryzen R7 1700
Motherboard	(R9) Gigabyte B550 Aorus Master / (R7) MSI B450M Gaming Bazooka
Cooling	Scythe Mugen 5 Black Edition for both
Memory	(R9) 2x16 Patriot Viper 4 Blackout PV432G320C6K (3200) / (R7) 4x8 HyperX Fury HX421C14FBK4/32 (2133)
Video Card(s)	(R9) Asus Tuf RTX3090 24GB / (R7) EVGA FTW RTX3060ti 8GB (for now)
Storage	(Primary) 1TB WD Blue SN5x0 M.2s, 8TB / 6TB WD Black, 2TB MX500, Pioneer BDR-212DBK ODD
Display(s)	75" Hisense A6 (60 hz)
Case	NavePoint 15U Networking Cabinet
Audio Device(s)	(Both) Onboard RealTek audio, PreSonus 24c interface
Power Supply	(R9) Corsair RM1000x / (R7) Corsair RM750x
Mouse	Logitech K520
Keyboard	Logitech K520
Software	LibreOffice, BeamNG.drive, Classic Doom and variants, ATS, NCH VideoPad, OBS Studio, MPC-HC, iCUE

System Name	New compy
Processor	AMD Ryzen 5800x3D
Motherboard	MSI MPG x570S EDGE MAX WiFi
Cooling	Noctua NH-D15S w. FHP141 + Xigmatek AOS XAF-F1451
Memory	32gb G.Skill Ripjaws V Samsung B-Die Dual Rank F4-4000C16D-32GVKA
Video Card(s)	ASUS TUF GAMING RTX 4070ti
Storage	17tb (8+4tb WD Black HDD's, 2+2+0.5+0.5tb M.2 SSD Drives) + 16tb WD Red Pro backup drive
Display(s)	Alienware AW2518H 24" 240hz, Sony X85K 43" 4k 120hz HDR TV
Case	Thermaltake Core v71
Audio Device(s)	iFi Nano Idsd Le, Creative T20 + T50, Sennheiser HD6Mix
Power Supply	EVGA Supernova G2 1000w
Mouse	Logitech G502 Hero custom w. G900 scroll wheel mod, Rival 3 + Rival 3 wireless, JLab Epic Mouse
Keyboard	Corsair K68 RGB + K70 RGB + K57 RGB Wireless + Logitech G613
Software	Win 10 Pro
Benchmark Scores	https://valid.x86.fr/s2y7ny

System Name	Ravens Talon
Processor	AMD R7 3700X @ 4.4GHz 1.3v
Motherboard	MSI X570 Tomahawk
Cooling	Modded 240mm Coolermaster Liquidmaster
Memory	2x16GB Klevv BoltX 3600MHz & custom timings
Video Card(s)	Powercolor 6800XT Red Devil
Storage	500GB NVMe Asgard SSD, 1TB NVMe Integral SSD, 2TB Seagate Barracuda
Display(s)	27" BenQ Mobiuz
Case	NZXT Phantom 530
Audio Device(s)	Asus Xonar DX 7.1 PCI-E
Power Supply	1000w Supernova
Software	Windows 10 x64
Benchmark Scores	Fast. I don't need epeen.

System Name	Laptop ASUS TUF F15 | Desktop 1 | Desktop 2
Processor	Intel Core i7-11800H | Intel Core i5-14600K@135W | Intel Core i3-10100
Motherboard	ASUS FX506HC | Gigabyte B660M DS3H DDR4 | MSI MAG B560M Bazooka
Cooling	Laptop built-in cooling lol | Thermalright Assassin Spirit w/ BeQuiet Shadow Wings fan | Stock Copper
Memory	24 GB @ 3200 | 32 GB @ 3733 | 16 GB @ 3200
Video Card(s)	Nvidia RTX 3050 Mobile 4GB | Nvidia GTX 1650 | Nvidia GTX 960 2 GB
Storage	Adata XPG SX8200 Pro 512 GB | Samsung M2 SSD 256 GB & 1 TB 2.5" HDD @ 7200 | SSD 250 GB & SSD 240 GB
Display(s)	Laptop built-in 144 Hz FHD screen | Dell 27" WQHD @ 75 Hz & 49" TV FHD | Samsung 32" TV FHD
Case	It's a laptop, it doesn't need case lmfao | Deepcool Mattrexx 55 MESH | Aerocool Cylon PRO
Audio Device(s)	laptop built in audio | Logitech 2.1 speakers | Logitech stereo speakers
Power Supply	ASUS 180W PSU | SeaSonic Focus GX-550 | SeaSonic M12II EVO 520W
Mouse	Logitech G604 | Corsair Harpoon wired mouse | Logitech G305
Keyboard	Laptop built-in keyboard | Razer Blackwidow | Steelseries APEX 7 TKL
VR HMD	Quest 2 sold out and don't need VR anymore lol
Software	Windows 10 Enterprise 20H2 | Windows 10 Enterprise 20H2 & Ubuntu Mate 24.04.2 | Windows 11 24H2 LTSC
Benchmark Scores	good enough