Right, well, I went ahead and read all seven-odd pages, meticulously absorbing everything. Boy, is this thread all over the place, but I’ll articulate my thoughts and suggestions as best I can. I’ll cover everything, both for completeness and to give an easy jump-in point. If you’ve already done any of these recently, just skip to the next one; I’m mainly laying things out this way for anyone else who has this kind of issue and is googling for solutions.
1. Disconnect all system drives except for the OS boot drive. If that drive is a 2.5” SSD, ensure it is connected to the SATA 1 port. It’s also worth using a different SATA cable to rule out issues with the cable itself. While you’re at it, remove the SATA power cable, make sure there’s no dust or debris in or on the connector, and reseat it. If the drive is already on SATA 1, move it to a different port, such as port 3, to rule out the port as well.
Regardless of whether the drive is a 2.5” SATA, M.2 SATA, or M.2 NVMe drive, download and run CrystalDiskInfo. What are the temperature and health status of the drive? If the health is below 60%, I’d be replacing that drive. Many manufacturers also offer their own drive tools, which are worth installing to see whether updated firmware is available for the drive. Older SSDs in particular might need a firmware update on newer systems.
2. Asus have a new firmware, 5003. It updates the AGESA, which usually brings improvements and fixes for memory, so while it likely won’t help in your case, it can’t hurt to update either. There’s no need to leave the system for 30 minutes when clearing the CMOS: just remove the power cord from the PSU and the CR2032 battery, hit the power button 3-4 times to discharge the capacitors, leave it a few more seconds if you want, then put everything back and power up.
3. Enter the mainboard firmware utility and enable the memory XMP profile. There should be two: one for 3000/3200, the other rated 3600. This depends on the memory manufacturer, though; some just program the same SPD values into both profiles, and really lazy manufacturers don’t bother programming the second profile at all, so you’ll only have one. Also manually set the DRAM voltage to 1.375 V or 1.38 V, or whatever the closest available value is. This allows for some voltage fluctuation and makes sure you’re always getting a minimum of 1.35 V.
4. Make sure any Asus feature that fiddles with settings automatically is disabled. You just want the XMP profile active here, with the DRAM voltage set manually.
I also see you are running a 1T command rate, which could be the root of your problems. Enable GDM (Gear Down Mode) instead of leaving it at “Auto”. Alternatively, for testing purposes, you can run a flat 2T by manually setting the command rate to 2T. All the boards I use place this setting slightly differently, so you might have to go digging in the AMD OC options if Asus haven’t put it in the AI Tweaker section. Usually, to force 2T you’ll also need to set GDM to “Disabled”.
5. 650 W, even for a good PSU, is pretty weak for a system with your spec. Have you used anything like HWiNFO64 to monitor how much power the system is drawing when it crashes or hard locks? I’d also take the opportunity to check temperatures and voltages with HWiNFO64 to make sure there aren’t any issues there. As for why your PSU, or a PSU in general, would start to develop issues: same as anything. It’s made from multiple parts that begin to fail through normal wear and tear, and in your case the PSU is getting up there in age, so its capacitors could well be starting to bulge by now. Yes, even solid state caps bulge eventually.
6. Download and run HCI Memtest (LINK). If you have memory errors, HCI will find them in quick order. Programs like Memtest86, Memtest64, or TestMem5 simply aren’t good enough. I have a saying here: “If you’re HCI Memtest and BOINC MilkyWay@Home stable, you’re golden.” I’ve literally never had stability problems with any system that has passed a 24hr test of HCI and BOINC.
I’d also forget about OCCT. IMO it’s the most unreliable software out there for diagnosing issues; it has reported problems on completely stock systems that I know are absolutely rock-solid stable.
7. This is kind of moot as you have a kit of Vipers now, which is what I would have recommended switching to (well, the 18-20-20-40 kit anyway, as it’s the cheapest not-completely-crap kit you can get). Even so, run each memory module individually in every one of the DIMM slots while testing for memory errors with HCI Memtest. I’d run at least 3 hours in each slot; anything less than that is no test at all.
8. For what it’s worth, short of removing the heatsinks to check, the Corsair modules at least look like they are using the same ICs. Thaiphoon Burner can’t identify them properly, though, so there is no guarantee the ICs are absolutely identical. Corsair could have switched to a newer revision from one batch to the next, something that might actually be indicated by the minor serial number difference.
Also, don’t take a manufacturer’s QVL as gospel. The QVL is just a list of modules that were tested with the board on a specific firmware revision during development. There is still no guarantee that a newer firmware revision won’t introduce problems with a kit that was previously verified “OK” on an older firmware. Nor does the QVL take into account things like board revisions or hardware changes prior to or during production, and it usually doesn’t account for a memory manufacturer having to change ICs because what they were using has become unavailable, obsolete, too expensive, or simply superseded by a newer IC revision. It’s more reliable to look at the types of memory ICs a board has been tested with and pick a module or kit that you know uses one of those ICs. I’ve done that for years and never had a problem since I started.
9. Lower FCLK. If stability improves, it’s an indicator you could have a bum IMC.
10. The fact you initially didn’t have issues suggests (but doesn’t guarantee) that the fault has developed over time. That behaviour is typical of a memory stick going bad, IMC issues, or, as unlikely as it is, a bad mainboard trace.
11. Third-party antivirus hasn’t really been needed since Windows 8, although MS Defender didn’t start holding its own until W10. Either way, forget about your third-party antivirus. It’s not needed, and these days it’ll hinder more than it helps.
12. As a safety net, create a system restore point to roll back to if needed, or download and use some backup software like Paragon Backup & Recovery or AOMEI Backupper.
13. Failing all of that, it’s time to wipe the disk clean, do a completely fresh Windows install, and grab the latest drivers directly from the manufacturers’ websites. (Read: don’t rely on WU or the motherboard manufacturer’s website; go directly to the website of whoever makes each piece of hardware in the system, e.g. NVIDIA, Realtek, AMD.) Keep the installation barebones: no additional software except diagnostic tools and whatever updates Windows wants to install. Only enable your net connection once you are done installing the latest hardware drivers, so Windows can’t screw with them... at least not in the immediate future.
14. I’d also check all connections and components are seated correctly, better to be certain.
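For step 5, if you turn on sensor logging in HWiNFO64 and leave it running until the crash, a few lines of Python can pull out the peak readings and the last samples before the lock-up. Treat this as a rough sketch only: the column names and the file name below are placeholders I’ve made up for illustration, so swap in whatever headers your own log actually uses, and the file encoding is a guess too.

```python
import csv
import io


def summarize_log(lines, columns, tail=5):
    """Return {column: {"peak": float, "last": [floats]}} for a sensor CSV.

    `lines` is any iterable of text lines (an open file works).
    `columns` are the header names to track, exactly as they appear in the log.
    """
    reader = csv.DictReader(lines)
    readings = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            try:
                readings[name].append(float(row[name]))
            except (KeyError, TypeError, ValueError):
                # Skip blank cells, non-numeric rows, and columns that don't exist.
                continue
    # Columns with no usable readings are left out of the summary.
    return {
        name: {"peak": max(values), "last": values[-tail:]}
        for name, values in readings.items()
        if values
    }


if __name__ == "__main__":
    # Made-up sample data; the header names are assumptions, not real log headers.
    sample = io.StringIO(
        "Time,CPU Package Power [W],GPU Power [W]\n"
        "12:00:01,88.5,120.0\n"
        "12:00:02,142.3,310.7\n"
        "12:00:03,95.0,\n"
    )
    summary = summarize_log(sample, ["CPU Package Power [W]", "GPU Power [W]"], tail=2)
    for name, stats in summary.items():
        print(name, "peak:", stats["peak"], "last samples:", stats["last"])

    # With a real log you would open the file instead (hypothetical file name,
    # and the encoding is an assumption that may need changing):
    # with open("hwinfo_log.csv", newline="", encoding="latin-1") as f:
    #     summary = summarize_log(f, ["<your column name here>"])
```

The last few samples matter more than the averages here: a spike in power or a sagging voltage right before the log stops is the thing worth posting.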
Let me know the results. I’m going to bet the Viper kit is, at the very least, a fair bit more compatible than the Corsair kit. You’ve got a really shitty kit of memory if the system refuses to POST at stock settings (usually 2666 default) in one set of DIMM slots. The only outlier is your Wi-Fi card: are you talking about the one your board comes with, or another one you have installed? If the latter, move it to a different PCIe slot or, better yet, remove it and its driver completely and test.
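If it helps when reporting back, the single-stick runs from step 7 can be laid out as a checklist so nothing gets skipped. The sketch below is just one way to do it; the stick names and slot labels (A1, A2, B1, B2) are hypothetical, so use whatever is printed on your board. With two sticks, four slots, and the 3 hour minimum, it works out to 24 hours of testing in total.

```python
from itertools import product


def build_test_plan(modules, slots, hours_per_run=3):
    """List every single-module run: one stick in one slot at a time."""
    return [
        {"module": module, "slot": slot, "hours": hours_per_run, "errors": None}
        for module, slot in product(modules, slots)
    ]


def total_hours(plan):
    """Minimum total test time for the whole plan, in hours."""
    return sum(run["hours"] for run in plan)


if __name__ == "__main__":
    # Hypothetical labels: two sticks and a four-slot board.
    plan = build_test_plan(["stick 1", "stick 2"], ["A1", "A2", "B1", "B2"])
    for run in plan:
        print(f'{run["module"]} in {run["slot"]}: {run["hours"]} h minimum, errors: ____')
    print(f"Total: {total_hours(plan)} h")
```

Fill in the error count for each line as you go; a stick that fails in every slot points at the stick, while every stick failing in one slot points at the slot or the board.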
Nothing, and I mean nothing IT related, I’ve failed to get to the bottom of in 25+ years. I’ll be damned if your problem defeats me.