DRAM Crashes After 10 Minutes of BIOS Idle on ASUS TUF GAMING X570-PLUS (WI-FI)

Joined
Oct 15, 2024
Messages
6 (0.17/day)
Processor AMD Ryzen 5 5600X 3.7 GHz 6-Core Processor
Motherboard Asus TUF GAMING X570-PLUS (WI-FI) ATX AM4
Cooling Cooler Master Hyper 212 RGB Black Edition
Memory Corsair Vengeance RGB Pro 64 GB (4 x 16 GB) DDR4-3600 CL18
Video Card(s) Gigabyte Vision OC GeForce RTX 3080 Ti 12 GB
Storage Samsung 970 Evo Plus 1 TB PCIe 3.0 NVME SSD
Case Phanteks Enthoo Pro ATX Full Tower Case
Power Supply Seasonic Focus GX-1000

Issue:

I’m experiencing DRAM crashes after ~10 minutes of idling in the BIOS on my ASUS TUF GAMING X570-PLUS (WI-FI) motherboard. I also get random crashes while using the system, about 1 hour into use on BIOS 4022, which I'm currently running. Just before a crash, the audio turns to static and the computer freezes for a few seconds. On the latest BIOS, 5022, the PC would crash only ~10 minutes after boot. When the crash happens, the screen goes black, the fans continue running, and the orange DRAM light on the motherboard turns on.


System Specs:

  • CPU: AMD Ryzen 5 5600X 3.7 GHz 6-Core Processor
  • Cooler: Cooler Master Hyper 212 RGB Black Edition
  • Motherboard: Asus TUF GAMING X570-PLUS (WI-FI) ATX AM4
  • Memory: Corsair Vengeance RGB Pro 64 GB (4 x 16 GB) DDR4-3600 CL18
  • SSD: Samsung 970 Evo Plus 1 TB PCIe 3.0 NVME SSD
  • GPU: Gigabyte Vision OC GeForce RTX 3080 Ti 12 GB
  • Case: Phanteks Enthoo Pro ATX Full Tower Case
  • Power Supply: Seasonic Focus GX-1000

Things I’ve Tried (Still Crashed):

  1. BIOS Updates:
    • I’ve tried several BIOS versions: 4022, 4602, 4802, 5003, 5013, 5020.
  2. Uninstalling ASUS Software:
    • Removed Armoury Crate and AI Suite 3 using the official ASUS tool and ASUS Setup Tool.
  3. CMOS Reset:
    • Removed the BIOS battery for 30 minutes and shorted the CLRTC pins for 15 seconds between changes.
  4. Tested Other RAM:
    • Swapped in a set of G.Skill Ripjaws V 16 GB (2 x 8 GB) DDR4-3200.
  5. Reseating Components:
    • Unplugged and reseated internal components and connections.
  6. Different RAM Slots:
    • Tried different combinations of RAM slots
  7. BIOS Settings Adjustments:
    • Changed various BIOS settings, including lowering DRAM voltage, setting Command Rate to 2T, and disabling Gear Down Mode and Power Down Mode.
  8. Manual SOC/CCD/IOD Voltage Tuning:
    • I have tried adjusting the voltages:
      • SOC at 1.1V
      • CCD at 0.9V
      • IOD at 0.95V
    • These settings were recommended on forums to stabilize high-speed memory configurations.
  9. Checked Rails and Temperatures:
    • I’ve checked the voltage on my rails, and they seem fine:
      • 11.9V on the 12V rail
      • 3.3V on the 3.3V rail
      • 4.9V on the 5V rail
    • I’ve also checked system temperatures, and they are within a normal range.
  10. Driver Uninstalling and Reinstalling:
    • I have uninstalled and reinstalled the drivers for my NVIDIA 3080 Ti and ensured that all system drivers, including chipset and GPU drivers, are fully up to date
  11. Unplugging Peripherals:
    • I tried running the system with various peripherals disconnected (e.g., RGB, USB devices)
  12. Checking for Voltage Drops / Faulty Outlet
  13. Running:
    • sfc /scannow
    • DISM /Online /Cleanup-Image /CheckHealth
    • DISM /Online /Cleanup-Image /ScanHealth
    • DISM /Online /Cleanup-Image /RestoreHealth
  14. New PSU:
    • Corsair 750 W replaced by Seasonic 1000 W.
  15. CrystalDiskInfo:
    • All drives were listed as in good health.
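
The rail readings in step 9 can be sanity-checked against a tolerance band. The sketch below assumes the commonly cited ±5% ATX tolerance for the positive rails; the readings are the ones listed above, and `TOLERANCE` and `rail_in_spec` are illustrative names, not part of any tool. Software-reported rail voltages are often imprecise, so treat this as a rough check only.

```python
# Nominal voltage -> measured voltage, taken from the readings in step 9.
READINGS = {12.0: 11.9, 5.0: 4.9, 3.3: 3.3}

# Assumption: +/-5% tolerance for the +12 V, +5 V and +3.3 V rails.
TOLERANCE = 0.05


def rail_in_spec(nominal: float, measured: float, tolerance: float = TOLERANCE) -> bool:
    """Return True if the measured voltage is within the tolerance band."""
    return abs(measured - nominal) <= nominal * tolerance


for nominal, measured in READINGS.items():
    low, high = nominal * (1 - TOLERANCE), nominal * (1 + TOLERANCE)
    status = "OK" if rail_in_spec(nominal, measured) else "OUT OF RANGE"
    print(f"{nominal:>4} V rail: {measured} V (allowed {low:.2f}-{high:.2f} V) -> {status}")
```

All three readings fall inside the assumed band, which is consistent with the rails "seeming fine"; a multimeter reading under load would be more trustworthy than a sensor value.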

I’m looking for advice on what else I can try to resolve these DRAM crashes. Are there additional settings I should look into or components I should test? I am considering buying a new motherboard since I suspect the current one may be faulty, but the problem could also lie with the CPU, for example a faulty integrated memory controller (IMC), in which case I may need to replace the CPU instead.

Edit 1. The crashes were happening even at stock values with both the G.Skill and Corsair RAM, and when bringing them down to the lowest frequency possible. I have also tried adjusting the SOC voltage between 1.1 V and 1.2 V.
Screenshot_4.png
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,134 (2.37/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃X570 Impact
Cooling NH-U12A + T30┃AXP120-x67
Memory 64GB 6400CL32┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Case Caselabs S3┃Lazer3D HT5
Clearly you aren't a stranger to DDR4, but what are you actually memtesting with? If it's anything less demanding than Testmem5, Karhu, or HCI Memtest (at the very least), then memtest properly first. It's good that you know that VSOC and CLDOs play a part, but if memory itself ain't stable then all this is pointless.

Any WHEA errors in Event Viewer?

I'm not sure why you thought lowering DRAM voltage would help - yes, most [crappy low-end] ICs do not scale with increasing voltage, but neither is lowering VDIMM without having verified stability productive in any way.

VDDGs can stay where they are, but if you want to run the 4x16GB kit (which is quad rank clearly, since the sticks are dual rank), you may need more VSOC to appease the memory controller. That, or if the UMC still throws fits up to a max of 1.2V VSOC, then I'm afraid 3600 (1800MHz) just isn't doable for your CPU sample. Quad rank is heavy and although people have run that kind of config past 3600, it's far from guaranteed.

The 2 stick G.skill kit won't share the same concerns over UMC load, especially at 3200, but without memtesting both kits there's not much else to say.
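
The numbers in this reply follow from simple arithmetic: DDR4 transfers data twice per clock, so DDR4-3600 means an 1800 MHz memory clock, and four dual-rank sticks on a dual-channel board put four ranks on each channel. A small sketch of that arithmetic; the function names are illustrative, and the single-rank figure for the 2 x 8 GB G.Skill kit is an assumption that should be checked with a tool that reports rank count.

```python
def memclk_mhz(transfer_rate_mts: int) -> float:
    """DDR is double data rate: two transfers per memory clock cycle."""
    return transfer_rate_mts / 2


def ranks_per_channel(dimms: int, ranks_per_dimm: int, channels: int = 2) -> int:
    """Ranks the memory controller has to drive on each channel."""
    return dimms * ranks_per_dimm // channels


# Corsair kit from the thread: 4 dual-rank sticks at DDR4-3600.
print(memclk_mhz(3600), ranks_per_channel(dimms=4, ranks_per_dimm=2))

# G.Skill kit: 2 sticks at DDR4-3200; single rank per stick is an ASSUMPTION.
print(memclk_mhz(3200), ranks_per_channel(dimms=2, ranks_per_dimm=1))
```

Under those assumptions the Corsair configuration asks the memory controller to drive four times as many ranks per channel as the G.Skill one, at a higher clock.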
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
42,055 (6.62/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
Try different memory; Vengeance on AM4 is a headache.
 
Joined
Oct 15, 2024
Messages
6 (0.17/day)
Thanks for the detailed response! This is actually the first PC I’ve built, so I’m still learning. There are no WHEA errors in the Event Viewer.

I’ll memtest both kits to properly test the memory stability and try increasing VSOC to support the memory controller.

Thanks again for the guidance!
 
Joined
Apr 21, 2021
Messages
250 (0.19/day)
System Name Silicon Graphics O2
Processor R5000 / 180MHz
Cooling noisy fan
Memory 384 MB
Storage 4 GB
Case the one with the old logo and proud of it ;)
Software IRIX 6.5
The sound sounds static like and the computer freezes for a few seconds before going into the crash.
This literally sounds like your infinity fabric can't handle your memory overclock.

Try running y-cruncher with the VT3 and FFTv4 stress tests at the same time, and watch out for WHEAs in something like HWiNFO. If it's inconclusive, repeat it with the Linpack Xtreme stress test and HWiNFO open in the background.

You should see WHEAs occurring at a somewhat regular rate, assuming the stress tests don't outright crash. If you only end up with a pile of WHEAs, you most likely have to either lower fabric speed or try to fiddle with voltages. If your memory crashes, especially at stock, it's probably a deeper issue than just a too ambitious memory overclock.

CPU: AMD Ryzen 5 5600X 3.7 GHz 6-Core Processor
What stepping is your CPU? If it's a B0, what week was it manufactured? You can see the manufacture date in the batch code engraved onto your CPU's IHS.
 
Joined
Oct 15, 2024
Messages
6 (0.17/day)
I’m also getting crashes even at stock speeds. I ran VT3 and FFTv4 tests and passed without any WHEA errors, but I suspect the issue is related to either the CPU or motherboard. I’ll be running MemTest86 overnight on my Corsair RAM, and I’ll check the IHS once I get some isopropyl alcohol to clean it.

Here’s the YouTube link to my Event Viewer logs for WHEA errors and Event ID 41 (Kernel-Power).
If anything looks weird, let me know and I can click on it, or I can take another video of a specific section.

Also, while troubleshooting I bridged the CLRTC pins while the board was powered on, before I knew it had to be off, which may have damaged the motherboard.
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,134 (2.37/day)
Location
Western Canada
Sort Event Viewer alphabetically by event type and look for WHEAs that have more detailed info. Scrolling by time like that in the video makes it impossible to see anything.

Obviously hitting the jumper pins while it's on isn't a good idea, but if nothing seems to be out of the ordinary then don't worry too much about it. I'd be lying if I said I haven't accidentally done it before.

whea errors - Copy.png


The y-cruncher tests are supposed to help uncover uncore instability that is more likely to log WHEAs, so it is already CPU related. But WHEAs are one of the most fickle things: highly inconsistent in all but the most egregious cases of hardware instability, and they behave differently on different CPU models. The most you can do is try to make them appear by applying more Fabric/UMC load.

Get the memtesting done first. That is always step one, before any WHEA-related troubleshooting.
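
The Event Viewer filtering described above can also be done from a script. A minimal sketch, assuming a Windows machine with the built-in `wevtutil` tool and that WHEA events are logged to the System log under the `Microsoft-Windows-WHEA-Logger` provider; the function name and the default of 20 events are illustrative choices.

```python
import subprocess
import sys

WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def whea_query_command(max_events: int = 20) -> list[str]:
    """Build a wevtutil command that lists the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # keep only WHEA-Logger events
        "/f:text",                   # human-readable output
        f"/c:{max_events}",          # cap the number of events returned
        "/rd:true",                  # newest first
    ]


# Only attempt the query on Windows; elsewhere the command is just built.
if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(whea_query_command(), capture_output=True, text=True)
    print(result.stdout or "No WHEA-Logger events found.")
```

An empty result only means nothing was logged; as noted above, WHEAs are inconsistent, so their absence does not prove stability.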
 
Joined
Oct 15, 2024
Messages
6 (0.17/day)
I ran memtest for the past 12 hours with the 64 GB Corsair Vengeance RAM and had no errors. Afterwards, I found a WHEA error from 9/4/2024 in the Event Viewer (I’ve attached a screenshot for reference). In case it's important, the raw data for the error is:

435045521002FFFFFFFF01000100000007000000CE010000063A0F00040918143C60C1835215A74887D114D9467D7765000000000000000000000000000000008D7C2157665EFB4480339B74CACEDF5B03F83300702E884E992C6F26DAF3DB7ACD20939F53F5DA01080000000000000000000000000000000000000000000000C8000000060100000003020001000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000053544F52504F525401000601A4000000010003000B000000E39793895020EE806989BDAF12C76F6E730074006F00720061006800630069000000000000000000000000000000000053616D73756E672000535344203837302051564F203454420000A4000000A40000007DE05002000000001600000000000100DD5CEC0000000000000000000000000000000000640000000000000000000000000000000000000063000000FFFFFFFF00000000000100000000000000000000010000000000000000000000000000000100000000000000630000000000000065000000640000000000000000000000000000006500000063000000630000001400000000000000B7D60200

I sorted the Event Viewer alphabetically by event type, as you suggested, and this was the only WHEA I found.
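
The raw data is a hex dump, and any readable strings embedded in it hint at which device logged the error. A minimal sketch that pulls printable ASCII runs out of a few fragments copied from the dump above (the full blob can be pasted in the same way); `ascii_strings` and `FRAGMENTS` are illustrative names.

```python
import re

# Three fragments copied verbatim from the raw data above.
FRAGMENTS = [
    "435045521002FFFFFFFF",
    "53544F52504F525401000601A4000000",
    "0053616D73756E672000535344203837302051564F203454420000A4",
]


def ascii_strings(hex_blob: str, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes in a hex dump."""
    data = bytes.fromhex(hex_blob)
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [match.group().decode("ascii").strip() for match in pattern.finditer(data)]


for text in ascii_strings("".join(FRAGMENTS)):
    print(text)
```

For these fragments the readable strings are `CPER`, `STORPORT`, `Samsung` and `SSD 870 QVO 4TB`, which suggests the event was logged by the Windows storage stack for a Samsung SATA SSD rather than for DRAM.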

Thanks for the guidance. I'll keep digging into it. I have ordered a 5900X and isopropyl alcohol; when they arrive, I will check the IHS as requested by Sarajiel and see if a new CPU fixes things. I am now running y-cruncher again to see if I can get any WHEA errors. My only crash dump, from 10/7, analyzed in WinDbg, says the following:
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000000, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff80397740885, address which referenced memory

Debugging Details:
------------------

*** WARNING: Unable to verify timestamp for nvlddmkm.sys

KEY_VALUES_STRING: 1


STACKHASH_ANALYSIS: 1

TIMELINE_ANALYSIS: 1


DUMP_CLASS: 1

DUMP_QUALIFIER: 400

BUILD_VERSION_STRING: 19041.1.amd64fre.vb_release.191206-1406

SYSTEM_MANUFACTURER: System manufacturer

SYSTEM_PRODUCT_NAME: System Product Name

SYSTEM_SKU: SKU

SYSTEM_VERSION: System Version

BIOS_VENDOR: American Megatrends Inc.

BIOS_VERSION: 5013

BIOS_DATE: 03/22/2024

BASEBOARD_MANUFACTURER: ASUSTeK COMPUTER INC.

BASEBOARD_PRODUCT: TUF GAMING X570-PLUS (WI-FI)

BASEBOARD_VERSION: Rev X.0x

DUMP_TYPE: 2

BUGCHECK_P1: 0

BUGCHECK_P2: 2

BUGCHECK_P3: 0

BUGCHECK_P4: fffff80397740885

READ_ADDRESS: fffff803714fb390: Unable to get MiVisibleState
Unable to get NonPagedPoolStart
Unable to get NonPagedPoolEnd
Unable to get PagedPoolStart
Unable to get PagedPoolEnd
unable to get nt!MmSpecialPagesInUse
0000000000000000

CURRENT_IRQL: 2

FAULTING_IP:
nvlddmkm+c70885
fffff803`97740885 488b02 mov rax,qword ptr [rdx]

CPU_COUNT: c

CPU_MHZ: e6d

CPU_VENDOR: AuthenticAMD

CPU_FAMILY: 19

CPU_MODEL: 21

CPU_STEPPING: 2

CPU_MICROCODE: 19,21,2,0 (F,M,S,R) SIG: 0'00000000 (cache) 0'0A20120E (init)

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXPNP: 1 (!blackboxpnp)


CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: AV

PROCESS_NAME: vrserver.exe

ANALYSIS_SESSION_HOST: BATTLESTARGALAC

ANALYSIS_SESSION_TIME: 10-15-2024 14:29:21.0969

ANALYSIS_VERSION: 10.0.17763.132 x86fre

DPC_STACK_BASE: FFFFF80376C7DFB0

TRAP_FRAME: fffff80376c7d7e0 -- (.trap 0xfffff80376c7d7e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=fffff80397740870 rbx=0000000000000000 rcx=ffffaa0eab689000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80397740885 rsp=fffff80376c7d970 rbp=ffffaa0eaf5edc50
r8=fffff80376c7d9d0 r9=ffffaa0eab689000 r10=00000000ffffffff
r11=ffffaa0e9dc7a000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe nc
nvlddmkm+0xc70885:
fffff803`97740885 488b02 mov rax,qword ptr [rdx] ds:00000000`00000000=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff80370c12ba9 to fffff80370bfe350

STACK_TEXT:
fffff803`76c7d698 fffff803`70c12ba9 : 00000000`0000000a 00000000`00000000 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
fffff803`76c7d6a0 fffff803`70c0e578 : 00000009`dca7bf30 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
fffff803`76c7d7e0 fffff803`97740885 : ffffbb80`b7dca1f0 fffff803`977aed28 00000000`00000000 ffffaa0e`9eaa5030 : nt!KiPageFault+0x478
fffff803`76c7d970 ffffbb80`b7dca1f0 : fffff803`977aed28 00000000`00000000 ffffaa0e`9eaa5030 ffffaa0e`9eaa5180 : nvlddmkm+0xc70885
fffff803`76c7d978 fffff803`977aed28 : 00000000`00000000 ffffaa0e`9eaa5030 ffffaa0e`9eaa5180 fffff803`8132d616 : 0xffffbb80`b7dca1f0
fffff803`76c7d980 00000000`00000000 : ffffaa0e`9eaa5030 ffffaa0e`9eaa5180 fffff803`8132d616 00000000`0000d200 : nvlddmkm+0xcded28


THREAD_SHA1_HASH_MOD_FUNC: 43d66b695656ff6029e0896a0f38861e7138cae6

THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 8999fe696a052cee8cd0fc208c8d2f91c0fb00b2

THREAD_SHA1_HASH_MOD: c5849c14d56648baae932b75ab8e18776e6ef2ce

FOLLOWUP_IP:
nvlddmkm+c70885
fffff803`97740885 488b02 mov rax,qword ptr [rdx]

FAULT_INSTR_CODE: 4c028b48

SYMBOL_STACK_INDEX: 3

SYMBOL_NAME: nvlddmkm+c70885

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 66f5b7c8

STACK_COMMAND: .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET: c70885

FAILURE_BUCKET_ID: AV_nvlddmkm!unknown_function

BUCKET_ID: AV_nvlddmkm!unknown_function

PRIMARY_PROBLEM_CLASS: AV_nvlddmkm!unknown_function

TARGET_TIME: 2024-10-07T19:15:13.000Z

OSBUILD: 19041

OSSERVICEPACK: 4894

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK: 272

PRODUCT_TYPE: 1

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

OSEDITION: Windows 10 WinNt TerminalServer SingleUserTS

OS_LOCALE:

USER_LCID: 0

OSBUILD_TIMESTAMP: 2003-12-26 14:27:08

BUILDDATESTAMP_STR: 191206-1406

BUILDLAB_STR: vb_release

BUILDOSVER_STR: 10.0.19041.1.amd64fre.vb_release.191206-1406

ANALYSIS_SESSION_ELAPSED_TIME: 8e44

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:av_nvlddmkm!unknown_function

FAILURE_ID_HASH: {7eea5677-f68d-2154-717e-887e07e55cd3}

Followup: MachineOwner
---------

nvlddmkm.sys points to an NVIDIA driver error, but I have used DDU to completely uninstall and reinstall the drivers and still had crashes. Maybe I need to try going back to an older version of the drivers?
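
A few values in the dump can be cross-checked with simple hex arithmetic. The sketch below uses only numbers copied from the output above: it subtracts each `nvlddmkm+offset` from its address to confirm both frames resolve to the same module base, checks that the faulting `mov rax,qword ptr [rdx]` read from address 0 (a null pointer), and decodes the hex `CPU_COUNT` and `CPU_MHZ` fields. Variable names are illustrative.

```python
# (address, offset) pairs for the two nvlddmkm frames in the dump.
FRAMES = [
    (0xFFFFF80397740885, 0xC70885),  # FAULTING_IP: nvlddmkm+c70885
    (0xFFFFF803977AED28, 0xCDED28),  # stack frame: nvlddmkm+0xcded28
]

# Both frames should resolve to the same module load address.
bases = {address - offset for address, offset in FRAMES}
module_base = bases.pop()
assert not bases, "frames disagree about the module base"
print(f"nvlddmkm base: {module_base:#x}")

# Arg1 (memory referenced) and rdx from the trap frame are both zero,
# so the faulting read targeted address 0.
memory_referenced = 0x0000000000000000
rdx = 0x0000000000000000
print("null pointer read:", memory_referenced == 0 and rdx == 0)

# WinDbg prints these fields in hex.
cpu_count = int("c", 16)   # CPU_COUNT: c
cpu_mhz = int("e6d", 16)   # CPU_MHZ: e6d
print(f"{cpu_count} logical processors at {cpu_mhz} MHz")
```

A null read inside the driver at a raised IRQL shows where the fault surfaced, not necessarily what caused it; a driver bug and data corrupted by unstable hardware can look the same in a single dump.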

Screenshot_6.png
Screenshot_5.png
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,134 (2.37/day)
Location
Western Canada
Which of the 3 memtests???

If it was HCI free version, you need to concurrently open and run enough instances to fill close to all of your capacity to be effective.

Unfortunately Error events don't always have much in the way of useful details. See the screenshot I included previously for an example. This error from September doesn't look too related to your memory issues.

IRQL_NOT_LESS_OR_EQUAL is usually just simple memory instability, although an unhappy memory controller is also plausible.
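
Filling nearly all of the RAM with several concurrent instances comes down to a division. A rough sketch, assuming one instance per logical processor (12 on this 5600X, per `CPU_COUNT: c` in the dump) and an arbitrary 4 GB left free for Windows; both are common rules of thumb rather than requirements of the tool, and the function name is hypothetical.

```python
def mb_per_instance(total_ram_mb: int, instances: int, reserve_mb: int = 4096) -> int:
    """Split the RAM left after the OS reserve evenly across test instances."""
    if instances < 1:
        raise ValueError("need at least one instance")
    usable_mb = total_ram_mb - reserve_mb
    if usable_mb <= 0:
        raise ValueError("reserve is larger than total RAM")
    return usable_mb // instances


# 64 GB (4 x 16 GB) kit, 12 logical processors.
print(mb_per_instance(total_ram_mb=64 * 1024, instances=12))

# 16 GB (2 x 8 GB) kit, same CPU.
print(mb_per_instance(total_ram_mb=16 * 1024, instances=12))
```

The result is the amount of memory, in MB, to assign to each instance so that together they cover close to the full capacity.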
 
Joined
Oct 15, 2024
Messages
6 (0.17/day)
PassMark MemTest86. I will try HCI too; I have all the time in the world to fix this.
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
8,134 (2.37/day)
Location
Western Canada
I listed those 3 for a reason. MT86 and the like are borderline useless because they put next to no stress on uncore (Fabric and UMC). Testing DRAM stability isn't just about testing the sticks.
 
Joined
Oct 15, 2024
Messages
6 (0.17/day)
A new motherboard fixed the issue today. I am going to RMA the old board to ASUS and see if they will fix it. This problem only showed up in the last few months, after I had owned the board for almost 2 years. Thanks for the help.
 