
Freezing in games with new Radeon GPU, MCE-WHEA CPU Bus Error

Joined
Aug 1, 2024
Messages
35 (1.30/day)
Hello

It seems to me that ever since I upgraded to a newer GPU, pretty much all games (though I don't play as much as I used to) have started freezing after just 10-15 minutes.

One of the games is Fortnite, the other is a flight sim: two completely different kinds of games.

Under the hood it may be a BSOD, because a bugcheck event gets created and an automatic memory dump of 2-3 GB gets written as "MEMORY.DMP", which I've analyzed:

WHEA_UNCORRECTABLE_ERROR
Module Name: AuthenticAMD.sys

Further analysis says: Fatal BUS Error - BUSL1_SRC_IRD_I_NOTIMEOUT_ERR (Proc 10 Bank 1)

I might post more detailed screenshots later, right now I'm not typing from the problematic machine.

This hints that it could be a CPU problem, because I upgraded to a new CPU a few months before the GPU, but I remember playing the same games with the newer CPU and the older GPU ...
Maybe a BIOS update done in the week around the GPU upgrade is to blame; these motherboards and CPUs were known for USB connectivity issues that plagued a lot of users.

It's an AMD AM4 system: an ASUS ROG Strix X570-E motherboard with an AMD Ryzen 9 5900X and initially an older AMD Radeon GPU, which I replaced with a Sapphire AMD Radeon RX 6700 XT. Around that time I started experiencing freezes in games, even though everything else keeps working. The PSU is a higher-tier Platinum-rated Corsair HX 750W.

I'm an advanced PC user despite this being my first post; I've lurked around here for quite a long time ;) So troubleshooting isn't a first for me. In fact it's right up my alley: back when I had more free time I would dig deep into all kinds of issues, so there's a whole bunch of basics I've already tried. This issue is just a bit more mysterious and it's affecting my primary work setup, so I'd rather post it here for reference and get it documented.

I haven't tried everything yet, though. So far I've tried a new installation of Win10, tried other games, disabled AMD DOCP, and run some benchmarks in Cinebench. I did two 3-hour tests with Prime95, one for max CPU stress and one for RAM stress. I've yet to try OCCT and rendering programs.

Development and workstation-type programs, short of full rendering, do seem to work just fine though ... I wanted to test a full-blown heavy Blender render, but for some reason the official Blender 4.2 installer seems to be broken right now: it won't launch due to missing DLLs, even though I installed it on another Win10-based AMD (AM5) system a week ago for someone else. Maybe something is up with my current Win10 installation (a rough fresh install done a few months back). Basically, Blender.exe can't find MSVCP140.dll inside the ..crt folder of the installation. I tried redownloading from another mirror and repairing the installation, but it did not help.

Now, I have 2 other GPUs nearby that I can borrow for tests, a Radeon RX 7800X and an Nvidia ...1070. Lastly, I can switch back to my old AMD CPU (Ryzen 5 4650G APU) and try my new GPU with the old CPU.

It would be best if just the GPU were at fault ... but if it's a compatibility issue, I think a replacement won't help, and I might have to switch to a different brand or model of GPU.
 
Joined
Oct 22, 2014
Messages
13,770 (3.83/day)
Location
Sunshine Coast
System Name Lenovo ThinkCentre
Processor AMD 5650GE
Motherboard Lenovo
Memory 32 GB DDR4
Display(s) AOC 24" Freesync 1m.s. 75Hz
Mouse Lenovo
Keyboard Lenovo
Software W11 Pro 64 bit
Did you update the chipset driver at the same time as the BIOS for the new CPU?
 
Joined
Jun 2, 2017
Messages
8,531 (3.23/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
What RAM are you using?
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
Did you update the chipset driver at the same time as the BIOS for the new CPU?

Good point, I don't think I did ... however, the problem persists on a completely new Windows 10 installation done in January 2024, which is based on the January 2024 update level, while the initial Windows 10 installation was done in June 2023 and based on, I think, the May 2023 or an earlier update level. I usually hard-disable updates after installation, so there's no way an update messed something up or WU swapped any drivers. At this time I can't confirm whether the "old" Win10 installation was from 2023 or 2022; I would have to check again ... but on the old installation, the freaking SSD actually got corrupted somehow just a few weeks ago. That's a whole other issue I haven't figured out at all yet.
Now I remember: I upgraded the CPU in January 2023 and the GPU in June 2023 ... Was I actually running on a yet older Windows 10 installation? I would need to re-check things. I might not have played any games during those months, so perhaps it is the CPU's fault.

So much has happened over these few years, with a bit too much back and forth and upgrading HW piece by piece, and I have quite a lot going on, including PC maintenance for other people; I've built and set up many PCs in recent months, plus a bunch of other tech projects. I should have gathered all of this offline before posting; this is a bit more complicated than I initially recalled.

Yeah, I forgot the RAM: it's a 128GB kit of Kingston KF3600C18D4/32X (SK Hynix).

I have to leave for a few hours right now. I was hoping I could do a heavy Blender render, but I'll just let OCCT run a GPU benchmark test and see if that triggers a BSOD or freeze.
 
Joined
Jun 2, 2017
Messages
8,531 (3.23/day)
At 128 GB I am assuming that all 4 slots are occupied. I remember when I had all 4 installed (64GB) I had no issues for about 4 months. Then one day I started getting WHEA errors and random shutdowns. After removing 2 sticks the problem went away. It may seem anecdotal, but when you read the MB manual, most actually recommend using 2 DIMMs.
 
Joined
Jul 30, 2019
Messages
2,884 (1.55/day)
System Name Not a thread ripper but pretty good.
Processor Ryzen 9 5950x
Motherboard ASRock X570 Taichi (revision 1.06, BIOS/UEFI version P5.50)
Cooling EK-Quantum Velocity, EK-Quantum Reflection PC-O11, EK-CoolStream PE 360, XSPC TX360
Memory Micron DDR4-3200 ECC Unbuffered Memory (4 sticks, 128GB, 18ASF4G72AZ-3G2F1)
Video Card(s) XFX Radeon RX 5700 & EK-Quantum Vector Radeon RX 5700 +XT & Backplate
Storage Samsung 2TB & 4TB 980 PRO, 2TB 970 EVO Plus, 2 x Optane 905p 1.5TB (striped), AMD Radeon RAMDisk
Display(s) 2 x 4K LG 27UL600-W (and HUANUO Dual Monitor Mount)
Case Lian Li PC-O11 Dynamic Black (original model)
Power Supply Corsair RM750x
Mouse Logitech M575
Keyboard Corsair Strafe RGB MK.2
Software Windows 10 Professional (64bit)
Benchmark Scores Typical for non-overclocked CPU.
Yeah I forgot the RAM, it's a 128GB kit of Kingston KF3600C18D4/32X - (SK Hynix)
128GB can be tough to get working.

possible solutions
  • reduce ram to 64GB
  • try a different 128GB kit
  • tinker with voltages (SOC, CLDO VDDP, VDDG CCD, VDDG IOD, and DRAM Voltage)
  • tinker with ram speeds and/or timings
Having a DDR4-3600 kit run at 128GB sounds fantastic, but that might be too much strain for the IMC without overriding other voltages.

Try simply reducing the speed to DDR4-3200 in UEFI/BIOS and see if that gets you stable.

Also if you can, post a ZenTimings screenshot.
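
For a rough sense of what dropping from DDR4-3600 to DDR4-3200 changes, here is a minimal Python sketch of the arithmetic. It assumes the usual DDR relationship (real memory clock is half the transfer rate), that the Infinity Fabric clock runs 1:1 with the memory clock, and that CAS latency stays at 18 (read off the "C18" in the kit's part number). None of that has been confirmed on this system, which is why the ZenTimings screenshot would help.

```python
def ddr4_summary(data_rate_mts: int, cas_latency: int) -> dict:
    """Rough DDR4 numbers for a given transfer rate (MT/s) and CAS latency (cycles)."""
    # DDR transfers twice per clock, so the real memory clock is half the data rate.
    mclk_mhz = data_rate_mts / 2
    # Assumption: FCLK is coupled 1:1 with the memory clock.
    fclk_mhz = mclk_mhz
    # CAS latency in nanoseconds: cycles divided by clock in MHz, times 1000.
    cas_ns = cas_latency * 2000 / data_rate_mts
    return {"mclk_mhz": mclk_mhz, "fclk_1to1_mhz": fclk_mhz, "cas_ns": round(cas_ns, 2)}


if __name__ == "__main__":
    # CL18 is an assumption taken from the kit name, not from measured timings.
    for rate in (3600, 3200):
        print(f"DDR4-{rate}:", ddr4_summary(rate, 18))
```

The point of the comparison is only that DDR4-3200 lowers both the memory clock and, in 1:1 mode, the fabric clock that the memory controller has to sustain with four DIMMs populated, at the cost of slightly higher latency if CL is left unchanged.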
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
Hmm, it seems like OCCT doesn't have a GPU benchmark anymore? I thought I saw one some time ago, before the major new version. Either way, I'll of course use other, more dedicated programs.

Indeed, I haven't familiarized myself with and tinkered with HW optimizations for the AM4 platform as much as I did for all my DIY PCs in the past, so I'm certainly not aware of some of its peculiarities, and these days the industry has moved to AM5, so I'm quite a bit behind. No worries, I'll catch up. I still follow tech/HW news in general, and I had to get back into it big time during the unrelated builds; still, there are probably details I'll be realizing for the first time during this troubleshooting.

I did check the manual and so on before going for 128GB ... in fact I had 64GB before, and during 2023 I upgraded to 128GB. I don't know exactly when that was at the moment; it could have been a few weeks after the GPU swap ... jeez, the more I think about this, the more I remember what happened.

I was in fact planning a full-blown re-do, but I wasn't in a hurry until the SSD "failed" recently, ...

Right, I'll add that to the plan: testing with less RAM, absolutely! Even though I ran memory tests quite a number of times, overnight, with no errors reported, with DOCP or without.
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
One of the previous bugchecks:

Fortnite
Code:
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffff9c87103a2028, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 00000000fc800800, High order 32-bits of the MCi_STATUS value.
Arg4: 00000000060c0859, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2046

    Key  : Analysis.Elapsed.mSec
    Value: 1968

    Key  : Analysis.IO.Other.Mb
    Value: 0

    Key  : Analysis.IO.Read.Mb
    Value: 1

    Key  : Analysis.IO.Write.Mb
    Value: 0

    Key  : Analysis.Init.CPU.mSec
    Value: 1733

    Key  : Analysis.Init.Elapsed.mSec
    Value: 16111

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 106

    Key  : Bugcheck.Code.KiBugCheckData
    Value: 0x124

    Key  : Bugcheck.Code.LegacyAPI
    Value: 0x124

    Key  : Failure.Bucket
    Value: 0x124_0_AuthenticAMD_PROCESSOR_BUS_L1_SRC_IRD_I_NOTIMEOUT_IMAGE_AuthenticAMD.sys

    Key  : Failure.Hash
    Value: {9849fca8-6c5b-4c6a-ecca-8c5e153b17bd}

    Key  : Hypervisor.Enlightenments.Value
    Value: 0

    Key  : Hypervisor.Enlightenments.ValueHex
    Value: 0

    Key  : Hypervisor.Flags.AnyHypervisorPresent
    Value: 0

    Key  : Hypervisor.Flags.ApicEnlightened
    Value: 0

    Key  : Hypervisor.Flags.ApicVirtualizationAvailable
    Value: 1

    Key  : Hypervisor.Flags.AsyncMemoryHint
    Value: 0

    Key  : Hypervisor.Flags.CoreSchedulerRequested
    Value: 0

    Key  : Hypervisor.Flags.CpuManager
    Value: 0

    Key  : Hypervisor.Flags.DeprecateAutoEoi
    Value: 0

    Key  : Hypervisor.Flags.DynamicCpuDisabled
    Value: 0

    Key  : Hypervisor.Flags.Epf
    Value: 0

    Key  : Hypervisor.Flags.ExtendedProcessorMasks
    Value: 0

    Key  : Hypervisor.Flags.HardwareMbecAvailable
    Value: 1

    Key  : Hypervisor.Flags.MaxBankNumber
    Value: 0

    Key  : Hypervisor.Flags.MemoryZeroingControl
    Value: 0

    Key  : Hypervisor.Flags.NoExtendedRangeFlush
    Value: 0

    Key  : Hypervisor.Flags.NoNonArchCoreSharing
    Value: 0

    Key  : Hypervisor.Flags.Phase0InitDone
    Value: 0

    Key  : Hypervisor.Flags.PowerSchedulerQos
    Value: 0

    Key  : Hypervisor.Flags.RootScheduler
    Value: 0

    Key  : Hypervisor.Flags.SynicAvailable
    Value: 0

    Key  : Hypervisor.Flags.UseQpcBias
    Value: 0

    Key  : Hypervisor.Flags.Value
    Value: 16908288

    Key  : Hypervisor.Flags.ValueHex
    Value: 1020000

    Key  : Hypervisor.Flags.VpAssistPage
    Value: 0

    Key  : Hypervisor.Flags.VsmAvailable
    Value: 0

    Key  : Hypervisor.RootFlags.AccessStats
    Value: 0

    Key  : Hypervisor.RootFlags.CrashdumpEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.CreateVirtualProcessor
    Value: 0

    Key  : Hypervisor.RootFlags.DisableHyperthreading
    Value: 0

    Key  : Hypervisor.RootFlags.HostTimelineSync
    Value: 0

    Key  : Hypervisor.RootFlags.HypervisorDebuggingEnabled
    Value: 0

    Key  : Hypervisor.RootFlags.IsHyperV
    Value: 0

    Key  : Hypervisor.RootFlags.LivedumpEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.MapDeviceInterrupt
    Value: 0

    Key  : Hypervisor.RootFlags.MceEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.Nested
    Value: 0

    Key  : Hypervisor.RootFlags.StartLogicalProcessor
    Value: 0

    Key  : Hypervisor.RootFlags.Value
    Value: 0

    Key  : Hypervisor.RootFlags.ValueHex
    Value: 0

    Key  : SecureKernel.HalpHvciEnabled
    Value: 0

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Version
    Value: 10.0.19041.1


BUGCHECK_CODE:  124

BUGCHECK_P1: 0

BUGCHECK_P2: ffff9c87103a2028

BUGCHECK_P3: fc800800

BUGCHECK_P4: 60c0859

FILE_IN_CAB:  MEMORY.DMP

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

PROCESS_NAME:  FortniteClient-Win64-Shipping.exe

STACK_TEXT:
ffffdb01`253ef938 fffff805`614b97aa     : 00000000`00000124 00000000`00000000 ffff9c87`103a2028 00000000`fc800800 : nt!KeBugCheckEx
ffffdb01`253ef940 fffff805`5da315b0     : 00000000`00000000 ffff9c87`103a2028 ffff9c87`0bdfe8b0 ffff9c87`103a2028 : nt!HalBugCheckSystem+0xca
ffffdb01`253ef980 fffff805`615bbaae     : 00000000`00000000 ffffdb01`253efa29 ffff9c87`103a2028 ffff9c87`0bdfe8b0 : PSHED!PshedBugCheckSystem+0x10
ffffdb01`253ef9b0 fffff805`614bb0d1     : ffff9c87`17dbe900 ffff9c87`17dbe900 ffff9c87`0bdfe900 ffff9c87`0bdfe8b0 : nt!WheaReportHwError+0x46e
ffffdb01`253efa90 fffff805`614bb443     : 00000000`0000000a ffff9c87`0bdfe900 ffff9c87`0bdfe8b0 00000000`0000000a : nt!HalpMcaReportError+0xb1
ffffdb01`253efc00 fffff805`614bb320     : ffff9c87`0bcef730 00000000`00000000 ffffdb01`253efe00 00000000`00000000 : nt!HalpMceHandlerCore+0xef
ffffdb01`253efc50 fffff805`614ba865     : ffff9c87`0bcef730 ffffdb01`253efef0 00000000`00000000 00000000`00000000 : nt!HalpMceHandler+0xe0
ffffdb01`253efc90 fffff805`614bd025     : ffff9c87`0bcef730 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalpHandleMachineCheck+0xe9
ffffdb01`253efcc0 fffff805`61512be9     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalHandleMcheck+0x35
ffffdb01`253efcf0 fffff805`6140e57a     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiHandleMcheck+0x9
ffffdb01`253efd20 fffff805`6140e237     : 00000000`00000000 00000000`00000000 00007ff7`02411094 00000000`00000000 : nt!KxMcheckAbort+0x7a
ffffdb01`253efe60 00007ff7`0020b250     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x277
000000cd`c6f7c200 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ff7`0020b250


MODULE_NAME: AuthenticAMD

IMAGE_NAME:  AuthenticAMD.sys

STACK_COMMAND:  .cxr; .ecxr ; kb

FAILURE_BUCKET_ID:  0x124_0_AuthenticAMD_PROCESSOR_BUS_L1_SRC_IRD_I_NOTIMEOUT_IMAGE_AuthenticAMD.sys

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {9849fca8-6c5b-4c6a-ecca-8c5e153b17bd}

Followup:     MachineOwner
---------

10: kd> !errrec
10: kd> !errrec ffff9c87103a2028
===============================================================================
Common Platform Error Record @ ffff9c87103a2028
-------------------------------------------------------------------------------
Record Id     : 01dad3750bb80e5c
Severity      : Fatal (1)
Length        : 936
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 7/12/2024 7:57:23 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ ffff9c87103a20a8
Section       @ ffff9c87103a2180
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : BUS error
Operation     : Instruction Execute
Flags         : 0x00
Level         : 1
CPU Version   : 0x0000000000a20f12
Processor ID  : 0x000000000000000a

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ ffff9c87103a20f0
Section       @ ffff9c87103a2240
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x000000000000000a
CPU Id        : 12 0f a2 00 00 08 18 0a - 0b 32 f8 7e ff fb 8b 17
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ ffff9c87103a2280

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ ffff9c87103a2138
Section       @ ffff9c87103a22c0
Offset        : 664
Length        : 272
Flags         : 0x00000000
Severity      : Fatal

Error         : BUSL1_SRC_IRD_I_NOTIMEOUT_ERR (Proc 10 Bank 1)
  Status      : 0xfc800800060c0859
  Address     : 0x00000002fb9b1400
  Misc.       : 0xd01a0ffe00000000
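
As a side note on the "CPU Version : 0x0000000000a20f12" line above, that value looks like the CPUID family/model/stepping signature. Here is a minimal Python sketch of how it can be unpacked, assuming the standard CPUID leaf 1 EAX layout and the AMD rule that the extended fields apply when the base family is 0xF. The function name is just for illustration.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf-1 EAX signature into family, model and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # AMD convention: extended family/model only count when base family is 0xF.
    if base_family == 0xF:
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


if __name__ == "__main__":
    sig = decode_cpuid_signature(0x00A20F12)
    print({k: hex(v) for k, v in sig.items()})
```

For this dump that works out to family 19h, model 21h, stepping 2, which is consistent with a Zen 3 desktop part such as the Ryzen 9 5900X, so at least the record is describing the CPU that is actually installed.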



Digital Combat Simulator:
Code:
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffffde84ff7c9028, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 00000000bc800800, High order 32-bits of the MCi_STATUS value.
Arg4: 00000000060c0859, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2156

    Key  : Analysis.Elapsed.mSec
    Value: 2135

    Key  : Analysis.IO.Other.Mb
    Value: 0

    Key  : Analysis.IO.Read.Mb
    Value: 1

    Key  : Analysis.IO.Write.Mb
    Value: 0

    Key  : Analysis.Init.CPU.mSec
    Value: 2593

    Key  : Analysis.Init.Elapsed.mSec
    Value: 108307

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 107

    Key  : Bugcheck.Code.KiBugCheckData
    Value: 0x124

    Key  : Bugcheck.Code.LegacyAPI
    Value: 0x124

    Key  : Failure.Bucket
    Value: 0x124_0_AuthenticAMD_PROCESSOR_BUS_L1_SRC_IRD_I_NOTIMEOUT_IMAGE_AuthenticAMD.sys

    Key  : Failure.Hash
    Value: {9849fca8-6c5b-4c6a-ecca-8c5e153b17bd}

    Key  : Hypervisor.Enlightenments.Value
    Value: 0

    Key  : Hypervisor.Enlightenments.ValueHex
    Value: 0

    Key  : Hypervisor.Flags.AnyHypervisorPresent
    Value: 0

    Key  : Hypervisor.Flags.ApicEnlightened
    Value: 0

    Key  : Hypervisor.Flags.ApicVirtualizationAvailable
    Value: 1

    Key  : Hypervisor.Flags.AsyncMemoryHint
    Value: 0

    Key  : Hypervisor.Flags.CoreSchedulerRequested
    Value: 0

    Key  : Hypervisor.Flags.CpuManager
    Value: 0

    Key  : Hypervisor.Flags.DeprecateAutoEoi
    Value: 0

    Key  : Hypervisor.Flags.DynamicCpuDisabled
    Value: 0

    Key  : Hypervisor.Flags.Epf
    Value: 0

    Key  : Hypervisor.Flags.ExtendedProcessorMasks
    Value: 0

    Key  : Hypervisor.Flags.HardwareMbecAvailable
    Value: 1

    Key  : Hypervisor.Flags.MaxBankNumber
    Value: 0

    Key  : Hypervisor.Flags.MemoryZeroingControl
    Value: 0

    Key  : Hypervisor.Flags.NoExtendedRangeFlush
    Value: 0

    Key  : Hypervisor.Flags.NoNonArchCoreSharing
    Value: 0

    Key  : Hypervisor.Flags.Phase0InitDone
    Value: 0

    Key  : Hypervisor.Flags.PowerSchedulerQos
    Value: 0

    Key  : Hypervisor.Flags.RootScheduler
    Value: 0

    Key  : Hypervisor.Flags.SynicAvailable
    Value: 0

    Key  : Hypervisor.Flags.UseQpcBias
    Value: 0

    Key  : Hypervisor.Flags.Value
    Value: 16908288

    Key  : Hypervisor.Flags.ValueHex
    Value: 1020000

    Key  : Hypervisor.Flags.VpAssistPage
    Value: 0

    Key  : Hypervisor.Flags.VsmAvailable
    Value: 0

    Key  : Hypervisor.RootFlags.AccessStats
    Value: 0

    Key  : Hypervisor.RootFlags.CrashdumpEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.CreateVirtualProcessor
    Value: 0

    Key  : Hypervisor.RootFlags.DisableHyperthreading
    Value: 0

    Key  : Hypervisor.RootFlags.HostTimelineSync
    Value: 0

    Key  : Hypervisor.RootFlags.HypervisorDebuggingEnabled
    Value: 0

    Key  : Hypervisor.RootFlags.IsHyperV
    Value: 0

    Key  : Hypervisor.RootFlags.LivedumpEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.MapDeviceInterrupt
    Value: 0

    Key  : Hypervisor.RootFlags.MceEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.Nested
    Value: 0

    Key  : Hypervisor.RootFlags.StartLogicalProcessor
    Value: 0

    Key  : Hypervisor.RootFlags.Value
    Value: 0

    Key  : Hypervisor.RootFlags.ValueHex
    Value: 0

    Key  : SecureKernel.HalpHvciEnabled
    Value: 0

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Version
    Value: 10.0.19041.1


BUGCHECK_CODE:  124

BUGCHECK_P1: 0

BUGCHECK_P2: ffffde84ff7c9028

BUGCHECK_P3: bc800800

BUGCHECK_P4: 60c0859

FILE_IN_CAB:  MEMORY (2).DMP

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

PROCESS_NAME:  DCS.exe

STACK_TEXT:
ffff9001`9d1ef938 fffff801`27ab97aa     : 00000000`00000124 00000000`00000000 ffffde84`ff7c9028 00000000`bc800800 : nt!KeBugCheckEx
ffff9001`9d1ef940 fffff801`253115b0     : 00000000`00000000 ffffde84`ff7c9028 ffffde84`f18e88b0 ffffde84`ff7c9028 : nt!HalBugCheckSystem+0xca
ffff9001`9d1ef980 fffff801`27bbbaae     : 00000000`00000000 ffff9001`9d1efa29 ffffde84`ff7c9028 ffffde84`f18e88b0 : PSHED!PshedBugCheckSystem+0x10
ffff9001`9d1ef9b0 fffff801`27abb0d1     : ffffde84`f1846ac0 ffffde84`f1846ac0 ffffde84`f18e8900 ffffde84`f18e88b0 : nt!WheaReportHwError+0x46e
ffff9001`9d1efa90 fffff801`27abb443     : 00000000`0000000a ffffde84`f18e8900 ffffde84`f18e88b0 00000000`0000000a : nt!HalpMcaReportError+0xb1
ffff9001`9d1efc00 fffff801`27abb320     : ffffde84`f16c9730 00000000`00000000 ffff9001`9d1efe00 00000000`00000000 : nt!HalpMceHandlerCore+0xef
ffff9001`9d1efc50 fffff801`27aba865     : ffffde84`f16c9730 ffff9001`9d1efef0 00000000`00000000 00000000`00000000 : nt!HalpMceHandler+0xe0
ffff9001`9d1efc90 fffff801`27abd025     : ffffde84`f16c9730 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalpHandleMachineCheck+0xe9
ffff9001`9d1efcc0 fffff801`27b12be9     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalHandleMcheck+0x35
ffff9001`9d1efcf0 fffff801`27a0e57a     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiHandleMcheck+0x9
ffff9001`9d1efd20 fffff801`27a0e237     : 00000000`00000000 00000000`00000000 00000289`1c1d0108 00000000`00000000 : nt!KxMcheckAbort+0x7a
ffff9001`9d1efe60 00007fff`0bc75a6a     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x277
000000a1`92afef50 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007fff`0bc75a6a


MODULE_NAME: AuthenticAMD

IMAGE_NAME:  AuthenticAMD.sys

STACK_COMMAND:  .cxr; .ecxr ; kb

FAILURE_BUCKET_ID:  0x124_0_AuthenticAMD_PROCESSOR_BUS_L1_SRC_IRD_I_NOTIMEOUT_IMAGE_AuthenticAMD.sys

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {9849fca8-6c5b-4c6a-ecca-8c5e153b17bd}

Followup:     MachineOwner
---------

10: kd> !errrec ffffde84ff7c9028
===============================================================================
Common Platform Error Record @ ffffde84ff7c9028
-------------------------------------------------------------------------------
Record Id     : 01dad433e6bc643d
Severity      : Fatal (1)
Length        : 936
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 7/13/2024 11:33:59 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ ffffde84ff7c90a8
Section       @ ffffde84ff7c9180
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : BUS error
Operation     : Instruction Execute
Flags         : 0x00
Level         : 1
CPU Version   : 0x0000000000a20f12
Processor ID  : 0x000000000000000a

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ ffffde84ff7c90f0
Section       @ ffffde84ff7c9240
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x000000000000000a
CPU Id        : 12 0f a2 00 00 08 18 0a - 0b 32 f8 7e ff fb 8b 17
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ ffffde84ff7c9280

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ ffffde84ff7c9138
Section       @ ffffde84ff7c92c0
Offset        : 664
Length        : 272
Flags         : 0x00000000
Severity      : Fatal

Error         : BUSL1_SRC_IRD_I_NOTIMEOUT_ERR (Proc 10 Bank 1)
  Status      : 0xbc800800060c0859
  Address     : 0x000000088aab4ac0
  Misc.       : 0xd01a0ffe00000000
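
For anyone following along, the Status value above can be decoded by hand. Below is a minimal Python sketch, assuming the architectural x86 machine-check layout: the flag bits sit at the top of MCi_STATUS, and the low 16 bits hold a bus/interconnect error code of the form 0000 1PPT RRRR IILL. The function and table names are made up for illustration and are not part of WinDbg or any AMD tool.

```python
# Sketch: decode the MCi_STATUS value reported by !errrec.
# Bit positions and the bus-error layout (0000 1PPT RRRR IILL) follow the
# architectural x86 machine-check format; all names here are illustrative.

STATUS_FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
                "MISCV": 59, "ADDRV": 58, "PCC": 57}
PARTICIPATION = ["SRC", "RES", "OBS", "GEN"]
REQUEST = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
MEM_OR_IO = ["M", "reserved", "IO", "GEN"]
LEVEL = ["L0", "L1", "L2", "LG"]


def decode_status(status):
    """Return the status flags and, for bus errors, the decoded error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    code = status & 0xFFFF  # low 16 bits hold the MCA error code
    result = {"flags": flags, "mca_error_code": code, "bus_error": None}
    if (code & 0xF800) == 0x0800:  # matches 0000 1xxx xxxx xxxx
        request = (code >> 4) & 0xF
        result["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
            "request": REQUEST[request] if request < len(REQUEST) else "unknown",
            "mem_or_io": MEM_OR_IO[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return result


decoded = decode_status(0xBC800800060C0859)
print(decoded["flags"])
print(decoded["bus_error"])
```

For this value the sketch reports an uncorrected error (UC set) with a valid address, and a level 1 bus error on an instruction fetch (IRD) with no timeout. That lines up with the BUSL1_SRC_IRD_I_NOTIMEOUT_ERR label, where WinDbg appears to shorten "IO" to "I".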


Today I played Fortnite again and had a sudden hard reset - no memory dumps or minidumps were created.

Event Viewer did log the occurrence nonetheless; as expected, it's a WHEA error again.

Code:
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          2. 08. 2024 10:13:32
Event ID:      18
Task Category: None
Level:         Error
Keywords:    
User:          LOCAL SERVICE
Computer:      ...
Description:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 10

The details view of this entry contains further information.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
    <EventID>18</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2024-08-02T08:13:32.8898343Z" />
    <EventRecordID>3181</EventRecordID>
    <Correlation ActivityID="{40a0dbb0-72b5-4076-b4ea-d3371ebdaebd}" />
    <Execution ProcessID="4004" ThreadID="4660" />
    <Channel>System</Channel>
    <Computer>redacted</Computer>
    <Security UserID="S-1-5-19" />
  </System>
  <EventData>
    <Data Name="ErrorSource">3</Data>
    <Data Name="ApicId">10</Data>
    <Data Name="MCABank">1</Data>
    <Data Name="MciStat">0xbc800800060c0859</Data>
    <Data Name="MciAddr">0xbc0c8ec0</Data>
    <Data Name="MciMisc">0xd01a0ffe00000000</Data>
    <Data Name="ErrorType">10</Data>
    <Data Name="TransactionType">256</Data>
    <Data Name="Participation">0</Data>
    <Data Name="RequestType">5</Data>
    <Data Name="MemorIO">2</Data>
    <Data Name="MemHierarchyLvl">1</Data>
    <Data Name="Timeout">0</Data>
    <Data Name="OperationType">256</Data>
    <Data Name="Channel">256</Data>
    <Data Name="Length">936</Data>
    <Data Name="RawData">omitted</Data>
  </EventData>
</Event>
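
If several of these events pile up, the fields can be pulled out of the XML with a script instead of read by eye. Here is a rough sketch using Python's standard `xml.etree.ElementTree`. The `SAMPLE` string is a trimmed copy of the event above and `event_fields` is a hypothetical helper name, so treat it as a starting point only.

```python
# Sketch: extract the named <Data> fields from a WHEA-Logger event XML.
import xml.etree.ElementTree as ET

# The event XML uses this default namespace, so lookups must be qualified.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event posted above (only a few fields kept).
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>18</EventID>
  </System>
  <EventData>
    <Data Name="ApicId">10</Data>
    <Data Name="MCABank">1</Data>
    <Data Name="MciStat">0xbc800800060c0859</Data>
    <Data Name="MciAddr">0xbc0c8ec0</Data>
  </EventData>
</Event>"""


def event_fields(xml_text):
    """Return a dict of the event's Data fields plus its EventID, as strings."""
    root = ET.fromstring(xml_text)
    fields = {d.get("Name"): (d.text or "")
              for d in root.findall("e:EventData/e:Data", NS)}
    fields["EventID"] = root.findtext("e:System/e:EventID", namespaces=NS)
    return fields


fields = event_fields(SAMPLE)
status = int(fields["MciStat"], 16)  # hex string -> integer
print(fields["EventID"], fields["ApicId"], fields["MCABank"], hex(status))
```

The same status value shows up here as in the memory dumps, so the hard reset without a dump looks like the same error.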



And these are the ZenTimings readings with everything on Auto under BIOS 5003, no DOCP.
ZenTimings_Screenshot_2AUG2024-NoDOCP-Auto.png


First I'll do a GPU swap; if it happens again, the RAM theory looks more likely.
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
All right, some progress!

Steps: uninstalled the AMD GPU drivers (January 2024), ran a DDU cleanup in Safe Mode, swapped the GPU (RX 6700 XT for RX 7800 XT), installed new AMD GPU drivers (July 2024), then rebooted 3 times (Windows kept doing something with a PCI device and reporting "system settings changed", and it wasn't until the 3rd reboot that Smart Access Memory started showing as Enabled in AMD Software).

Fortnite test:
Freeze and BSOD in under 5 minutes. I waited for it to write a full 128GB+ memory dump, but it makes no real difference for us, as it didn't provide any more key information than a regular 2-3GB automatic memory dump.

Code:
10: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: ffff8b8d4ffcf028, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 00000000bc800800, High order 32-bits of the MCi_STATUS value.
Arg4: 00000000060c0859, Low order 32-bits of the MCi_STATUS value.

Debugging Details:
------------------

Unable to load image C:\EpicGamesLibrary-C\Games\Fortnite\FortniteGame\Binaries\Win64\FortniteClient-Win64-Shipping.exe, Win32 error 0n2
*** WARNING: Unable to verify checksum for FortniteClient-Win64-Shipping.exe

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2124

    Key  : Analysis.Elapsed.mSec
    Value: 10633

    Key  : Analysis.IO.Other.Mb
    Value: 5

    Key  : Analysis.IO.Read.Mb
    Value: 1

    Key  : Analysis.IO.Write.Mb
    Value: 16

    Key  : Analysis.Init.CPU.mSec
    Value: 3687

    Key  : Analysis.Init.Elapsed.mSec
    Value: 110896

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 124

    Key  : Bugcheck.Code.KiBugCheckData
    Value: 0x124

    Key  : Bugcheck.Code.LegacyAPI
    Value: 0x124

    Key  : Failure.Bucket
    Value: 0x124_0_AuthenticAMD_PROCESSOR_BUS_L1_SRC_IRD_I_NOTIMEOUT_IMAGE_AuthenticAMD.sys

    Key  : Failure.Hash
    Value: {9849fca8-6c5b-4c6a-ecca-8c5e153b17bd}

    Key  : Hypervisor.Enlightenments.Value
    Value: 0

    Key  : Hypervisor.Enlightenments.ValueHex
    Value: 0

    Key  : Hypervisor.Flags.AnyHypervisorPresent
    Value: 0

    Key  : Hypervisor.Flags.ApicEnlightened
    Value: 0

    Key  : Hypervisor.Flags.ApicVirtualizationAvailable
    Value: 1

    Key  : Hypervisor.Flags.AsyncMemoryHint
    Value: 0

    Key  : Hypervisor.Flags.CoreSchedulerRequested
    Value: 0

    Key  : Hypervisor.Flags.CpuManager
    Value: 0

    Key  : Hypervisor.Flags.DeprecateAutoEoi
    Value: 0

    Key  : Hypervisor.Flags.DynamicCpuDisabled
    Value: 0

    Key  : Hypervisor.Flags.Epf
    Value: 0

    Key  : Hypervisor.Flags.ExtendedProcessorMasks
    Value: 0

    Key  : Hypervisor.Flags.HardwareMbecAvailable
    Value: 1

    Key  : Hypervisor.Flags.MaxBankNumber
    Value: 0

    Key  : Hypervisor.Flags.MemoryZeroingControl
    Value: 0

    Key  : Hypervisor.Flags.NoExtendedRangeFlush
    Value: 0

    Key  : Hypervisor.Flags.NoNonArchCoreSharing
    Value: 0

    Key  : Hypervisor.Flags.Phase0InitDone
    Value: 0

    Key  : Hypervisor.Flags.PowerSchedulerQos
    Value: 0

    Key  : Hypervisor.Flags.RootScheduler
    Value: 0

    Key  : Hypervisor.Flags.SynicAvailable
    Value: 0

    Key  : Hypervisor.Flags.UseQpcBias
    Value: 0

    Key  : Hypervisor.Flags.Value
    Value: 16908288

    Key  : Hypervisor.Flags.ValueHex
    Value: 1020000

    Key  : Hypervisor.Flags.VpAssistPage
    Value: 0

    Key  : Hypervisor.Flags.VsmAvailable
    Value: 0

    Key  : Hypervisor.RootFlags.AccessStats
    Value: 0

    Key  : Hypervisor.RootFlags.CrashdumpEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.CreateVirtualProcessor
    Value: 0

    Key  : Hypervisor.RootFlags.DisableHyperthreading
    Value: 0

    Key  : Hypervisor.RootFlags.HostTimelineSync
    Value: 0

    Key  : Hypervisor.RootFlags.HypervisorDebuggingEnabled
    Value: 0

    Key  : Hypervisor.RootFlags.IsHyperV
    Value: 0

    Key  : Hypervisor.RootFlags.LivedumpEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.MapDeviceInterrupt
    Value: 0

    Key  : Hypervisor.RootFlags.MceEnlightened
    Value: 0

    Key  : Hypervisor.RootFlags.Nested
    Value: 0

    Key  : Hypervisor.RootFlags.StartLogicalProcessor
    Value: 0

    Key  : Hypervisor.RootFlags.Value
    Value: 0

    Key  : Hypervisor.RootFlags.ValueHex
    Value: 0

    Key  : SecureKernel.HalpHvciEnabled
    Value: 0

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Version
    Value: 10.0.19041.1


BUGCHECK_CODE:  124

BUGCHECK_P1: 0

BUGCHECK_P2: ffff8b8d4ffcf028

BUGCHECK_P3: bc800800

BUGCHECK_P4: 60c0859

FILE_IN_CAB:  MEMORY.DMP

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXWINLOGON: 1

PROCESS_NAME:  FortniteClient-Win64-Shipping.exe

STACK_TEXT:
ffffb201`f9fe1938 fffff806`742b97aa     : 00000000`00000124 00000000`00000000 ffff8b8d`4ffcf028 00000000`bc800800 : nt!KeBugCheckEx
ffffb201`f9fe1940 fffff806`703d15b0     : 00000000`00000000 ffff8b8d`4ffcf028 ffff8b8d`450f38b0 ffff8b8d`4ffcf028 : nt!HalBugCheckSystem+0xca
ffffb201`f9fe1980 fffff806`743bbaae     : 00000000`00000000 ffffb201`f9fe1a29 ffff8b8d`4ffcf028 ffff8b8d`450f38b0 : PSHED!PshedBugCheckSystem+0x10
ffffb201`f9fe19b0 fffff806`742bb0d1     : ffff8b8d`4fcc2900 ffff8b8d`4fcc2900 ffff8b8d`450f3900 ffff8b8d`450f38b0 : nt!WheaReportHwError+0x46e
ffffb201`f9fe1a90 fffff806`742bb443     : 00000000`0000000a ffff8b8d`450f3900 ffff8b8d`450f38b0 00000000`0000000a : nt!HalpMcaReportError+0xb1
ffffb201`f9fe1c00 fffff806`742bb320     : ffff8b8d`43eee730 00000000`00000000 ffffb201`f9fe1e00 00000000`00000000 : nt!HalpMceHandlerCore+0xef
ffffb201`f9fe1c50 fffff806`742ba865     : ffff8b8d`43eee730 ffffb201`f9fe1ef0 00000000`00000000 00000000`00000000 : nt!HalpMceHandler+0xe0
ffffb201`f9fe1c90 fffff806`742bd025     : ffff8b8d`43eee730 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalpHandleMachineCheck+0xe9
ffffb201`f9fe1cc0 fffff806`74312be9     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalHandleMcheck+0x35
ffffb201`f9fe1cf0 fffff806`7420e57a     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiHandleMcheck+0x9
ffffb201`f9fe1d20 fffff806`7420e237     : 00000000`00000000 00000000`00000000 00000195`2d6f6bb8 00000000`00000000 : nt!KxMcheckAbort+0x7a
ffffb201`f9fe1e60 00007ff7`e98c0fc0     : 00000195`2d6f6ba8 00007ff7`e8aaca37 00000195`3b088040 00007ff7`e8b3f65f : nt!KiMcheckAbort+0x277
00000085`65cff3a0 00000195`2d6f6ba8     : 00007ff7`e8aaca37 00000195`3b088040 00007ff7`e8b3f65f 00000000`00000000 : FortniteClient_Win64_Shipping+0x2270fc0
00000085`65cff3a8 00007ff7`e8aaca37     : 00000195`3b088040 00007ff7`e8b3f65f 00000000`00000000 00000195`2c788838 : 0x00000195`2d6f6ba8
00000085`65cff3b0 00000195`3b088040     : 00007ff7`e8b3f65f 00000000`00000000 00000195`2c788838 00000195`2d6f6ba8 : FortniteClient_Win64_Shipping+0x145ca37
00000085`65cff3b8 00007ff7`e8b3f65f     : 00000000`00000000 00000195`2c788838 00000195`2d6f6ba8 00007ff7`e8b3ea1b : 0x00000195`3b088040
00000085`65cff3c0 00000000`00000000     : 00000195`2c788838 00000195`2d6f6ba8 00007ff7`e8b3ea1b 00000000`00000000 : FortniteClient_Win64_Shipping+0x14ef65f


MODULE_NAME: AuthenticAMD

IMAGE_NAME:  AuthenticAMD.sys

STACK_COMMAND:  .cxr; .ecxr ; kb

FAILURE_BUCKET_ID:  0x124_0_AuthenticAMD_PROCESSOR_BUS_L1_SRC_IRD_I_NOTIMEOUT_IMAGE_AuthenticAMD.sys

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {9849fca8-6c5b-4c6a-ecca-8c5e153b17bd}

Followup:     MachineOwner
---------

10: kd> !errrec ffff8b8d4ffcf028
===============================================================================
Common Platform Error Record @ ffff8b8d4ffcf028
-------------------------------------------------------------------------------
Record Id     : 01dae4d7353b59f0
Severity      : Fatal (1)
Length        : 936
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 8/2/2024 12:56:55 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ ffff8b8d4ffcf0a8
Section       @ ffff8b8d4ffcf180
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : BUS error
Operation     : Instruction Execute
Flags         : 0x00
Level         : 1
CPU Version   : 0x0000000000a20f12
Processor ID  : 0x000000000000000a

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ ffff8b8d4ffcf0f0
Section       @ ffff8b8d4ffcf240
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x000000000000000a
CPU Id        : 12 0f a2 00 00 08 18 0a - 0b 32 f8 7e ff fb 8b 17
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ ffff8b8d4ffcf280

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ ffff8b8d4ffcf138
Section       @ ffff8b8d4ffcf2c0
Offset        : 664
Length        : 272
Flags         : 0x00000000
Severity      : Fatal

Error         : BUSL1_SRC_IRD_I_NOTIMEOUT_ERR (Proc 10 Bank 1)
  Status      : 0xbc800800060c0859
  Address     : 0x000000029e520240
  Misc.       : 0xd01a0ffe00000000
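
A quick cross-check of the two dumps: bugcheck Arg3 and Arg4 are just the two halves of MCi_STATUS, and comparing the records shows which fields stay constant between crashes. A small Python sketch follows, with the values copied from the !errrec outputs in this thread. The record layout and helper name are only for illustration.

```python
# Sketch: rebuild MCi_STATUS from bugcheck Arg3/Arg4 and compare the crash records.

def combine_status(high32, low32):
    """Join the high and low 32-bit halves into one 64-bit MCi_STATUS value."""
    return (high32 << 32) | low32


# Values copied from the two !errrec outputs in this thread.
records = [
    {"date": "2024-07-13", "proc": 10, "bank": 1,
     "status": 0xBC800800060C0859, "address": 0x000000088AAB4AC0},
    {"date": "2024-08-02", "proc": 10, "bank": 1,
     "status": 0xBC800800060C0859, "address": 0x000000029E520240},
]

# Arg3 and Arg4 from the 2024-08-02 bugcheck.
status_from_args = combine_status(0x00000000BC800800, 0x00000000060C0859)

# Fields whose value is not identical across all records.
varying = sorted(
    key for key in records[0]
    if len({rec[key] for rec in records}) > 1
)

print(hex(status_from_args))
print(varying)
```

Only the date and the reported address differ. Processor 10, bank 1 and the status value are identical in both crashes, even though the GPU was swapped in between.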

So BIOS 5003 was released by ASUS for this motherboard on 2023/10/31, and the AMD chipset driver version is 5.08.02.027. The local install date under Windows isn't the release date, of course, but I installed what was then the latest version directly from AMD in late January 2024.

There's a new BIOS, 5013, that addresses AMD CPU vulnerabilities. I suspect it might lower performance, but that might actually help in this case if it affects RAM behavior. BIOS changelogs are notoriously poorly detailed, at least from ASUS, so there might be other beneficial fixes. I don't need to run at top performance, but I'd be disappointed to trade the amount of RAM for much lower performance. Hopefully, if I end up having to lower the MT/s manually, a small reduction will do.

So I think the BIOS and chipset driver update is next. Let me see if I'm lucky, but no high hopes, of course.

EDIT: Oh yeah, I wanted to figure out the date of the chipset drivers; luckily AMD has all previous drivers listed.
Facepalm: BIOS 5003 was released after the AMD chipset drivers, which date from 2023-08-17. So if this mismatch is a big deal, it certainly doesn't help me here.

However, I might have updated the BIOS after the WHEA issues started; this is the new January 2024 installation. The old broken installation is still bootable but has data corruption and broken GPU drivers, so I'm not sure it's worth the hassle of swapping SSDs right now to check dates. I'll see shortly whether this BIOS/chipset driver mismatch is the problem, with new tests on the current, latest Win10 installation.

UPDATE:

Oh well:
Code:
AMD Chipset Software Uninstall Summary

Name             : AMD PCI Device Driver
Version          : 1.0.0.90
Uninstall        : Fail

Name             : AMD Processor Power Management Support
Version          : 8.0.0.13
Uninstall        : Success

Name             : AMD SMBus Driver
Version          : 5.12.0.38
Uninstall        : Fail

These driver installers are notoriously broken; I had problems with them in previous Windows setups, on other PCs, etc.
 
Joined
Sep 3, 2019
Messages
3,255 (1.79/day)
Location
Thessaloniki, Greece
System Name PC on since Aug 2019, 1st CPU R5 3600 + ASUS ROG RX580 8GB >> MSI Gaming X RX5700XT (Jan 2020)
Processor Ryzen 9 5900X (July 2022), 163W PPT limit, 80C temp limit, CO -8~12
Motherboard Gigabyte X570 Aorus Pro (Rev1.0), BIOS F37h, AGESA V2 1.2.0.B
Cooling Arctic Liquid Freezer II 420mm Rev7 (Jan 2024) with off center mount for Ryzen, TIM: Kryonaut
Memory 2x16GB G.Skill Trident Z Neo GTZN (July 2022) 3600MT/s 1.38V CL16-16-16-16-32-48 1T, tRFC:280, B-die
Video Card(s) Sapphire Nitro+ RX 7900XTX (Dec 2023) 314~466W (366W current) PowerLimit, 1060mV, Adrenalin v24.7.1
Storage Samsung NVMe: 980Pro 1TB(OS 2022), 970Pro 512GB(2019) / SATA-III: 850Pro 1TB(2015) 860Evo 1TB(2020)
Display(s) Dell Alienware AW3423DW 34" QD-OLED curved (1800R), 3440x1440 144Hz (max 175Hz) HDR400/1000, VRR on
Case None... naked on desk
Audio Device(s) Astro A50 headset
Power Supply Corsair HX750i, 80+ Platinum, 93% (250~700W), modular, single/dual rail (switch)
Mouse Logitech MX Master (Gen1)
Keyboard Logitech G15 (Gen2) w/ LCDSirReal applet
Software Windows 11 Home 64bit (v23H2, OSBuild 22631.4037), upgraded from Win10 to Win11 on Feb 2024
What was the old GPU? What resolution are you using?
Any usage of PBO (advanced) and CurveOptimizer for the 5900X?
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
What was the old GPU? What resolution are you using?
Any usage of PBO (advanced) and CurveOptimizer for the 5900X?

The old GPU was an RX480, at 1440p in all cases. The old RAM was 64GB of Crucial Ballistix ... it wouldn't work with DOCP, so I replaced it completely, also because they stopped producing it and combining two 2x32GB kits isn't the best idea either.

None of that. I never even read any online documentation about those advanced CPU overclocking options. I could have, I just didn't take the time or have a real need to get into it.

If I delved into overclocking, or rather optimization, it might be useful to switch between several profiles case by case depending on the task: if I'm kicking off a big render I might enable something, or if I'm gaming I might disable a few cores and boost the frequencies on the remaining ones, or whatever tricks are popular these days. I think that's the kind of thing the Ryzen Master application is meant for?

I do a lot of dev work, but most of the coding is low strain: even when I do 3D stuff, it's all preview quality, FPS limited to 30, etc. Debuggers and profilers, however, are where the amount of RAM is a big deal, and 64GB didn't seem enough in some cases (Windows Performance Recorder). It's not that I need that huge amount as if it were mission critical; there are other reasons, but the 128GB decision is mostly about convenience and peace of mind: avoiding pagefile thrashing on the SSD and not having to pay much attention to the number of browser tabs open at once. In my case it was worth it and I was prepared to pay extra for it. It's working out great otherwise.

I just updated the BIOS, and of course it's all on Auto right now, except stuff like enabling ReBAR support, disabling Wi-Fi, Fast Boot, Show POST until Press ESC, and other non-performance things, which I don't need. Show POST until Press ESC is the best admin/troubleshooting convenience option ever. I can't stress enough how many times I've borked boots (bootables, dual boot, OS prober messing around, etc.) when POST was too quick, because monitors can take ages to handshake, or don't even show the screen unless the monitor is detected before POST OK, and I do a lot of switching between multiple PCs.
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
No luck.

Next, I lowered the DRAM frequency to 1866 MT/s.

Sort of better? Not quite. A flight sim ran for 3 hours all by itself and was still running when I got home. I rebooted normally and ran Fortnite; it froze with the same issue 15-20 minutes later. I was AFK, so I don't know exactly when it happened.

Okay, so lower the DRAM frequency even more? That's ridiculous if it's what it takes to get 128GB running. This might not be a quick fix; I can see myself going back and forth for days configuring this to find a sweet spot, but the whole thing could be broken by its nature, who knows.

If you provide 4 slots, and if you make RAM at these densities, it ought to work together on basic settings. We're not talking about any kind of overclocking, but I think perhaps various overdrives and default boosts for the CPU are enabled if I just leave its BIOS settings on "Auto"?
I might need to hard-disable all of this stuff too? Hopefully not; what did I pay for then? What a mess.
 

Ruru

S.T.A.R.S.
Joined
Dec 16, 2012
Messages
11,956 (2.80/day)
Location
Jyväskylä, Finland
System Name 4K-gaming
Processor AMD Ryzen 7 5800X
Motherboard Asus ROG Crosshair VII Hero
Cooling Arctic Freezer 50
Memory 48GB Kingston DDR4-3200C16
Video Card(s) Asus GeForce RTX 3080 TUF OC 10GB
Storage ~3TB SSDs + 3TB external HDDs
Display(s) Acer 27" 4K120 IPS + Lenovo 32" 4K60 IPS
Case Corsair 4000D Airflow White
Audio Device(s) Asus TUF H3 Wireless
Power Supply EVGA Supernova G2 750W
Mouse Logitech MX518 + Asus TUF P1 mousepad
Keyboard Roccat Vulcan 121 AIMO
VR HMD Oculus Rift CV1
Software Windows 11 Pro
Benchmark Scores It runs Crysis
Incompatibility can always be possible, no matter what (and how good) the hardware is.

Going down to DDR3 levels of RAM frequency isn't something you should do, especially with Ryzen, since they love memory bandwidth.
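
To put rough numbers on the bandwidth point, here is a back-of-the-envelope Python sketch of theoretical peak DRAM bandwidth. It assumes a 64-bit (8-byte) DDR4 channel and a dual-channel AM4 setup, and it ignores timings and real-world efficiency, so the figures are upper bounds only. The function name is made up for illustration.

```python
# Sketch: theoretical peak DRAM bandwidth at different transfer rates.

BYTES_PER_TRANSFER = 8   # one DDR4 channel is 64 bits wide
CHANNELS = 2             # assumption: dual-channel AM4 desktop configuration


def peak_bandwidth_gbs(mt_per_s, channels=CHANNELS):
    """Peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return mt_per_s * 1_000_000 * BYTES_PER_TRANSFER * channels / 1e9


for rate in (1866, 2400, 3600):
    print(f"DDR4-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s peak")
```

By this estimate, running at 1866 MT/s gives roughly half the peak bandwidth available at 3600 MT/s.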
 
Joined
Sep 3, 2019
Messages
3,255 (1.79/day)
Location
Thessaloniki, Greece
There is a possibility that the PC always had some kind of incompatibility or even some flaw. When a GPU is upgraded, the CPU usually gets stressed more in games. And going from an RX480 to a 6700XT is a big step up.

I remember that when I went from a 5700XT to a 7900XTX (a 300% upgrade), my 5900X went from 60-80W to 90-110W during games.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
41,139 (6.56/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
There is a possibility that the PC always had some kind of incompatibility or even some flaw. When a GPU is upgraded, the CPU usually gets stressed more in games. And going from an RX480 to a 6700XT is a big step up.

I remember that when I went from a 5700XT to a 7900XTX (a 300% upgrade), my 5900X went from 60-80W to 90-110W during games.
Another person had an NVMe drive that wasn't fully locked down; they found a loose screw that was causing problems. I would reseat everything and make sure there are no loose screws that could fall into the case and get between components and the chassis, causing problems down the line.
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
Apex Legends just ran for 20-30 minutes in Training and the Firing Range. Graphics were maxed out, but I was mostly setting up controls and firing a few rounds of target practice.

In most games that have the option, I manually limit FPS to 75 (whether vsync is on or not; not driver enforced, though), which is the max refresh rate of my monitor.

I'll go and play a proper match in Apex for an hour or so.
If it doesn't WHEAck itself, I'll run another double-check test in Fortnite with the DRAM frequency still at 1866 MT/s. I expect it to freeze.

Then I'll raise the DRAM frequency back to the DOCP setting and try Apex Legends again.


Incompatibility can always be possible, no matter what (and how good) the hardware is.

Going down to DDR3 levels of RAM frequency isn't something you should do, especially with Ryzen, since they love memory bandwidth.

Yeah, but for the tests, I'll go down a bit more even, and see what happens.


There is a possibility that the PC always had some kind of incompatibility or even some flaw. When a GPU is upgraded, the CPU usually gets stressed more in games. And going from an RX480 to a 6700XT is a big step up.

I remember that when I went from a 5700XT to a 7900XTX (a 300% upgrade), my 5900X went from 60-80W to 90-110W during games.

Right, that did cross my mind as well, with both a beefier CPU and GPU and RAM, it's naturally more strain on the motherboard ... across the board, heh.

I guess one of the steps would be to try the old GPU again, exactly, thanks for reminding me ... but first I'll try with half the amount of RAM, 64GB, because swapping a GPU takes more effort. Yes, I'll keep in mind to populate the 2 recommended slots as described in the manual.

Another person had an NVMe drive that wasn't fully locked down; they found a loose screw that was causing problems. I would reseat everything and make sure there are no loose screws that could fall into the case and get between components and the chassis, causing problems down the line.

Indeed, I thought about redoing the PC a little bit, but I do maintain it regularly and clean out the dust with compressed air, so it's mostly well kept. I'll do exactly that when this is solved and I'm ready to reinstall Windows cleanly again.

About the SSDs: I didn't reinstall Windows on the same SSD (which got corrupted, apparently due to an unrelated issue). The new Win10 installation is on a different (and newer model) SSD, a Samsung 980 PRO 2TB instead of the Samsung 970 EVO 1TB.

The SSD with the older Win10 installation hasn't been inside the computer since I noticed the data corruption going on.
 
Joined
Sep 3, 2019
Messages
3,255 (1.79/day)
Location
Thessaloniki, Greece
Personally, I try to avoid 4 sticks of DRAM. In many cases this configuration causes issues. Yes, 2 DIMMs need to be installed in the A2/B2 slots.
On my next upgrade/platform I'll most likely go from 2x16GB to 2x24GB sticks.
And depending on the game, RAM access can go crazy, especially when the CPU load has gone up with the help of the better GPU.
The "integrated" memory controller on the CPU usually is not happy with 4 sticks and high memory load/access.
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
Personally, I try to avoid 4 sticks of DRAM. In many cases this configuration causes issues. Yes, 2 DIMMs need to be installed in the A2/B2 slots.
On my next upgrade/platform I'll most likely go from 2x16GB to 2x24GB sticks.
And depending on the game, RAM access can go crazy, especially when the CPU load has gone up with the help of the better GPU.
The "integrated" memory controller on the CPU usually is not happy with 4 sticks and high memory load/access.

If I can't solve 128GB, I'm going to RMA the CPU or something, because that kind of "I'm not happy with 4 sticks" is really inexcusable; that's like a design flaw to me.
It could be a bad core, perhaps ... I can see in WinDbg's additional info that the processor ID is always "10" (the 11th logical processor). Later I'll get into disabling all boosts for the CPU and/or disabling that core.

The +12V voltage is kinda low, 11.8V at idle, if the motherboard's sensor is to be taken seriously.

And then I came across power micro-spikes ... that might be a valid theory. The old RX480 GPU probably didn't have those, and I had a 4650G CPU back then.
It could be the PSU then. Hmm, I need to check how it handles those micro-spikes.
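
One cheap way to probe the bad-core idea before touching the BIOS is to keep the game off that core with a CPU affinity mask. The Python sketch below only computes the mask. It assumes SMT is enabled (24 logical processors on the 5900X) and that Windows numbers the two threads of a core consecutively, so logical processors 10 and 11 would share one core. The helper names are hypothetical.

```python
# Sketch: build an affinity mask that excludes the core behind logical processor 10.

LOGICAL_PROCESSORS = 24   # 5900X: 12 cores with SMT enabled
THREADS_PER_CORE = 2      # assumption: sibling threads are numbered consecutively


def core_of(logical_processor):
    """Physical core index for a logical processor under the assumption above."""
    return logical_processor // THREADS_PER_CORE


def affinity_mask_excluding_core(core):
    """Bitmask with every logical processor set except those of `core`."""
    all_cpus = (1 << LOGICAL_PROCESSORS) - 1
    core_bits = ((1 << THREADS_PER_CORE) - 1) << (core * THREADS_PER_CORE)
    return all_cpus & ~core_bits


suspect_core = core_of(10)
mask = affinity_mask_excluding_core(suspect_core)
print(suspect_core, format(mask, "X"))
```

The resulting hex value could be passed to `start /affinity` in a Command Prompt when launching a test program. This only steers that one process away from the core and is not the same as disabling the core in BIOS, so a clean run would be a hint at best.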
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
41,139 (6.56/day)
Location
Republic of Texas (True Patriot)
If I can't solve 128GB, I'm going to RMA the CPU or something, because that kind of "I'm not happy with 4 sticks" is really inexcusable; that's like a design flaw to me.
It could be a bad core, perhaps ... I can see in WinDbg's additional info that the processor ID is always "10" (the 11th logical processor). Later I'll get into disabling all boosts for the CPU and/or disabling that core.

The +12V voltage is kinda low, 11.8V at idle, if the motherboard's sensor is to be taken seriously.

And then I came across power micro-spikes ... that might be a valid theory. The old RX480 GPU probably didn't have those, and I had a 4650G CPU back then.
It could be the PSU then. Hmm, I need to check how it handles those micro-spikes.
If you read your motherboard manual, it will specify what speeds can be obtained with 2 and with 4+ modules. Just as an example: with 2 modules you can run 3000, with 4 modules 2400; anything beyond that is not tested and you are on your own to figure it out...
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
If you read your motherboard manual, it will specify what speeds can be obtained with 2 and with 4+ modules. Just as an example: with 2 modules you can run 3000, with 4 modules 2400; anything beyond that is not tested and you are on your own to figure it out...
3rd Gen AMD Ryzen™ Processors
- 4 x DIMM, max. 128GB, DDR4 4400(O.C.)/4266(O.C.)/4133(O.C.)/4000(O.C.)/3866(O.C.)/3733(O.C.)/3600(O.C.)/3466(O.C.)/3400(O.C.)/3200(O.C.)/3000(O.C.)/2933(O.C.)/2800(O.C.)/2666/2400/2133 MHz, un-buffered memory

Two sticks, 64GB, did not help.

Motherboard reports terrible voltages:
+12 VDC = 11.8
+5 VDC = 4.6
+3.3 VDC = 3.01

I played a 20-minute online match in Apex Legends just now and it didn't freeze. Apex is of course running on a different engine than Fortnite, so that wouldn't be surprising; it strains the system differently.
 

tabascosauz

Moderator
Supporter
Staff member
Joined
Jun 24, 2015
Messages
7,988 (2.38/day)
Location
Western Canada
System Name ab┃ob
Processor 7800X3D┃5800X3D
Motherboard B650E PG-ITX┃X570 Impact
Cooling NH-U12A + T30┃AXP120-x67
Memory 64GB 6400CL32┃32GB 3600CL14
Video Card(s) RTX 4070 Ti Eagle┃RTX A2000
Storage 8TB of SSDs┃1TB SN550
Case Caselabs S3┃Lazer3D HT5

But do you have a ZenTimings screenshot showing the settings you were actually running at 3600 with 128GB? You did show a 2400 screenshot, but 1) I'm assuming you aren't happy with leaving it there, and 2) I don't really have a reference for what good VSOC looks like at that speed, as few people run it. None of the timings and GDM settings shown at 2400 are relevant, as they will be completely different at the higher XMP speed.

Quad rank per channel memory is extremely stressful for the memory controller. You'd be pretty lucky to get 3600 on quad rank, but it will probably require higher than normal VSOC (compared to dual rank).
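
To put a number on "quad rank per channel", here is a rough sketch of the arithmetic. It assumes the 128GB setup is 4 x 32GB DIMMs on a dual-channel AM4 board and that each 32GB unbuffered DDR4 DIMM is dual rank (worth verifying on the actual sticks). The function name is made up for illustration.

```python
def ranks_per_channel(dimm_count: int, ranks_per_dimm: int, channels: int = 2) -> int:
    """Return how many memory ranks each channel has to drive."""
    if dimm_count % channels != 0:
        raise ValueError("DIMMs must be spread evenly across channels")
    dimms_per_channel = dimm_count // channels
    return dimms_per_channel * ranks_per_dimm


# Assumption: 32GB DDR4 UDIMMs are dual rank, and AM4 is dual channel.
four_sticks = ranks_per_channel(dimm_count=4, ranks_per_dimm=2)
two_sticks = ranks_per_channel(dimm_count=2, ranks_per_dimm=2)
print(f"4 x 32GB: {four_sticks} ranks per channel")
print(f"2 x 32GB: {two_sticks} ranks per channel")
```

Under those assumptions, dropping from four sticks to two halves the rank load on each channel, which is why a two-stick test is a useful data point.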

VSOC requirements are already generally a little higher (and achievable Fabric/memory speeds lower) on 2-CCD CPUs like the 5900X; that's inherent to the design.

It's probably best not to leave VSOC, VDDG_CCD and VDDG_IOD to the motherboard's auto rules. I don't think it's going to be smart enough to bump those voltages to accommodate the quad-rank load on the UMC; it most likely just sets them based on speed.

VSOC can go up to 1.2V without much issue, but you'll run into diminishing returns past about 1.15V or so. VDDG_IOD is usually the other important one to pay attention to if you are getting Bus/Interconnect errors (which you are); 50 millivolts below VSOC is generally okay for IOD.
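
As a worked example of that rule of thumb, here is a minimal sketch that derives a VDDG_IOD value from a chosen VSOC. The numbers (1.2V ceiling, roughly 1.15V before diminishing returns, 50 mV offset) are taken from the paragraph above. The function and constant names are hypothetical, and this is only arithmetic, not a tuning tool.

```python
VSOC_MAX = 1.20           # "can go up to 1.2V without much issue"
VSOC_SOFT_CEILING = 1.15  # diminishing returns past about this point
IOD_OFFSET = 0.050        # VDDG_IOD roughly 50 mV below VSOC


def suggest_vddg_iod(vsoc: float) -> float:
    """Return a VDDG_IOD value 50 mV below the given VSOC."""
    if not 0 < vsoc <= VSOC_MAX:
        raise ValueError(f"VSOC {vsoc} V is outside the range discussed here")
    return round(vsoc - IOD_OFFSET, 3)


def past_soft_ceiling(vsoc: float) -> bool:
    """True when VSOC is above the point of diminishing returns."""
    return vsoc > VSOC_SOFT_CEILING


for vsoc in (1.10, 1.15, 1.20):
    note = " (diminishing returns)" if past_soft_ceiling(vsoc) else ""
    print(f"VSOC {vsoc:.3f} V -> VDDG_IOD {suggest_vddg_iod(vsoc):.3f} V{note}")
```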

DCS is a RAM eater so it's probably the better practical test out of those games :D but please memtest properly with Testmem5, HCI Memtest, or Karhu.
 
Joined
Sep 3, 2019
Messages
3,255 (1.79/day)
Location
Thessaloniki, Greece
System Name PC on since Aug 2019, 1st CPU R5 3600 + ASUS ROG RX580 8GB >> MSI Gaming X RX5700XT (Jan 2020)
Processor Ryzen 9 5900X (July 2022), 163W PPT limit, 80C temp limit, CO -8~12
Motherboard Gigabyte X570 Aorus Pro (Rev1.0), BIOS F37h, AGESA V2 1.2.0.B
Cooling Arctic Liquid Freezer II 420mm Rev7 (Jan 2024) with off center mount for Ryzen, TIM: Kryonaut
Memory 2x16GB G.Skill Trident Z Neo GTZN (July 2022) 3600MT/s 1.38V CL16-16-16-16-32-48 1T, tRFC:280, B-die
Video Card(s) Sapphire Nitro+ RX 7900XTX (Dec 2023) 314~466W (366W current) PowerLimit, 1060mV, Adrenalin v24.7.1
Storage Samsung NVMe: 980Pro 1TB(OS 2022), 970Pro 512GB(2019) / SATA-III: 850Pro 1TB(2015) 860Evo 1TB(2020)
Display(s) Dell Alienware AW3423DW 34" QD-OLED curved (1800R), 3440x1440 144Hz (max 175Hz) HDR400/1000, VRR on
Case None... naked on desk
Audio Device(s) Astro A50 headset
Power Supply Corsair HX750i, 80+ Platinum, 93% (250~700W), modular, single/dual rail (switch)
Mouse Logitech MX Master (Gen1)
Keyboard Logitech G15 (Gen2) w/ LCDSirReal applet
Software Windows 11 Home 64bit (v23H2, OSBuild 22631.4037), upgraded from Win10 to Win11 on Feb 2024
Board sensors are always a bit off but shouldn't be that far off.

Here is a screenshot after playing for 1h 50min; I have boxed the board readings and the ones from the PSU itself.
I have the (i) version of the HX750, and its average load during this session was 500W (output).

Untitled_121b.png
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
41,139 (6.56/day)
Location
Republic of Texas (True Patriot)
Is this the motherboard you have?

From what I read, anything beyond 2666 counts as DOCP/XMP.

This is a baseline of what will run in 4-DIMM configurations.
 
Joined
Aug 1, 2024
Messages
35 (1.30/day)
So, one more try with Fortnite, and another WHEA crash. No crashes in Apex Legends (30 min max). I was able to borrow a PSU, a Seasonic Vertex ...something.. I think 850W, that was bought new 6 months ago.

And it's a major difference in reported voltages. Before I switched the PSU, I made a short log with HWiNFO, so I'll compare some of the other voltages for the CPU and mobo, which I haven't paid much attention to yet, as I'm still not deeply familiar with what is nominal, auto OC, or extreme, etc. That's the next learning curve.

IMG_20240804_103635_Old-Corsair-HX750.jpg

IMG_20240804_223336_New-Seasonic-Vertex-850W.jpg

But do you have a ZenTimings screenshot showing the settings you were actually running at 3600 with 128GB? You did show a 2400 screenshot, but 1) I'm assuming you aren't happy with leaving it there, and 2) I don't really have a reference for what good VSOC looks like at that speed, as few people run it. None of the timings and GDM settings shown at 2400 are relevant, as they will be completely different at the higher XMP speed.

Quad rank per channel memory is extremely stressful for the memory controller. You'd be pretty lucky to get 3600 on quad rank, but it will probably require higher than normal VSOC (compared to dual rank).

VSOC requirements are already generally a little higher (and achievable Fabric/memory speeds lower) on 2-CCD CPUs like the 5900X; that's inherent to the design.

It's probably best not to leave VSOC, VDDG_CCD and VDDG_IOD to the motherboard's auto rules. I don't think it's going to be smart enough to bump those voltages to accommodate the quad-rank load on the UMC; it most likely just sets them based on speed.

VSOC can go up to 1.2V without much issue, but you'll run into diminishing returns past about 1.15V or so. VDDG_IOD is usually the other important one to pay attention to if you are getting Bus/Interconnect errors (which you are); 50 millivolts below VSOC is generally okay for IOD.

DCS is a RAM eater so it's probably the better practical test out of those games :D but please memtest properly with Testmem5, HCI Memtest, or Karhu.

I don't think I've run at 3600 MT/s since the first WHEA happened a year ago (yeah ... time flies).

I mean, I did consult the QVL and all ... but I honestly didn't go that deeply into it. I didn't know; I was hoping that in these "modern times" such things would be more fleshed out and robust.

I did watch a bunch of deep-dive videos anyhow, just from normally following tech stuff, so I had the basic idea: how DDR4 compares to DDR5, the on-die ECC, some stuff around how AMD CPUs work. I was aware of the whole X3D stuff and having to turn off half of the CPU (one die) to not let threads jump (context switch) across, etc., but quite a bit of what you said there is new to me.

Now I do remember, about ranks and stuff, the higher-density DIMMs ... it all makes sense ... but this really isn't communicated by the industry IMO, or indicated in manuals.

Interestingly, Fortnite crashes the fastest, while DCS didn't crash for a couple of hours the last time I tried, and I haven't gotten around to trying again.



Board sensors are always a bit off but shouldn't be that far off.

Here is a screenshot after playing for 1h 50min; I have boxed the board readings and the ones from the PSU itself.
I have the (i) version of the HX750, and its average load during this session was 500W (output).

View attachment 357457

Thanks for double-checking and confirming. I think that's enough to make a decision right here regarding the PSU: I'm ordering a new one ASAP, even if it might not necessarily be the culprit. If it's still under warranty, I might RMA it and perhaps keep it for another non-gaming PC, or sell it. It's old enough that it doesn't properly support transient spikes anyway, AFAIK.


Is this the motherboard you have?

From what I read, anything beyond 2666 counts as DOCP/XMP.

This is a baseline of what will run in 4-DIMM configurations.

Sure, but I thought I was fine, given there's a bunch of G.Skill kits at 3600 and 32GB densities tested as supporting 4 DIMMs, so what I'm doing shouldn't be anything out of the ordinary, but I guess I actually

The QVL thing's still a bit confusing to me either way ... what's with the 8x8GB ... is the QVL not for this specific board only? Or do they get an 8-DIMM kit and only use half???
 

tabascosauz


WHEA 19 bus/interconnect should have nothing to do with your PSU. If it's a relatively aging PSU it might be good to rule it out just as good practice, but otherwise the HX750 is a fine unit.
Software-reported rail voltages don't mean much unless the PSU is digitally controlled and reports them itself. If they are grossly out of spec, then maybe you could take a look at it.
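
For reference on what "out of spec" means here, a small sketch that checks readings against the +/-5% tolerance the ATX spec allows on the +12V, +5V and +3.3V rails. The readings are the motherboard sensor values posted earlier in the thread. The function name is made up, and a failing value only flags the reading, which may be the sensor's fault rather than the PSU's.

```python
ATX_TOLERANCE = 0.05  # +/-5% on the +12V, +5V and +3.3V rails


def within_atx_spec(nominal: float, measured: float,
                    tolerance: float = ATX_TOLERANCE) -> bool:
    """Return True if the measured voltage is inside nominal +/- tolerance."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return low <= measured <= high


# Motherboard-reported values from the thread (software readings, so
# treat them as a hint rather than a measurement).
readings = {12.0: 11.8, 5.0: 4.6, 3.3: 3.01}

for nominal, measured in readings.items():
    low = nominal * (1 - ATX_TOLERANCE)
    high = nominal * (1 + ATX_TOLERANCE)
    status = "within" if within_atx_spec(nominal, measured) else "outside"
    print(f"+{nominal}V rail: {measured} V is {status} "
          f"the {low:.3f}-{high:.3f} V window")
```

If a reading lands outside the window, cross-checking with a multimeter or the PSU's own telemetry (where available) is the way to tell a bad sensor from a bad rail.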

QVL isn't a guarantee. At best it's a very vague suggestion as to the kinds of speeds you can expect out of a given board's mem topology (and even then only from board vendors that test more thoroughly and detail x DIMM/rank per channel speeds like MSI).

Throughout AM4's time, some board vendors have been really lazy, pretty much copy-pasting QVLs across boards that are expected to perform similarly based on design similarities. So that gives you an idea of how useful QVLs are these days.

"G.Skill at 3600 and 32GB densities" doesn't mean a whole lot. Like all other RAM makers, they source all sorts of ICs for their products. A single RAM product SKU can cover half a dozen different possible ICs, all with wildly different performance and characteristics - especially if it's a highly mediocre DDR4 bin like 3600CL18 that isn't particularly difficult to hit, or itself specifically indicative of a certain IC.

If you are able to get to your sticks, snap a picture of the barcode sticker on them so you can see the 042 code and get a better idea of what you're actually working with.


I will say, though, that if your WHEAs are
  • consistently happening
  • always WHEA 19 Bus/Interconnect and never with WHEA 18 Cache Hierarchy in the mix
then the problem is coming from uncore components on your CPU, and there's not much else to say, i.e. either some part of the Infinity Fabric interconnect, or the memory controller. I guess it could be the fault of the board, but it's a bit of a remote chance - it's not WHEA 18, which can often be due to the board running a bad voltage curve on the CPU.

WHEA 19 is pretty much always an issue of mismatched voltages (VSOC, VDDGs, VDDP(?), PLL(?)) for what you are asking of the memory controller/IF, or of your RAM config just asking too much of your specific CPU sample's memory controller/IF. Proven bad CPUs (that just have weak Fabric and throw WHEA 19s at relatively low mem/Fabric speeds) stay bad when slotted into different boards.
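
The triage rule above can be written down as a short sketch. It assumes you have already collected the WHEA IDs from your crash history into a plain list of integers. The function name and the wording of the verdicts are hypothetical and simply mirror the reasoning in this post.

```python
from collections import Counter


def triage_whea(event_ids: list[int]) -> str:
    """Apply the rule of thumb: only 19s points to uncore, 18s muddy it."""
    counts = Counter(event_ids)
    if not counts:
        return "no WHEA events recorded"
    if counts[19] > 0 and counts[18] == 0:
        # Always Bus/Interconnect, never Cache Hierarchy.
        return "uncore suspect: Infinity Fabric or memory controller"
    if counts[18] > 0:
        # Cache Hierarchy errors can come from a bad CPU voltage curve.
        return "WHEA 18 in the mix: CPU voltage curve / board also suspect"
    return "other WHEA IDs only: this rule of thumb does not apply"


print(triage_whea([19, 19, 19]))
print(triage_whea([19, 18, 19]))
```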
 