Monday, April 24th 2023

AMD Ryzen 7000X3D Processors Prone to Physical Damage with Voltage-assisted Overclocking, Motherboard Vendors Rush BIOS Updates with Voltage Limiters
AMD Ryzen 7000X3D processors are prone to irreversible physical damage if CPU overclocking is attempted at some of the higher VDDCR voltages (the main power domain for the CPU cores). A Redditor who goes by Speedrookie, attempted to overclock their Ryzen 7 7800X3D, leading to an irreversible failure. The motherboard socket and the processor's land-grid contacts, show signs of overheating damage caused by the contacts melting from too much current draw.
A Ryzen 7000X3D processor features a special CPU complex die (CCD) with stacked 3D Vertical Cache memory. This cache die is located in the central region over the CCD where its 32 MB on-die L3 cache is located, while the difference in Z-height of the stacked die is filled up by structural silicon, which sit over the regions of the CCD with the 8 "Zen 4" CPU cores. It stands to reason that besides having an inferior thermal transfer setup to conventional "Zen 4" CCDs (without the 3DV cache), the CCD itself has a higher power-draw at any given clock-speed than a conventional CCD (since it's also powering the L3D). This is the main reason why overclocking capabilities on the 7000X3D processors are almost non-existent, and the processor's power limits are generally lower than their regular Ryzen 7000X counterparts. Attempting to dial up voltage kicks up the perfect storm for these processors.Igor's Lab posted a detailed analysis of the region of the Socket AM5 land-grid most susceptible to a burn-out in the above scenario. The central region of the LGA has 93 pins dedicated to the VDDCR power domain, dispersed in a mostly checkered pattern, toward the center of the land-grid. Igor isolated 6 of these VDDCR pins in particular, which are most prone to physical damage, as they are located in a region below the CCD that sees it sandwiched between the L3D (stacked 3D Vertical cache die), and the fiberglass substrate below. Apparently, AMD's thermal and electrical protection mechanisms aren't able to prevent a runaway overheating of the pins that causes the substrate to melt, deform, and bulge outward, resulting in irreversible damage to both the processor and the socket.
Meanwhile, AMD's motherboard partners are rushing to release UEFI BIOS updates for their entire lineups of motherboards, which enforce tighter limits on the VDDCR voltage. MSI is the first motherboard manufacturer with such updates. MSI, in a press statement, stated that it has redesigned automated overclocking for 7000X3D processors. "The BIOS now only supports negative offset voltage settings, which can reduce the CPU voltage only," the MSI statement to Tom's Hardware reads. "MSI Center also restricts any direct voltage and frequency adjustments, ensuring that the CPU won't be damaged due to over-voltage." On the other hand, the update introduces an automated overclocking feature called Enhanced Mode Boost, which optimizes PBO settings to improve boost frequency residency, without any manual voltage adjustments.
Sources:
Tom's Hardware 1, 2, Igor's Lab, Speedrookie (Reddit)
A Ryzen 7000X3D processor features a special CPU complex die (CCD) with stacked 3D Vertical Cache memory. This cache die is located in the central region over the CCD where its 32 MB on-die L3 cache is located, while the difference in Z-height of the stacked die is filled up by structural silicon, which sit over the regions of the CCD with the 8 "Zen 4" CPU cores. It stands to reason that besides having an inferior thermal transfer setup to conventional "Zen 4" CCDs (without the 3DV cache), the CCD itself has a higher power-draw at any given clock-speed than a conventional CCD (since it's also powering the L3D). This is the main reason why overclocking capabilities on the 7000X3D processors are almost non-existent, and the processor's power limits are generally lower than their regular Ryzen 7000X counterparts. Attempting to dial up voltage kicks up the perfect storm for these processors.Igor's Lab posted a detailed analysis of the region of the Socket AM5 land-grid most susceptible to a burn-out in the above scenario. The central region of the LGA has 93 pins dedicated to the VDDCR power domain, dispersed in a mostly checkered pattern, toward the center of the land-grid. Igor isolated 6 of these VDDCR pins in particular, which are most prone to physical damage, as they are located in a region below the CCD that sees it sandwiched between the L3D (stacked 3D Vertical cache die), and the fiberglass substrate below. Apparently, AMD's thermal and electrical protection mechanisms aren't able to prevent a runaway overheating of the pins that causes the substrate to melt, deform, and bulge outward, resulting in irreversible damage to both the processor and the socket.
Meanwhile, AMD's motherboard partners are rushing to release UEFI BIOS updates for their entire lineups of motherboards, which enforce tighter limits on the VDDCR voltage. MSI is the first motherboard manufacturer with such updates. MSI, in a press statement, stated that it has redesigned automated overclocking for 7000X3D processors. "The BIOS now only supports negative offset voltage settings, which can reduce the CPU voltage only," the MSI statement to Tom's Hardware reads. "MSI Center also restricts any direct voltage and frequency adjustments, ensuring that the CPU won't be damaged due to over-voltage." On the other hand, the update introduces an automated overclocking feature called Enhanced Mode Boost, which optimizes PBO settings to improve boost frequency residency, without any manual voltage adjustments.
258 Comments on AMD Ryzen 7000X3D Processors Prone to Physical Damage with Voltage-assisted Overclocking, Motherboard Vendors Rush BIOS Updates with Voltage Limiters
Edit: He seems to think it's coming mostly from running AMD Expo - SoC voltage.
that would be very sad,bad and facepalmish
He did test 1.5v on the SoC for half n hour and his CPU didn't kill itself but he's waiting for GN for more detailed info on this matter.
Now I have to find a new mobo master for my future builds. MSI maybe...?
If your keen i would take the cpu out and have a look see for the so called “burn mark” on the cpu . If you cant see anything your probably ok and this could be due to just bad batches or maybe even mobo related. More info is needed at this time.
I've since disabled EXPO on my memory and from here on out, hope that nothing bad happened.
Setting EXPO for my 7700x (at launch) on an MSI X670 pushed SoC to 1.35v. I adjusted it back down to default (1.02v) and ran that way without issue until I sold it.
Edit: Posting from my phone.
I'd set EXPO, boot into windows and see what SoC shows using HWinfo. Then compare to what disabling EXPO shows.
*edit*
Found one of my old posts. Here's where it shows up on an MSI board in HWinfo
Would just be interesting to have a look see that's all.
It looks like VDDCR_VDD reached a maximum voltage of 1.377 volts while VDDCR_SOC only reached 1.245 volts.
A ran a Cinebench R23 test and the only change was that the VDDCR_SOC only reached 1.246 volts.
Same idea why 1.5V Vcore on Zen 2 and later is "safe" in the hands of the stock Precision Boost algorithm, but certainly unsafe once you manually punch in the same 1.5V in the static Vcore box.
Traditionally the only Zen CPUs that can actually run a shitload of current through SOC are the APUs. On AM5 you can also get close since it has an iGPU, but that involves OCing the shit out of the iGPU and running heavy 3D load on the iGPU, something derbauer definitely didn't do by running Cinebench. Through normal memory controller loads alone, the most you can hope for is maybe 20W-ish flowing through SOC.
Given the time he spends around these platforms I thought he'd know some basic fundamentals of Ryzen topology.
1.35V is pretty standard for an EXPO VSOC auto-rule and people have been one-click set-and-forget for months with no problems; I honestly doubt that's what's going on, probably something else in the firmware given Asus' statement and pulling all their old BIOSes.