Monday, April 24th 2023
AMD Ryzen 7000X3D Processors Prone to Physical Damage with Voltage-assisted Overclocking, Motherboard Vendors Rush BIOS Updates with Voltage Limiters
AMD Ryzen 7000X3D processors are prone to irreversible physical damage if CPU overclocking is attempted at some of the higher VDDCR voltages (the main power domain for the CPU cores). A Redditor who goes by Speedrookie attempted to overclock their Ryzen 7 7800X3D, leading to an irreversible failure. The motherboard socket and the processor's land-grid contacts show signs of overheating damage caused by the contacts melting from too much current draw.
A Ryzen 7000X3D processor features a special CPU complex die (CCD) with stacked 3D Vertical Cache memory. This cache die is located in the central region over the CCD where its 32 MB on-die L3 cache is located, while the difference in Z-height of the stacked die is filled up by structural silicon, which sits over the regions of the CCD with the 8 "Zen 4" CPU cores. It stands to reason that besides having an inferior thermal transfer setup to conventional "Zen 4" CCDs (without the 3DV cache), the CCD itself has a higher power draw at any given clock speed than a conventional CCD (since it's also powering the L3D). This is the main reason why overclocking capabilities on the 7000X3D processors are almost non-existent, and why the processors' power limits are generally lower than those of their regular Ryzen 7000X counterparts. Attempting to dial up voltage creates a perfect storm for these processors.
Igor's Lab posted a detailed analysis of the region of the Socket AM5 land-grid most susceptible to a burn-out in the above scenario. The central region of the LGA has 93 pins dedicated to the VDDCR power domain, dispersed in a mostly checkered pattern toward the center of the land-grid. Igor isolated 6 of these VDDCR pins in particular, which are most prone to physical damage, as they are located in a region below the CCD that sees it sandwiched between the L3D (stacked 3D Vertical Cache die) and the fiberglass substrate below. Apparently, AMD's thermal and electrical protection mechanisms aren't able to prevent a runaway overheating of the pins that causes the substrate to melt, deform, and bulge outward, resulting in irreversible damage to both the processor and the socket.
Meanwhile, AMD's motherboard partners are rushing to release UEFI BIOS updates for their entire lineups of motherboards, which enforce tighter limits on the VDDCR voltage. MSI is the first motherboard manufacturer with such updates. MSI, in a press statement, stated that it has redesigned automated overclocking for 7000X3D processors. "The BIOS now only supports negative offset voltage settings, which can reduce the CPU voltage only," the MSI statement to Tom's Hardware reads. "MSI Center also restricts any direct voltage and frequency adjustments, ensuring that the CPU won't be damaged due to over-voltage." On the other hand, the update introduces an automated overclocking feature called Enhanced Mode Boost, which optimizes PBO settings to improve boost frequency residency, without any manual voltage adjustments.
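As a rough illustration of what a "negative offset only" policy means in practice, the sketch below clamps any requested voltage offset to zero or below. The function name and the use of volts as the unit are assumptions made for illustration; this is not MSI's firmware logic, which has not been published.

```python
def clamp_voltage_offset(requested_offset_v: float) -> float:
    """Return the offset a negative-only policy would apply.

    Hypothetical helper: positive requests (over-volting) are clamped to 0.0,
    while negative requests (under-volting) pass through unchanged.
    """
    return min(requested_offset_v, 0.0)


# A +50 mV request is refused; a -30 mV request is accepted as-is.
raised = clamp_voltage_offset(0.05)
lowered = clamp_voltage_offset(-0.03)
```

Under this assumption, a user can still undervolt, but no setting exposed by the BIOS can push the CPU voltage above its stock value.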
Sources:
Tom's Hardware 1, 2, Igor's Lab, Speedrookie (Reddit)
258 Comments on AMD Ryzen 7000X3D Processors Prone to Physical Damage with Voltage-assisted Overclocking, Motherboard Vendors Rush BIOS Updates with Voltage Limiters
In my case, when I enable EXPO the SoC voltage goes up from ~1 V to 1.35 V, which is insanely high. This was on an MSI board with the "fixed" BIOS. Can't imagine how long until your system goes belly up if that value hits 1.5 V. Maybe I need to watch Buildzoid's video :)
Which leads me to believe the ASUS statement is just to cover a wide net and their ass, in addition to pulling bioses with full voltage control mainly so end users don’t get the option to kill their CPU until they can actually confirm what’s causing this; as opposed to confirmation that it actually is VSOC.
Now we have a bunch of conflicting information and misinformation/panic with no conclusive testing. The next wave will be the loveable YouTube morons like Jay who are not technically inclined enough to make statements on such topics causing more panic through sensationalist videos.
to be frank, i "was" at some point considering splurging for a fresh DDR5 platform with a 7800X3D and one of the first pieces of news (brief overview of the marketing/reviewer material) which came to light was "no overclocking" "3D sensitive to higher voltages" "danger imminent". Actually even before NDA's were lifted there were speculations flying around over "no overclocking" for the X3D counterparts.
So what is being suggested here?
Are the boards out-of-the-box a threat to X3D chips?
Or, are the boards at stock safe for 3D-chips but BIOS-level options allow for peril-driven overclocking?
If the latter, i'm surprised someone would purchase a ~$500 CPU and not partially consider the DO's and DON'Ts. I guess at the same time i can understand why AMD/board partners should employ no-entry zones with locked OC features considering not everyone is going to bother with looking into matters for even partial awareness. But i can't shake off not seeking some level of know-how before being brave enough to play with sensitive settings at the BIOS-level.
One thing I'm a little confused about... what happened to thermal throttling thresholds? Or any other automated countermeasure to keep these CPUs safe from overheating/burning up? This seems like a bigger cockup, one we can definitely pin on AMD/board partners. IMO, in 2023 we shouldn't have to worry about burning shit up unless extreme overclockers enable unlimited flexibility for whatever cause.
LGA1700 - issues with the socket bending and first-gen non-heterogenous cores requiring numerous bios fixes, patches, and scheduler re-writes to iron out all the problems.
AM5 - DDR5 stability issues galore, stupid motherboard price hikes, and now motherboard vendors breaking the rules to fry your chip.
Rocket Lake turned out okay, AMD will probably have most of these dumb issues ironed out by Zen5.
Memory stability issues and chips frying are a little worse though - I would say AM5 is sketchier than AM4. AM4 had some memory issues, but it was dirt cheap and the MT performance was huge. You were really getting a lot with that setup for the $, so having to fiddle with RAM sort of felt OK. This time it's insanely expensive, to use the 7950X3D and 7900X3D you need to perform 25 steps to get them to work most of the time, and also the motherboards might kill the CPU :/.
Doesn't feel as great.
I'm also wondering: is PBO or any other auto-OC setting enabled at stock? If yes, are these contributing factors to putting the chip six feet under, or is this issue purely related to additional tweak settings?
Modern CPUs can withstand you pulling off their cooler while they're at load. There should be no casual settings in the BIOS that can kill your chip. If you get the LN2 overclocker board, pump 2 V through your chip, and turn off the OVP, then fine - you set it to fry and that's what it did. Setting EXPO or PBO should never do that.
I've only gotten back into the new PC building game recently, have new platform launches always been this "interesting"? And if AMD keep making the news it's going to make it harder to playfully rib my Intel/Nvidia enthusiast workmates, sheesh.
At 12500 I use a two-piece cooler: the sole from the time of AM3 (between lga 1366 and 1700 there is only a 1mm difference in the holes) and a cheap and good cooler, but with a horrible grip and sent for recycling.
The results are ok (capture). Not perfect, but it's ok for me. The important thing is that the temperature of the hottest core dictates the behavior of the processor protections. The first to reach the critical temperature triggers the protection.
In the case of the AM5 socket, it is a much more serious problem. It seems to be a design error, the protections are useless and the risks are huge. You can burn a processor with overclocking, but not that fast.
If you put a 13900KS on the AM5 socket, you blow up the whole neighborhood. :rockout:
It could be false advertising for AMD to specifically tell reviewers to benchmark with EXPO enabled, when doing so could damage their CPUs (and motherboards) in mere 2-3 weeks.
In your example, if Ford shows in an advertisement that their cars can run submerged, and doing so kills the engine, that is false advertising, isn't it?
Enabling EXPO automatically added 0.21 V (1.245 - 1.035 = 0.21) for a maximum of 1.245 V on the SoC. That's two tenths of a volt that my motherboard added by default just by enabling EXPO.
What's really stupid here is that outside of benchmarks, turning off EXPO has no discernable change to how my system feels. I don't notice any kind of slowdown in how long it takes to boot Windows, there's no slowdowns in how web pages are rendered, the program that I compile from C++ source code using Microsoft's Visual C++ Compiler takes very nearly the same amount of time that it took with EXPO enabled, nor has it really affected Cinebench R23 scores (multi-core score of 19179).
So, I'm going to keep EXPO off for the time being until new UEFI versions come out. So far, Gigabyte hasn't come out with a new version for my board yet.
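The SoC voltage arithmetic quoted in the comment above (1.035 V at stock, 1.245 V with EXPO) can be checked with a short sketch. The function name is hypothetical and the two readings are simply the figures the commenter reported; nothing here reads real sensor data.

```python
def soc_voltage_increase(stock_v: float, expo_v: float) -> tuple[float, float]:
    """Return (absolute increase in volts, relative increase in percent).

    Values are rounded to hide floating-point noise in the subtraction.
    """
    delta = round(expo_v - stock_v, 3)
    percent = round(delta / stock_v * 100, 1)
    return delta, percent


# Figures reported in the comment: 1.035 V stock, 1.245 V with EXPO enabled.
delta_v, percent = soc_voltage_increase(1.035, 1.245)
print(f"EXPO added {delta_v} V, about {percent}% over stock")
```

With those two readings, the increase works out to 0.21 V, or roughly a fifth above the stock SoC voltage.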