Monday, April 24th 2023

AMD Ryzen 7000X3D Processors Prone to Physical Damage with Voltage-assisted Overclocking, Motherboard Vendors Rush BIOS Updates with Voltage Limiters
AMD Ryzen 7000X3D processors are prone to irreversible physical damage if CPU overclocking is attempted at some of the higher VDDCR voltages (the main power domain for the CPU cores). A Redditor who goes by Speedrookie attempted to overclock their Ryzen 7 7800X3D, leading to an irreversible failure. The motherboard socket and the processor's land-grid contacts show signs of overheating damage caused by the contacts melting from too much current draw.
A Ryzen 7000X3D processor features a special CPU complex die (CCD) with stacked 3D Vertical Cache memory. This cache die is located in the central region over the CCD where its 32 MB on-die L3 cache is located, while the difference in Z-height of the stacked die is filled up by structural silicon, which sits over the regions of the CCD with the 8 "Zen 4" CPU cores. It stands to reason that besides having an inferior thermal transfer setup to conventional "Zen 4" CCDs (without the 3DV cache), the CCD itself has a higher power draw at any given clock speed than a conventional CCD (since it's also powering the L3D). This is the main reason why overclocking capabilities on the 7000X3D processors are almost non-existent, and the processors' power limits are generally lower than those of their regular Ryzen 7000X counterparts. Attempting to dial up voltage creates a perfect storm for these processors.
Igor's Lab posted a detailed analysis of the region of the Socket AM5 land-grid most susceptible to a burn-out in the above scenario. The central region of the LGA has 93 pins dedicated to the VDDCR power domain, dispersed in a mostly checkered pattern toward the center of the land-grid. Igor isolated 6 of these VDDCR pins in particular, which are most prone to physical damage, as they are located in a region below the CCD that sees it sandwiched between the L3D (stacked 3D Vertical Cache die) and the fiberglass substrate below. Apparently, AMD's thermal and electrical protection mechanisms aren't able to prevent a runaway overheating of the pins that causes the substrate to melt, deform, and bulge outward, resulting in irreversible damage to both the processor and the socket.
Meanwhile, AMD's motherboard partners are rushing to release UEFI BIOS updates for their entire lineups of motherboards, which enforce tighter limits on the VDDCR voltage. MSI is the first motherboard manufacturer with such updates. MSI, in a press statement, stated that it has redesigned automated overclocking for 7000X3D processors. "The BIOS now only supports negative offset voltage settings, which can reduce the CPU voltage only," the MSI statement to Tom's Hardware reads. "MSI Center also restricts any direct voltage and frequency adjustments, ensuring that the CPU won't be damaged due to over-voltage." On the other hand, the update introduces an automated overclocking feature called Enhanced Mode Boost, which optimizes PBO settings to improve boost frequency residency, without any manual voltage adjustments.
Sources:
Tom's Hardware 1, 2, Igor's Lab, Speedrookie (Reddit)
258 Comments on AMD Ryzen 7000X3D Processors Prone to Physical Damage with Voltage-assisted Overclocking, Motherboard Vendors Rush BIOS Updates with Voltage Limiters
This is despite the many extra fancy PR protection capabilities that this specific mobo has. That's why the mobo was damaged as well, not just the CPU.
- The Gigabyte mobo was able to stop the above so as not to damage the CPU socket (as opposed to the Asus board). The CPU was still killed.
- It is not directly EXPO related, but turning EXPO on is one way of triggering the CPU suicide.
- The CPU death probability is very much silicon-lottery dependent: on weak silicon the probability is much higher. In the short term, stronger silicon can cope with the too-high voltage being fed but may give up at some point in the future. Talk about 'slow cooking' of the CPU.
- Various bugs (there will be an additional video on that) in the BIOS were discovered, and wrong/unrelated voltage output is measured, above what is displayed and set in the BIOS screen.
- All in all, a rare case that comes from a mix of negligence by both AMD and vendors.
- Do keep on choosing ZEN4 if it's the right product for you, but do double check and be in control of, and monitor, the voltages yourself.
- More tests are coming with in-depth insights.
- AMD will RMA any CPU case, even if EXPO was applied. Mobo vendors' RMA: yet to be seen...
My take: a cluster of many small things coming together into a catastrophic error, like in the airplane investigations you see on National Geographic. This is the hidden cost of a new platform, and with AM5 it appears there is still a lot to iron out. Unless you are into some adventures, wait for ZEN5 if you can. ZEN4 is still a work in progress, and I suspect more weird cases will show up, as the root cause of the problem still hasn't been attended to; capping the voltage is the easy immediate (panic, if you will) solution.
7000X6D? Wow! o_O
Right... I've just updated my BIOS to version 1.82 with the voltage limiter. Everything is exactly as it was before, 1.2 VSoC with EXPO, but my idle power consumption went from 20-22 W to 24-25 W and my idle CPU temp rose by about 5-6 °C, and I can't figure out why. So far, I'm not happy.
I concur with Steve here, the AM5 platform when it comes to AMD communicating with their motherboard vendors on what's safe and what's not safe has resulted in an epic clusterfuck. There's a lot of room for blame here on AMD and their board partners.
I look forward to their side piece that Steve suggested will be coming soon because as he said, the video that was released recently was getting to be too long. I'm also looking forward to the full failure analysis that apparently will be weeks away due to them having to send the chip and motherboard off to a third-party failure analysis lab.
As always, a big thumbs up to Steve@GN, great work!!
As for AMD/partners in crime... this level of imprudence is unacceptable. There's too much focus on tiny incremental advances to come out on top and less focus on stock fine-tuning. 7000-series and its efficiency was/is a big winner in my eyes but this mishap somewhat leaves a sour taste in my mouth. Although, optimistically speaking, I'm sure these concerns will be quickly resolved with upcoming mobo patches.
I mean, if they don't reach every customer out there to update their BIOS, dead CPUs and some dead boards will just pile up. Esp. the part about "slow degradation" would be frightening if I had one of these. Even if you update your BIOS, your chip could have already taken a hit. Which could even be the reason for the strange reported idle power spikes. I bet their BIOSes look the same. :laugh: Full of typos. Lower the SOC voltage manually and keep the lowest stable setting. :) Some folks reported they could drop the power consumption by up to 20W.
I can understand overclocking the chip itself and making it go faster than the base clock but memory? It's so damn easy to do these days that even the biggest PC building n00b will know how to enable it.
I imagine that things are going to be changing from here on out when it comes to warranties.
Asus clears up Gamers Nexus SOC measurement FUD
auto translate english.
A pity YouTube does not redact the past views; GN ran away with a big income using lazy research, and is too smug to apologise, I betcha.
ASUS definitely was in the wrong with how things were handled, but Steve could’ve been a lot more helpful about how they approached the content and recommendations to users.
His approach in this whole episode is more about gaining new subscribers than helping users. Look at how fast he bought the burnt CPU and how far he continues to milk this. Terrible person who masquerades as Tech Jesus.
Asus is definitely fumbling with the PR too, I agree.