Monday, April 24th 2023

AMD Ryzen 7000X3D Processors Prone to Physical Damage with Voltage-assisted Overclocking, Motherboard Vendors Rush BIOS Updates with Voltage Limiters

AMD Ryzen 7000X3D processors are prone to irreversible physical damage if CPU overclocking is attempted at some of the higher VDDCR voltages (the main power domain for the CPU cores). A Redditor who goes by Speedrookie attempted to overclock their Ryzen 7 7800X3D, leading to an irreversible failure. The motherboard socket and the processor's land-grid contacts show signs of overheating damage caused by the contacts melting from too much current draw.

A Ryzen 7000X3D processor features a special CPU complex die (CCD) with stacked 3D Vertical Cache memory. This cache die is located in the central region over the CCD where its 32 MB on-die L3 cache is located, while the difference in Z-height of the stacked die is filled up by structural silicon, which sits over the regions of the CCD with the 8 "Zen 4" CPU cores. It stands to reason that besides having an inferior thermal transfer setup to conventional "Zen 4" CCDs (without the 3DV cache), the CCD itself has a higher power-draw at any given clock-speed than a conventional CCD (since it's also powering the L3D). This is the main reason why overclocking capabilities on the 7000X3D processors are almost non-existent, and the processor's power limits are generally lower than their regular Ryzen 7000X counterparts. Attempting to dial up voltage kicks up the perfect storm for these processors.
Igor's Lab posted a detailed analysis of the region of the Socket AM5 land-grid most susceptible to a burn-out in the above scenario. The central region of the LGA has 93 pins dedicated to the VDDCR power domain, dispersed in a mostly checkered pattern, toward the center of the land-grid. Igor isolated 6 of these VDDCR pins in particular, which are most prone to physical damage, as they are located in a region below the CCD that sees it sandwiched between the L3D (stacked 3D Vertical cache die), and the fiberglass substrate below. Apparently, AMD's thermal and electrical protection mechanisms aren't able to prevent a runaway overheating of the pins that causes the substrate to melt, deform, and bulge outward, resulting in irreversible damage to both the processor and the socket.
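
To see why a handful of pins can fail first, it helps to look at how resistive heating scales with per-pin current. The sketch below is only an illustration: the 93-pin count comes from the analysis above, while the total current, the contact resistance, and the function names are hypothetical placeholders, not measured values for Socket AM5.

```python
def per_pin_current_a(total_current_a: float, pin_count: int = 93) -> float:
    """Current per pin if the load were shared evenly (an idealization)."""
    if pin_count <= 0:
        raise ValueError("pin_count must be positive")
    return total_current_a / pin_count


def contact_heat_w(pin_current_a: float, contact_resistance_ohm: float) -> float:
    """Joule heating in a single contact: P = I^2 * R."""
    return pin_current_a ** 2 * contact_resistance_ohm


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the scaling, not real AM5 data.
    total_a = 93.0
    r_contact_ohm = 0.01
    even_share_a = per_pin_current_a(total_a)
    print(contact_heat_w(even_share_a, r_contact_ohm))      # even sharing
    print(contact_heat_w(2 * even_share_a, r_contact_ohm))  # one pin carrying double its share
```

Because heat grows with the square of current, a pin that ends up carrying twice its even share dissipates four times the heat.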

Meanwhile, AMD's motherboard partners are rushing to release UEFI BIOS updates for their entire lineups of motherboards, which enforce tighter limits on the VDDCR voltage. MSI is the first motherboard manufacturer with such updates. In a press statement, MSI said it has redesigned automated overclocking for 7000X3D processors. "The BIOS now only supports negative offset voltage settings, which can reduce the CPU voltage only," the MSI statement to Tom's Hardware reads. "MSI Center also restricts any direct voltage and frequency adjustments, ensuring that the CPU won't be damaged due to over-voltage." On the other hand, the update introduces an automated overclocking feature called Enhanced Mode Boost, which optimizes PBO settings to improve boost frequency residency, without any manual voltage adjustments.
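
The negative-offset-only behavior MSI describes can be pictured as a simple clamp. This is a minimal sketch, not MSI's firmware logic: the function names and the millivolt units are assumptions made for illustration.

```python
def clamp_voltage_offset_mv(requested_offset_mv: int) -> int:
    """Accept only offsets that lower the voltage; positive requests become 0."""
    return min(requested_offset_mv, 0)


def effective_voltage_mv(stock_mv: int, requested_offset_mv: int) -> int:
    """Voltage after applying the clamped offset to a (hypothetical) stock value."""
    return stock_mv + clamp_voltage_offset_mv(requested_offset_mv)


if __name__ == "__main__":
    # 1200 mV is an arbitrary placeholder, not a real stock voltage.
    print(effective_voltage_mv(1200, -30))  # undervolt request is honored
    print(effective_voltage_mv(1200, 100))  # overvolt request is ignored
```

Under this model an undervolt request passes through unchanged, while any attempt to raise the voltage leaves it at the stock value.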
Sources: Tom's Hardware 1, 2, Igor's Lab, Speedrookie (Reddit)

258 Comments on AMD Ryzen 7000X3D Processors Prone to Physical Damage with Voltage-assisted Overclocking, Motherboard Vendors Rush BIOS Updates with Voltage Limiters

#101
TTPPUU
tabascosauz: 1.35V is pretty standard for an EXPO VSOC auto-rule and people have been one-click set-and-forget for months with no problems;
How do you know there aren't problems?

der8auer's CPU has similar "marks" on it too, but it still works (for now).
Posted on Reply
#102
tabascosauz
TTPPUU: How do you know there aren't problems?

der8auer's CPU has similar "marks" on it too, but it still works (for now).
I said I doubt it's as simple as "1.35V default bad". The marks on der8auer's own CPU are in the area of CCD0, and nowhere near VDDCR_SOC pins. Not discounting it either.

AMD is notorious for half-assing AGESA firmware, and board vendors are notorious for half-assing implementing the firmware. Probably some bug in some combination of those two, or someone left out the safeguards.
Posted on Reply
#103
trparky
tabascosauz: AMD is notorious for half-assing AGESA firmware, and board vendors are notorious for half-assing implementing the firmware. Probably some bug in some combination of those two, or someone left out the safeguards.
But I thought that the UEFI GUI is nothing more than a front-end to the actual code behind it. AMD writes the code, the board maker slaps a theme on top of it, and ships it out the door.

Anyways, according to Ryzen Master, even with EXPO mode enabled, VDDCR SOC is set at 1.25 V on my system.
Posted on Reply
#104
Outback Bronze
tabascosauz: The problem with der8auer drawing conclusions from that "test" is that you actually need to be running current through the appropriate rail to see whether VSOC is safe for VDDCR_SOC. All Cinebench is doing is running current through VDDCR_CPU. I can just as easily set a ridiculous VSOC on my APUs and call it tentatively "safe" by running CPU-only loads for an hour.
Wouldn't he already have known this? Why didn't he run a more strenuous test and put load on the SoC rails then?

He did mention that "he thinks" this is a temperature sensor issue that's not working. The CPU he had in his hands (7900X) got so hot the IHS melted off from being soldered to the CPU die. Geez.

This was on a Gigabyte board too.

I really have no idea how widespread these issues are, but I don't see too many TechPowerUp forum members reporting problems with their respective CPUs.
Posted on Reply
#105
trparky
Outback Bronze: I really have no idea how widespread these issues are, but I don't see too many TechPowerUp forum members reporting problems with their respective CPUs.
Me neither, but I'm going to be keeping an eye on things until we know more about what's happening.
Posted on Reply
#106
tabascosauz
trparky: But I thought that the UEFI GUI is nothing more than a front-end to the actual code behind it. AMD writes the code, the board maker slaps a theme on top of it, and ships it out the door.
No? Plenty of vendor-specific features in past years (dynamic OC switcher, KomboStrike), different vendor implementations of the various PBO menus, randomly removed/lobotomized VDDG controls on MSI... How else do you think every few months some vendors have functional BIOSes on a given AGESA while others have BIOSes on the same AGESA bugged to all hell or absolutely tanking performance?

The board makers are low effort for a different reason - basically just developing one BIOS for a new AGESA release, making minor changes here and there, and then just Ctrl+V it across their entire lineup. Problems with a specific board? Cross that bridge when you get to it.

Hence why I keep telling people to stop salivating after the newest AGESA like it's the year's new iPhone. If it ain't broke and doesn't have something new you absolutely need, don't fix it.
Outback Bronze: Wouldn't he already have known this? Why didn't he run a more strenuous test and put load on the SoC rails then?

He did mention that "he thinks" this is a temperature sensor issue that's not working. The CPU he had in his hands (7900X) got so hot the IHS melted off from being soldered to the CPU die. Geez.

This was on a Gigabyte board too.

I really have no idea how widespread these issues are, but I don't see too many TechPowerUp forum members reporting problems with their respective CPUs.
I think he did say he didn't want to kill the CPU lol. But he could've run some Prime95 large FFT, a memory-heavy y-cruncher stress test, or even some TM5 if he wanted to see more VDDCR_SOC power draw. For AM5 I feel like it's more obvious since VSOC is specifically focused on iGPU and UMC now, with most of Fabric being spun off into Vmisc (where it was previously all under SOC on AM4). It was just weird to me: manually set VSOC... in order to stress Vcore? lol

AMD prides themselves on extensive and comprehensive temperature monitoring with a lot of sensors (hundreds?) scattered throughout their CPUs and GPUs, so I feel like it would have to be a wider problem than just 1 malfunctioning temp sensor that Tctl/Tdie forgot to pick up.

Granted, to see the darkened pads, TPU AM5 users would all have to physically remove their CPUs for inspection on systems that worked just fine, but agreed, more investigation and less hysteria.
Posted on Reply
#107
Zubasa
Gmr_Chick: I believe AMD did provide the correct tech data to the mobo manufacturers. The mobo manufacturers just don't give a shit. All they care about is that their board "won" in benchmarks/reviews.
TBH this is often the case, just like a few years back when board makers sneaked in MCE enabled by default on Intel boards.
Posted on Reply
#108
trparky
tabascosauz: TPU AM5 users would all have to physically remove their CPUs for inspection on systems that worked just fine
And that would be a complete pain in the ass.
Posted on Reply
#109
kapone32
trparky: And that would be a complete pain in the ass.
I am having too much fun with mine to do that.
Posted on Reply
#110
MarsM4N
Motherboard manufacturers are clearly in damage control, all of them: Tracker thread for AM5 BIOS updates with voltage restrictions (to prevent X3Ds frying) (reddit)

So what we know: all motherboard manufacturers are affected, and you have to have EXPO enabled. Voltage and thermal protection also doesn't kick in to protect the CPU from frying. Neither AMD nor the motherboard manufacturers picked up the issue prior to release, and there is a limited number of affected folks. So it's either some unusual circumstances coming together (with EXPO enabled), or AMD overguesstimated "safe voltages" and/or the voltage/thermal protection is bugged in every BIOS, which is what I take from ASUS's statement.

Wondering, if it's unsafe to run EXPO 6000 with the pre-installed BIOSes and you're likely to run into the issue someday, how do you sort this out? :confused: There are tons of people out there who do not monitor tech websites and never do BIOS updates. It's not like you have auto BIOS updates or a popup in Windows reminding you to update if you don't want to end up with a fried CPU.
Posted on Reply
#111
Naito
Why_Me: After reading about all the bugs that came with AM4 this latest news doesn't surprise me in the least. AMD is known for releasing unfinished products.
I've been running a 5800X3D since release that had been swapped into a system previously running a 3600X. No issue with either CPU. 5800X3D runs very well and has made for much smoother gameplay.

These are my first non-Intel chips since a Cyrix chip way back in the day. Traditionally I was never a fan of AMD, but I have been so impressed with their Zen cores that I'd strongly consider staying with AMD for my next build.

Aside from maybe undervolting, I don't see much reason to fiddle with components like CPUs and GPUs these days. From my experience, the binning and algorithms they use to reach boost clocks are generally good enough that tweaks bring very little to the table.

As for the people complaining that the manufacturer is to blame for this - if you play outside the manufacturer's specs, you're playing with fire. Only have yourself to blame. If the motherboard manufacturer encourages this without proper fail-safes, that's on them.
Posted on Reply
#112
Sifro
pressing on: Not only just a board, absolutely no Asus products are sold so no GPUs or laptops/notebooks.
I made an account just to clarify. Mindfactory has not been offering ASUS products for years now. This is not a recent development and has absolutely nothing to do with current events. I wrote an email in 2019 asking if and when they would bring back ASUS products, but they never did. I attached a screenshot of the email (it is in German, though).
Posted on Reply
#113
MarsM4N
Sifro: I made an account just to clarify. Mindfactory has not been offering ASUS products for years now. This is not a recent development and has absolutely nothing to do with current events. I wrote an email in 2019 asking if and when they would bring back ASUS products, but they never did. I attached a screenshot of the email (it is in German, though).
They aren't letting out anything about the "why?". :laugh: Found threads dating back to 10th Jun 2019, with zero official statements from either party.

So if they haven't sold any ASUS products since then, there must have been big trouble in little China, lol.
Posted on Reply
#115
trparky
phanbuey: Some zoidrage:

Highly speculative rambling about why Ryzen 7000 CPUs are dying. - YouTube
I'm watching that video now. I'm at the ten minute mark on the video and Buildzoid said that (whether you want to think of this as good or bad depends upon how you want to look at it) if this issue were to occur to your processor, there's a high chance that it would not be the kind of situation that gradually kills your processor. It would be the kind of situation in which you'd know your processor was dead because in a fraction of a second, your processor would be toast. This isn't a gradual thing at all, at least as he says.

And knowing the kind of content that Buildzoid puts out, I'd have to say that there's a good chance that he knows what he's talking about.

Suffice it to say... if your processor isn't dead by now, you're in the clear.
Posted on Reply
#116
phanbuey
trparky: I'm watching that video now. I'm at the ten minute mark on the video and Buildzoid said that (whether you want to think of this as good or bad depends upon how you want to look at it) if this issue were to occur to your processor, there's a high chance that it would not be the kind of situation that gradually kills your processor. It would be the kind of situation in which you'd know your processor was dead because in a fraction of a second, your processor would be toast. This isn't a gradual thing at all, at least as he says.
It's a pretty good thought process tbh, I'm sure we will learn more but this seems like a good take...

This and Nvidia's fire-bursting donglegate are the two most interesting hardware issues this year.
Posted on Reply
#117
trparky
There was a point where he said that a user on Reddit mentioned something about 2.6 volts on the SOC. Yeah, Buildzoid said that if 2.6 volts were actually delivered to your processor, it would quite simply be dead. It would be dead so fast that you wouldn't be able to see it even happen on your screen; it would be that fast.
Posted on Reply
#118
phanbuey
I didn't realize it was all boards that are bubbling these.... Sounded like it was ASUS only initially.
Posted on Reply
#119
trparky
phanbuey: I didn't realize it was all boards that are bubbling these.... Sounded like it was ASUS only initially.
That's what I thought too since ASUS hasn't exactly had stellar quality as of late.
Posted on Reply
#120
Minus Infinity
Hard to feel sorry for those stupid enough to be OCing the cards.
Posted on Reply
#121
R0H1T
Zubasa: TBH this is often the case, just like a few years back when board makers sneaked in MCE enabled by default on Intel boards.
Intel wanted to win in benches, hence MCE ~ the board makers will not do anything to upset the bean counters over there.
Posted on Reply
#122
trparky
R0H1T: Intel wanted to win in benches, hence MCE ~ the board makers will not do anything to upset the bean counters over there.
False advertising.
Posted on Reply
#123
Ferrum Master
I have a feeling it is all much simpler.

Back in the LGA 1366 days, when the first serious issues regarding RAM voltages and limitations actually appeared, there was a general rule not to let the delta between uncore and core voltage exceed 0.5V.

Basically, when undervolting Vcore you violate a similar rule and simply fry the gates due to a large voltage difference they are not designed to withstand. Basically, it is the same thing in reverse.

The discussion about motherboard makers being retards, idiots, scammers is totally grounded here. I haven't had a board for years that did not violate some voltages when being set on AUTO. Everything is excessive high to prevent some RMA rates due to instabilities on stock. Same applies to XMP and EXPO. They should never have had the ability to toggle voltage settings, NEVER.

Also, the BS regarding temperatures etc... once it gets a short, that is the nail in the coffin; the wires then act like a tungsten bulb filament and simply fry the substrate. It is also a common sight on GPUs when MOSFETs short out.
Posted on Reply
#124
JustBenching
Why_Me: This thread is missing something ... @fevgatos @nguyen time to bring the heat ^^
It's already in flames due to the CPU overcooking itself :roll:
Posted on Reply
#125
Why_Me
fevgatos: It's already in flames due to the CPU overcooking in flames :roll:
I knew you would get the pun in my post ^^
Posted on Reply