Tuesday, July 23rd 2024

Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon

Long-term reliability issues continue to plague Intel's 13th Gen and 14th Gen Core desktop processors based on the "Raptor Lake" microarchitecture, with users complaining that their processors have become unstable under heavy processing workloads, such as games. This includes the chips that have minor levels of performance tuning or overclocking. Intel had earlier isolated many of these stability issues to faulty CPU core frequency boosting algorithms, which it addressed through updates to the processor microcode that it got motherboard and prebuilt-PC manufacturers to distribute as UEFI firmware updates. The company has now come out with new findings on what could be causing these issues.

In a statement Intel posted on its website on Monday (22/07), the company said that it has been investigating the processors returned to it by users under warranty claims (which it has been replacing under the terms of its warranty). It has found that faulty processor microcode has been causing the processors to operate under excessive core voltages, leading to their structural degradation over time. "We have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor."

Modern processor power management runs on an intricate clockwork of collaboration between software, firmware, and hardware, with the software constantly telling the hardware what levels of performance it wants, and the hardware managing its power and thermal budgets by rapidly altering the power and clock speeds of the various components, such as CPU cores, caches, fabric, and other on-die components. A faulty collaboration between any of the three key components could break this clockwork, as has happened in this case.

Intel is releasing yet another microcode update to its 13th and 14th Gen Core processors, which will address not just the faulty boosting algorithm issue the company unearthed in June, but also the faulty voltage management the company discovered now. This new microcode should be released sometime around mid-August to partners (motherboard manufacturers and PC OEMs), who will then need to validate it on their machines, before passing it along to end-users as UEFI firmware updates.

"Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation. Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance," the company stated.

It's important to note here that the microcode update won't fix processors that are already experiencing instability; it will only prevent the problem on chips that aren't yet affected. The instability is caused by irreversible physical degradation of the chip. These chips will, of course, be covered under warranty.

Meanwhile, an interesting issue has come to light, which is that some of Intel's processors built on the Intel 7 node are experiencing chemical oxidation of the die as they age. Intel responded to this, stating that it had discovered the oxidation manufacturing issue in 2023, and addressed it. The company also stated that die oxidation is not related to the stability issues it is embattled with.

"We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue," the company stated.

If you feel your chip might be affected, you can file for an RMA.
Sources: Intel Community, Intel (Reddit)

387 Comments on Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon

#1
Jism
They wanted to release the fastest possible chips to be number one in charts, at the (expensive) expense of CPU durability. No excuse for this.
#2
Chaitanya
So how is it that W series and other locked chipsets are also killing these CPUs?
#3
natr0n
intel is finished
#4
Nanochip
Chaitanya wrote: "So how is it that W series and other locked chipsets are also killing these CPUs?"
voltages too high.
#5
b1k3rdude
If it's a hardware issue, like a failure analysis lab has suggested to GN, then how exactly is a microcode update going to fix it? Seems like a non sequitur to me.
#6
JustBenching
So AMD overvolted their 3D chips in order to compete, turning them into hand grenades, and then Intel overvolted theirs, resulting in crashes. Lovely market, competition is great.
#7
Fungi
Why is this being announced on their forums? Is it that small an issue to them?
#8
Hecate91
Nanochip wrote: "voltages too high."
Voltage alone shouldn't be killing these CPUs, not when they're being run at 125 W baseline specs in a server environment.
b1k3rdude wrote: "If it's a hardware issue, like a failure analysis lab has suggested to GN, then how exactly is a microcode update going to fix it? Seems like a non sequitur to me."
Intel admitted there was oxidization in 2023 with early Raptor Lake but didn't bring it up then. It seems like Intel wants to postpone the issue; if Intel wants to delay CPUs failing until they are out of warranty, then that is even more concerning.
IMO the way Intel handled this could've gone much better, instead of blaming motherboard makers and waiting months to say what the real issue is. I'm looking forward to 3rd party failure analysis from GN.
#9
JustBenching
Hecate91 wrote: "Voltage alone shouldn't be killing these CPUs, not when they're being run at 125 W baseline specs in a server environment."
Why not? Of course voltage alone can kill the chip even at super low power draw. I've actually tested it.
#10
bug
Hecate91 wrote: "Voltage alone shouldn't be killing these CPUs, not when they're being run at 125 W baseline specs in a server environment."
Excessive voltage alone can most certainly kill transistors. I mean, what else would it take? Aliens?
#11
the54thvoid
Super Intoxicated Moderator
fevgatos wrote: "So AMD overvolted their 3D chips in order to compete, turning them into hand grenades, and then Intel overvolted theirs, resulting in crashes. Lovely market, competition is great."
In the race to performance, it seems nobody's innocent of cutting corners. Who'd a thunk it?
#12
JWNoctis
The last time some chip common in PC died prematurely and en-masse...Was that the Deathstar? To more or less the same direct cause of silicon degradation too.

I'm somehow under the impression that the outrage was louder back then.
#13
Hecate91
Indeed both sides have cut corners to win benchmarks, but Intel have been doing it much longer, they've finally been caught pushing things too hard though.
#14
Assimilator
Intel is desperately trying to make this go away but it's not going to, and the more they try to bury it the worse it's gonna be. I wouldn't be surprised if the FCC gets involved now that GN has publicised this problem. Even if they don't, you can bet there will be a class-action lawsuit.
fevgatos wrote: "So AMD overvolted their 3D chips in order to compete, turning them into hand grenades, and then Intel overvolted theirs, resulting in crashes. Lovely market, competition is great."
This, exactly this. "Let's clock our CPUs to the bleeding edge to squeeze out a few more points in a benchmark because we are intellectually bankrupt and incapable of building an actually good product" is by far the most stupid race to the bottom that I've yet seen in my 25+ years of PC hardware experience, and I've seen a lot of stupidity. I'm glad it backfired on AMD and I'm glad it's backfiring on Intel, because it seems that getting hit in the wallet is the only way these companies will learn to not do stupid shit like this. BUILD. BETTER. PRODUCTS.
#15
close
The question is, will the CPU retain the performance characteristics once the voltage is dropped? Or was it just enough to keep the CPUs at the top of the benchmarks for long enough to compete with current/next gen CPUs, and then quietly drop the voltage and the performance when reviewers stop looking and retesting?
#16
Hecate91
bug wrote: "Excessive voltage alone can most certainly kill transistors. I mean, what else would it take? Aliens?"
If it is due to excessive voltage, then it must be a reason why Intel delayed moving from 10nm for so long. I'd like to see reviewers test this microcode if it is a real fix or Intel trying to avoid a recall, but since it took a year and a half for failures to show up it would have to be a long term test.
A lot of ifs here but I personally have very little trust for Intel after they've been shifting the blame for too long.
#17
JustBenching
Assimilator wrote: "Intel is desperately trying to make this go away but it's not going to, and the more they try to bury it the worse it's gonna be. I wouldn't be surprised if the FCC gets involved now that GN has publicised this problem. Even if they don't, you can bet there will be a class-action lawsuit.

This, exactly this. 'Let's clock our CPUs to the bleeding edge to squeeze out a few more points in a benchmark because we are intellectually bankrupt and incapable of building an actually good product' is by far the most stupid race to the bottom that I've yet seen in my 25+ years of PC hardware experience, and I've seen a lot of stupidity. I'm glad it backfired on AMD and I'm glad it's backfiring on Intel, because it seems that getting hit in the wallet is the only way these companies will learn to not do stupid shit like this. BUILD. BETTER. PRODUCTS."
The product is fine. Great, I'd argue. 13th gen is a massive uptick over 12th gen in MT performance at the same power, within a year. That's huge. The problem is they pushed for that last 0.8% of performance that doesn't make any difference.
#18
JWNoctis
close wrote: "The question is, will the CPU retain the performance characteristics once the voltage is dropped? Or was it just enough to keep the CPUs at the top of the benchmarks for long enough to compete with current/next gen CPUs, and then quietly drop the voltage and the performance when reviewers stop looking and retesting?"
Probably not, or they wouldn't have done it in the first place.

"Quietly drop the voltage and performance when reviewers stop looking" would be actual fraud.
#19
JustBenching
bug wrote: "Excessive voltage alone can most certainly kill transistors. I mean, what else would it take? Aliens?"
I was under the impression that voltage is a lesser evil compared to amperage, but after putting my chip through some tests, voltage alone kills chips even at very low amperages.
#20
user556
Performance, after the patch fix, shouldn't be negatively affected at all. Those CPUs will still run hot afterwards - if you choose that.

What will be the case, however, is damaged CPUs. It is an over-voltage well beyond any overclocking needs. The excessive voltage will have generated extreme hot spots that cooked small parts of the die. Intel is up for mailing out a lot of free replacements. But that will be after the patch is deployed.
#21
Tomorrow
Chaitanya wrote: "So how is it that W series and other locked chipsets are also killing these CPUs?"
Even T series chips are affected. These are locked, low power variants of their bigger brothers.
#22
kiddagoat
fevgatos wrote: "I was under the impression that voltage is a lesser evil compared to amperage, but after putting my chip through some tests, voltage alone kills chips even at very low amperages."
They are all related.... Ohm's Law. As you increase voltage and resistance stays the same, amperage (current) goes up. Too much voltage will definitely kill chips and produce more heat in the process. Ahhh the smell of a burning chip and the magic smoke that follows upon death.
#23
bug
Hecate91 wrote: "If it is due to excessive voltage, then it must be a reason why Intel delayed moving from 10nm for so long. I'd like to see reviewers test this microcode if it is a real fix or Intel trying to avoid a recall, but since it took a year and a half for failures to show up it would have to be a long term test."
Obviously it doesn't affect all CPUs. It took as long as it did because the problem was unknown. If Intel publishes enough data so people can replicate it, verifying the fix could be done much sooner than another 18 months.

A sad day, for sure. But really not surprising at all. I mean, we've had CPU firmware since the Pentium FDIV bug, precisely to address these problems without having to replace the CPUs. And CPUs were much simpler back in the Pentium days. And yes, we've had numerous problems since, only they are usually more obvious and a firmware update can make them go away reliably. And sometimes they're worse.
#24
Tomorrow
JWNoctis wrote: "The last time some chip common in PC died prematurely and en-masse...Was that the Deathstar? To more or less the same direct cause of silicon degradation too.

I'm somehow under the impression that the outrage was louder back then."
Nvidia Bumpgate ~2008. Permanently ruined their relationship with Apple. Nvidia blamed TSMC for it.
#25
bug
fevgatos wrote: "I was under the impression that voltage is a lesser evil compared to amperage, but after putting my chip through some tests, voltage alone kills chips even at very low amperages."
It's a bit more complicated, but typically high current is still caused by improper voltage: resources.pcb.cadence.com/blog/common-diode-failure-modes-in-circuits
(Yes, that's about diodes, but transistors are basically two diodes back to back.)