Tuesday, July 23rd 2024

Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon

Long-term reliability issues continue to plague Intel's 13th Gen and 14th Gen Core desktop processors based on the "Raptor Lake" microarchitecture, with users complaining that their processors have become unstable with heavy processing workloads, such as games. This includes the chips that have minor levels of performance tuning or overclocking. Intel had earlier isolated many of these stability issues to faulty CPU core frequency boosting algorithms, which it addressed through updates to the processor microcode that it got motherboard- and prebuilt manufacturers to distribute as UEFI firmware updates. The company has now come out with new findings of what could be causing these issues.

In a statement Intel posted on its website on Monday (22/07), the company said that it has been investigating the processors returned to it by users under warranty claims (which it has been replacing under the terms of its warranty). It has found that faulty processor microcode has been causing the processors to operate under excessive core voltages, leading to their structural degradation over time. "We have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor."
Modern processor power management runs on an intricate clockwork of collaboration between software, firmware, and hardware, with the software constantly telling the hardware what levels of performance it wants, and the hardware managing its power- and thermal budgets by rapidly altering the power and clock speeds of the various components, such as CPU cores, caches, fabric, and other on-die components. A faulty collaboration between any of the three key components could break this clockwork, as has happened in this case.

Intel is releasing yet another microcode update to its 13th- and 14th Gen Core processors, which will address not just the faulty boosting algorithm issue the company unearthed in June, but also the faulty voltage management the company discovered now. This new microcode should be released some time around mid-August to partners (motherboard manufacturers and PC OEMs), who will then need to validate it on their machines, before passing it along to end-users as UEFI firmware updates.
"Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation. Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance," the company stated.
It's important to note that the microcode update won't fix processors that are already experiencing instability; it will only prevent the problem on chips that are not yet affected. The instability is caused by irreversible physical degradation of the chip. These chips will, of course, be covered under warranty.
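
Since the fix arrives as a microcode revision bundled into UEFI firmware, it can be useful to confirm which revision a machine is actually running after an update. The following is a minimal sketch, assuming a Linux x86 system, where the kernel reports the loaded revision in the `microcode` field of `/proc/cpuinfo`. The function name `parse_microcode_revisions` is made up for this illustration, and the sketch does not assume any particular revision number for the fixed microcode, since the statement does not give one.

```python
import re
from pathlib import Path


def parse_microcode_revisions(cpuinfo_text: str) -> set[int]:
    """Return the distinct microcode revisions found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "microcode<TAB>: 0x123", one per logical CPU.
        match = re.match(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", line)
        if match:
            revisions.add(int(match.group(1), 16))
    return revisions


def main() -> None:
    cpuinfo = Path("/proc/cpuinfo")
    if not cpuinfo.exists():
        print("No /proc/cpuinfo here; this sketch only applies to Linux.")
        return
    revisions = parse_microcode_revisions(cpuinfo.read_text())
    if not revisions:
        print("No microcode field found (non-x86 CPU or restricted environment).")
        return
    for rev in sorted(revisions):
        print(f"microcode revision: {rev:#x}")


if __name__ == "__main__":
    main()
```

Comparing the printed value before and after a firmware update shows whether the new microcode was actually applied; the revision to expect would have to come from Intel or the motherboard vendor's release notes.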

Meanwhile, an interesting issue has come to light: some of Intel's processors built on the Intel 7 node are experiencing chemical oxidation of the die as they age. Intel responded to this, stating that it had discovered the oxidation manufacturing issue in 2023 and addressed it. The company also stated that the oxidation issue accounts for only a small number of the instability reports it is dealing with.
"We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue," the company stated.
If you feel your chip might be affected, you can file for an RMA.
Sources: Intel Community, Intel (Reddit)

215 Comments on Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon

#177
Upgrayedd
Klemc: ALL, it's microcode
That's crazy cause I haven't had a single crash
#178
Vincero
Upgrayedd: How many CPUs do you guys think are actually dying? The way you're talking you'd think it was 1 in 5.
The problem isn't so much them dying... if every faulty CPU was properly failing that would actually be better because:
1) The problem would be more definitive in terms of the impact and failure rate
2) There would likely be fewer scenarios where people have defective CPUs but have no idea the chip has a material defect, which may only get worse and not cause a practical failure until a few years down the line, all the while causing other system issues... don't forget these will be devices on which people do work that may not be valuable to you but will be to that person.

And point 2 is what most people are more concerned about.
#179
Tomorrow
Zubasa: I suspect that is the goal of some of the comments with "whataboutism".
Get the discussion derailed and hopefully shield Intel from further discussion/criticism.
fevgatos certainly with his warped view of Zen 3 support on first generation AM4 boards/chipsets. I won't even entertain him with a response here. I brought up Bumpgate because I realize not everyone here has lived through this.
I'm sorry if my "history lesson" was seen as derailing the thread. I have nothing more to add on that matter.
Daven: By the way, Arrow Lake is a telling product that Intel is changing its ways. The chip is being made wholly on TSMC processes which are far better. The chip is being clocked down and buggy parts such as HT are being stripped out.
Not quite. Arrow Lake has several tiles. The tile that houses the CPU cores is still made in Intel's own fabs. The iGPU tile is made by TSMC. I don't remember the other tiles because there are like four or five of them.
john_: All will be forgiven/forgotten in 6-12 months from now.
That is why we need people to bring up previous instances of similar things happening. Neutrally, without whataboutism and fanboyism.
Every company has and will screw something up at some point.

What's important is that we won't forget, and it's important how the company handles its failure. Will it offer a quick fix with a promise of free replacement right away, or will it spend months shifting the blame and being quiet about the issue?

Personally i just could not justify buying Intel before precisely because of this issue - Yes i get 100% performance today, but what about years down the line? Back then my fear was another Spectre/Meltdown fix that nerfed performance. Now we can add degradation to the list. I went with AM4 also for socket longevity and lower power consumption because electricity is expensive here.
evernessince: Something doesn't line up here. Intel claims that the oxidation was a separate issue that was root caused and fixed a while back but no one, not even those in the enthusiast community, was made aware. On top of that Raptor Lake was released all the way back in October of 2022. It's 2024 now
My point exactly. Why are we learning about this issue now through Intel confirmation? Why were those CPUs not recalled?
evernessince: So at the very minimum, according to Intel's own words, there are at least 3 issues running amok right now.
Indeed. One is bad enough but three separate issues within one socket?
LGA1700 will go down as one of Intel's buggiest/cursed/worst generations if this is true.
evernessince: How will Intel compensate customers for the potential damage caused by all 3 claimed issues?
Will there be an outreach campaign to notify all customers potentially affected?
Given the permanent damage these issues cause, who qualifies for a replacement / refund? (as in everyone or just those who can demonstrate they are having issues)
What are the product level implications of the fix to these issues (where applicable)? Specifically, how will it impact performance and processor behavior?
What is Intel's proposed solution to its crashing issue on laptops given Intel has itself claimed that laptops are crashing due to other reasons at an elevated rate? (Intel's press release made it appear to me they were pointing fingers elsewhere again with this one)
All reasonable questions any affected customer should ask (demand?). Thus far Intel has not clearly communicated how they plan on addressing these. All I see are some vague promises for a "fix" in a month (waiting yet again), to contact their customer service (RMA yes/no?) and blaming laptop makers essentially.
#180
Onyx Turbine
To keep an overview: what are the three issues now with LGA1700? I have: oxidation risk (high certainty), microcode on upper-range 13th/14th Gen CPUs, and the third is?
#181
RGAFL
UpgrayeddHow many CPUs do you guys think are actually dying? The way you're talking you'd think it was 1 in 5.
It's strange because there seems to be a combination of factors that drive the percentage up but in isolation seem to keep it lower. I think the factory degradation issue is the main worry: how many out there suffer from it, and which batches are they? Intel has not disclosed any of that (surely they must know). If it is fixed as Intel claims, then the worry becomes that there is a new degradation issue caused by the voltages (Alderon Games tested a new batch of CPUs that Intel claimed were free of the factory degradation issue). These also showed a high rate of crashes.

Intel seems to have narrowed it down to three or four issues. I have my doubts. It's affecting nearly all laptop and desktop chips in the 13th or 14th gen (apart from those steppings derived from 12th gen).

Basically, who knows. Intel are not letting on which makes this even shadier. To not even disclose which batches or datecodes for the initial problem is a massive red flag to me.
#182
Vincero
Tomorrow: My point exactly. Why are we learning about this issue now through Intel confirmation? Why were those CPUs not recalled?
No company wants to recall if they can avoid it - no doubt there is a legal / numbers game that decides on certain things.
Last time Intel had to recall CPUs in public domain in same way was the Pentium FPU bug I mentioned earlier.... they handled that quite badly actually (initially they knew but didn't mention it until public knowledge forced them to acknowledge the errata, then you could only get a replacement if you could prove you were impacted by it* until eventually pressure forced them to offer replacement to all), although to be fair back in the day this was a rarer event and I don't think many companies were quite geared up for the fall out not providing worthwhile RMA warranty support would bring.

* Kinda ridiculous as you have no way of knowing if some soon to be released software might trigger the issue repeatedly after the warranty has lapsed...
#184
RGAFL
Onyx Turbine: To keep an overview: what are the three issues now with LGA1700? I have: oxidation risk (high certainty), microcode on upper-range 13th/14th Gen CPUs, and the third is?
The third one seems to be that the TVB voltage stays locked at the higher level even when the processor has dropped to a lower power level.
#185
Tomorrow
Onyx Turbine: To keep an overview: what are the three issues now with LGA1700? I have: oxidation risk (high certainty), microcode on upper-range 13th/14th Gen CPUs, and the third is?
1. Degradation. Affects desktop i5, i7 and i9. 13th and 14th gen models including locked and "power efficient" T series chips.
2. Oxidation. New info (to general public) and according to Intel only affected batches of 13th and 14th gen models. Failure analysis pending.
3. Laptop issue. Unknown if due to degradation or oxidation or both. Newest info and so far not much to go on.
Vincero: No company wants to recall if they can avoid it
Depends. One good example is Arctic. They proactively reached out and informed everyone of a flaw in their Arctic Freezer II line regarding the pump gasket degradation in the mentioned AIOs. Technically they did not recall the affected units tho.

They offered either a free DIY kit (so users could quickly fix this themselves or at least have tools in hand) or an RMA if the user was uncomfortable performing the swap themselves.

I had and still have one of those AIOs that matched the bad batch number. I got my free replacement gasket and fill liquid through their RMA and performed the swap as a preventative measure. Thankfully it had not yet started to degrade.

Arctic also extended the warranty period of those AIOs. Despite all this I feel comfortable buying and recommending their products because they noticed this first and proactively reached out instead of months-long drama, blame game and no clear RMA.
#186
Assimilator
Vincero: No company wants to recall if they can avoid it - no doubt there is a legal / numbers game that decides on certain things.
Last time Intel had to recall CPUs in public domain in same way was the Pentium FPU bug I mentioned earlier.... they handled that quite badly actually (initially they knew but didn't mention it until public knowledge forced them to acknowledge the errata, then you could only get a replacement if you could prove you were impacted by it* until eventually pressure forced them to offer replacement to all), although to be fair back in the day this was a rarer event and I don't think many companies were quite geared up for the fall out not providing worthwhile RMA warranty support would bring.

* Kinda ridiculous as you have no way of knowing if some soon to be released software might trigger the issue repeatedly after the warranty has lapsed...
A product recall in silicon is extremely rare, primarily because of the amount of effort put into verifying the product such that you don't need to do a recall. Recalls are horribly expensive and, in this case, the scope is so large that Intel likely doesn't have the stock to cover it.

So yeah, while Intel should preemptively recall all 13th and 14th gen CPUs affected by this microcode bug, they won't. And thus a lot of people owning Intel CPUs are gonna be pretty upset a few days/weeks/months/years down the line, when their 13/14 gen CPU fails and it's wayyy out of warranty. What that does to Intel's reputation in the long run, we'll have to see.
#187
Klemc
Assimilator: A product recall in silicon is extremely rare, primarily because of the amount of effort put into verifying the product such that you don't need to do a recall. Recalls are horribly expensive and, in this case, the scope is so large that Intel likely doesn't have the stock to cover it.

So yeah, while Intel should preemptively recall all 13th and 14th gen CPUs affected by this microcode bug, they won't. And thus a lot of people owning Intel CPUs are gonna be pretty upset a few days/weeks/months/years down the line, when their 13/14 gen CPU fails and it's wayyy out of warranty. What that does to Intel's reputation in the long run, we'll have to see.
It will be a lot of students, so no problemo for Intel, parents will buy another to have the student working ASAP.
#188
phanbuey
Upgrayedd: That's crazy cause I haven't had a single crash
I think it's a combination of microcode and motherboard overvolting - like the microcode will have the VID table request 1.45 V and the mobo will feed 1.53 V for load line or whatever. So 100% of CPUs failing for one developer - that makes sense, they're putting them back in the same boards/servers that behave the same way. Whereas in your setup that might not be an issue.

Downclocking fixes it for almost everyone, which also makes sense - less clocks less voltage.
#189
Daven
Tomorrow: Not quite. Arrow Lake has several tiles. The tile that houses the CPU cores is still made in Intel's own fabs. The iGPU tile is made by TSMC. I don't remember the other tiles because there are like four or five of them.
There does seem to be some question regarding the manufacturing of Arrow Lake. Some rumors point to TSMC for all Core Ultra 9 and 7 tiles and only Core Ultra 5 compute tiles use Intel 20A. Lunar Lake has moved entirely to TSMC which surprised some people including me. Most rumors still point to Intel 20A for the Arrow Lake compute tile. If the problems associated with 13th and 14th gen stability including oxidation issues are bigger than we know, Arrow Lake might get moved entirely to TSMC. Time will tell.
#190
trparky
Assimilator: So yeah, while Intel should preemptively recall all 13th and 14th gen CPUs affected by this microcode bug, they won't. And thus a lot of people owning Intel CPUs are gonna be pretty upset a few days/weeks/months/years down the line, when their 13/14 gen CPU fails and it's wayyy out of warranty. What that does to Intel's reputation in the long run, we'll have to see.
Considering that most big box stores be it Best Buy or Microcenter run Back-To-School deals every year, they'll just get them a new computer and the old one will be chucked into the trash.
#191
Tomorrow
Tomorrow: Depends. One good example is Arctic. They proactively reached out and informed everyone of a flaw in their Arctic Freezer II line regarding the pump gasket degradation in the mentioned AIOs. Technically they did not recall the affected units tho.

They offered either a free DIY kit (so users could quickly fix this themselves or at least have tools in hand) or an RMA if the user was uncomfortable performing the swap themselves.

I had and still have one of those AIOs that matched the bad batch number. I got my free replacement gasket and fill liquid through their RMA and performed the swap as a preventative measure. Thankfully it had not yet started to degrade.

Arctic also extended the warranty period of those AIOs. Despite all this I feel comfortable buying and recommending their products because they noticed this first and proactively reached out instead of months-long drama, blame game and no clear RMA.
I can also offer a worse example: Sony.
I bought their Linkbuds S wireless earbuds in 2022 only for these to develop a battery discharge issue a week after my two year warranty period ended (both go from 100% to empty within 15-30 minutes instead of the usual ~8 hours).
Reading reviews and comments online there are many people facing the same issue with both the Linkbuds S and WF-1000 XM4 models produced and bought in 2022.

Yet Sony has neither acknowledged the issue nor provided any replacements for customers because in their eyes the warranty period for both products (for 2022 buyers at least) has ended and thus they feel they don't have to do anything.

A product failing a week after warranty ended feels like planned obsolescence...
#192
Darmok N Jalad
I think at this point we have to wait for the microcode update and then see how things play out. If there is more to the story, we are not going to get it officially.

I’m not sure what a recall would even look like. What do they replace them with? Instead they’ll hope to ride it out through warranty service which informally does the same thing without some uncomfortable stories about failed chips being recalled. Even a post-mortem class action is the lesser of two evils, since those take years to play out. By then they can have something else out and show how they’ve reformed.
#193
Klemc
Daven: There does seem to be some question regarding the manufacturing of Arrow Lake. Some rumors point to TSMC for all Core Ultra 9 and 7 tiles and only Core Ultra 5 compute tiles use Intel 20A. Lunar Lake has moved entirely to TSMC which surprised some people including me. Most rumors still point to Intel 20A for the Arrow Lake compute tile. If the problems associated with 13th and 14th gen stability including oxidation issues are bigger than we know, Arrow Lake might get moved entirely to TSMC. Time will tell.
They move to TSMC for next gen, mmmh, it's totally tied to what happens to the 13-14 gen.
#194
Vincero
Klemc: It will be a lot of students, so no problemo for Intel, parents will buy another to have the student working ASAP.
That only works out if that custom comes back to you... Intel are more likely to push people to other makers... Apple might win over some of the Intel hold-outs who for some reason will not buy an AMD laptop... and Intel could potentially lose out permanently...
#195
Hxx
Klemc: Total Recall

:)
We need Arnie to uncover this conspiracy
#196
Klemc
Darmok N Jalad: I think at this point we have to wait for the microcode update and then see how things play out. If there is more to the story, we are not going to get it officially.

I’m not sure what a recall would even look like. What do they replace them with? Instead they’ll hope to ride it out through warranty service which informally does the same thing without some uncomfortable stories about failed chips being recalled. Even a post-mortem class action is the lesser of two evils, since those take years to play out. By then they can have something else out and show how they’ve reformed.
If they RMA all at once, it will be as big as the CrowdStrike thingie.
#197
Vincero
Hxx: We need Arnie to uncover this conspiracy
"Welcome to the party Gelsinger"
#198
Super Firm Tofu
phanbuey: I think it's a combination of microcode and motherboard overvolting - like the microcode will have the VID table request 1.45 V and the mobo will feed 1.53 V for load line or whatever. So 100% of CPUs failing for one developer - that makes sense, they're putting them back in the same boards/servers that behave the same way. Whereas in your setup that might not be an issue.

Downclocking fixes it for almost everyone, which also makes sense - less clocks less voltage.
Or in the case of ridiculous CPUs (that should never have been released), the VID table will request 1.523v and then add the microcode error and mobo....*insert melting sounds*

#199
phanbuey
Super Firm Tofu: Or in the case of ridiculous CPUs (that should never have been released), the VID table will request 1.523v and then add the microcode error and mobo....*insert melting sounds*

Yeah I mean... is anyone really shocked here? I feel like "Hey bro, if you run your CPU at 6.2 GHz and 1.53-1.56 V you will degrade it quickly" has been common knowledge for a while.

Combine that with - "Hey let's build a server with these" and you get ... "WHY ARE ALL THESE SERVERS CRASHING AFTER A FEW MONTHS!?!?"
#200
Lewzke
But how was this microcode tested? (Was it tested like the famous CrowdStrike driver ...) I mean, it's a pretty basic task to test voltages and temperatures in the engineering phases; how the hell did they not catch the faulty microcode?