Tuesday, July 23rd 2024
![Intel](https://tpucdn.com/images/news/intel-v1721205152158.png)
Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon
Long-term reliability issues continue to plague Intel's 13th Gen and 14th Gen Core desktop processors based on the "Raptor Lake" microarchitecture, with users complaining that their processors have become unstable under heavy workloads such as games. This includes chips with only minor performance tuning or overclocking. Intel had earlier traced many of these stability issues to a faulty CPU core frequency boosting algorithm, which it addressed through a processor microcode update that motherboard and prebuilt-PC manufacturers distributed as UEFI firmware updates. The company has now come out with new findings on what could be causing these issues.
In a statement Intel posted on its website on Monday (22/07), the company said that it has been investigating the processors returned to it by users under warranty claims (which it has been replacing under the terms of its warranty). It has found that faulty processor microcode has been causing the processors to operate at excessive core voltages, leading to their physical degradation over time. "We have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor."
Modern processor power management runs on an intricate clockwork of collaboration between software, firmware, and hardware: the software constantly tells the hardware what level of performance it wants, and the hardware manages its power and thermal budgets by rapidly adjusting the voltages and clock speeds of its various components, such as the CPU cores, caches, fabric, and other on-die blocks. A fault in any of these three layers can break this clockwork, as has happened here, where the microcode requested incorrect voltages.
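To make that clockwork a little more concrete, here is a toy Python model of the last step, where firmware turns a target clock speed into a core voltage request. Everything in it is an assumption for illustration: the V/f table values, the `offset_v` parameter standing in for a faulty algorithm, and the `V_MAX_SAFE` ceiling are hypothetical, not Intel's fused values or microcode logic, and Intel's statement does not say whether its patch uses a ceiling like this.

```python
from bisect import bisect_left

# Hypothetical voltage/frequency (V/f) table: (core clock in MHz, core voltage in volts).
# Illustrative numbers only; real chips have individually fused curves.
VF_CURVE = [
    (800, 0.70),
    (3000, 0.95),
    (4500, 1.15),
    (5500, 1.35),
    (6000, 1.45),
]

# Hypothetical hard ceiling that no voltage request may exceed.
V_MAX_SAFE = 1.50


def voltage_for(freq_mhz: float) -> float:
    """Linearly interpolate the voltage the V/f table calls for at a target clock."""
    freqs = [f for f, _ in VF_CURVE]
    if freq_mhz <= freqs[0]:
        return VF_CURVE[0][1]
    if freq_mhz >= freqs[-1]:
        return VF_CURVE[-1][1]
    i = bisect_left(freqs, freq_mhz)  # first table point at or above the target
    (f0, v0), (f1, v1) = VF_CURVE[i - 1], VF_CURVE[i]
    return v0 + (v1 - v0) * (freq_mhz - f0) / (f1 - f0)


def request_voltage(freq_mhz: float, offset_v: float = 0.0) -> float:
    """Voltage the firmware would request, clamped to the safety ceiling.

    `offset_v` is a stand-in for an algorithmic error that asks for too much voltage.
    """
    return min(voltage_for(freq_mhz) + offset_v, V_MAX_SAFE)
```

With these made-up numbers, `request_voltage(5000)` interpolates between the 4500 MHz and 5500 MHz points to about 1.25 V, while `request_voltage(6000, offset_v=0.2)` shows an erroneous request being capped at the ceiling instead of reaching the silicon.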
Intel is releasing yet another microcode update to its 13th- and 14th Gen Core processors, which will address not just the faulty boosting algorithm issue the company unearthed in June, but also the faulty voltage management the company discovered now. This new microcode should be released some time around mid-August to partners (motherboard manufacturers and PC OEMs), who will then need to validate it on their machines, before passing it along to end-users as UEFI firmware updates.
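Once the UEFI update reaches end-users, Linux users can check which microcode revision their CPU is actually running, since the kernel reports it in `/proc/cpuinfo`. The sketch below assumes the usual x86 `microcode : 0x...` line format and reads only the first logical CPU's entry. The article gives no revision number for the mid-August patch, so `min_revision` is a caller-supplied, hypothetical threshold rather than a known value. On Linux the running revision can also come from a microcode package loaded by the OS, not only from the UEFI firmware.

```python
import re
from typing import Optional

# Matches lines such as "microcode\t: 0x11f" in x86 /proc/cpuinfo output.
_MICROCODE_LINE = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", re.MULTILINE)


def running_microcode(cpuinfo_text: str) -> Optional[int]:
    """Return the first microcode revision found in /proc/cpuinfo text, or None."""
    match = _MICROCODE_LINE.search(cpuinfo_text)
    return int(match.group(1), 16) if match else None


def at_least(cpuinfo_text: str, min_revision: int) -> bool:
    """True if the reported revision is at or above a caller-chosen (hypothetical) minimum."""
    rev = running_microcode(cpuinfo_text)
    return rev is not None and rev >= min_revision


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as fh:
            text = fh.read()
    except OSError:
        text = ""  # Not Linux, or /proc is unavailable.
    rev = running_microcode(text)
    print(f"Running microcode revision: {rev:#x}" if rev is not None else "No microcode revision reported")
```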
"Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation. Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance," the company stated.
It's important to note here that the microcode update won't fix processors that are already unstable; it can only prevent the problem on chips that aren't yet affected. The instability is caused by irreversible physical degradation of the chip. Affected chips will, of course, be covered under warranty.
Meanwhile, another issue has come to light: some of Intel's processors built on the Intel 7 node were affected by a manufacturing defect involving via oxidation in the die. Intel responded, stating that it had discovered the oxidation manufacturing issue in 2023 and addressed it. The company also stated that only a small number of the instability reports can be tied to this oxidation issue.
"We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue," the company stated.
If you feel your chip might be affected, you can file for an RMA.
Sources:
Intel Community, Intel (Reddit)
215 Comments on Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon
1) The problem would be more definitive in terms of the impact and failure rate
2) There would be fewer scenarios where people have defective CPUs but have no idea they have a material defect, which may only get worse and not cause a practical failure until a few years down the line, all the while causing other system issues... Don't forget these will be devices where people do work that may not be valuable to you but will be to that person.
And point 2 is what most people are more concerned about.
I'm sorry if my "history lesson" was seen as derailing the thread. I have nothing more to add on that matter.
Not quite. Arrow Lake has several tiles. The tile that houses the CPU cores is still made in Intel's own fabs. The iGPU tile is made by TSMC. I don't remember the other tiles because there are like four or five of them.
That is why we need people to bring up previous instances of similar things happening. Neutrally, without whataboutism and fanboyism.
Every company has and will screw something up at some point.
What's important is that we won't forget, and how the company handles its failure. Will it offer a quick fix with a promise of free replacement right away, or will it spend months shifting the blame and staying quiet about the issue?
Personally, I just could not justify buying Intel before precisely because of this issue. Yes, I get 100% performance today, but what about years down the line? Back then my fear was another Spectre/Meltdown fix that nerfed performance. Now we can add degradation to the list. I also went with AM4 for socket longevity and lower power consumption, because electricity is expensive here.
My point exactly. Why are we only learning about this issue now through Intel's confirmation? Why were those CPUs not recalled?
Indeed. One is bad enough, but three separate issues within one socket?
LGA1700 will go down as one of Intel's buggiest/cursed/worst generations if this is true.
All reasonable questions any affected customer should ask (demand?). So far Intel has not clearly communicated how it plans to address them. All I see are some vague promises of a "fix" in a month (waiting yet again), instructions to contact customer service (RMA yes/no?), and essentially blaming laptop makers.
Intel seems to have narrowed it down to three or four issues. I have my doubts. It's affecting nearly all laptop and desktop chips in the 13th or 14th gen (apart from those steppings derived from 12th gen).
Basically, who knows. Intel are not letting on, which makes this even shadier. Not even disclosing which batches or date codes were affected by the initial problem is a massive red flag to me.
The last time Intel had to recall CPUs in the public domain this way was the Pentium FPU bug I mentioned earlier... They handled that quite badly, actually (initially they knew but didn't mention it until public knowledge forced them to acknowledge the erratum; then you could only get a replacement if you could prove you were impacted by it* until pressure eventually forced them to offer replacements to all), although to be fair, back in the day this was a rarer event and I don't think many companies were geared up for the fallout that not providing worthwhile RMA warranty support would bring.
* Kinda ridiculous as you have no way of knowing if some soon to be released software might trigger the issue repeatedly after the warranty has lapsed...
:)
2. Oxidation. New info (to general public) and according to Intel only affected batches of 13th and 14th gen models. Failure analysis pending.
3. Laptop issue. Unknown if due to degradation, oxidation, or both. Newest info and so far not much to go on.
Depends. One good example is Arctic. They proactively reached out and informed everyone of a flaw in their Arctic Freezer II line regarding pump gasket degradation in the affected AIOs. Technically they did not recall the affected units, though.
They offered either free DIY kit (so users could quickly fix this themselves or at least have tools in hand) or an RMA if the user was uncomfortable performing the swap themselves.
I had and still have one of those AIOs that matched the bad batch number. I got my free replacement gasket and fill liquid through their RMA and performed the swap as a preventative measure. Thankfully it had not yet started to degrade.
Arctic also extended the warranty period of those AIOs. Despite all this, I feel comfortable buying and recommending their products because they noticed the problem first and proactively reached out, instead of a months-long drama, blame game and no clear RMA.
So yeah, while Intel should preemptively recall all 13th and 14th gen CPUs affected by this microcode bug, they won't. And thus a lot of people owning Intel CPUs are gonna be pretty upset a few days/weeks/months/years down the line, when their 13/14 gen CPU fails and it's wayyy out of warranty. What that does to Intel's reputation in the long run, we'll have to see.
Downclocking fixes it for almost everyone, which also makes sense - less clocks less voltage.
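The "less clocks, less voltage" point follows from the V/f curve: each clock step needs a minimum core voltage, and dynamic CMOS power scales roughly as C·V²·f. The back-of-envelope sketch below uses hypothetical operating points (not measured Raptor Lake values) to show how a modest downclock lowers voltage and dynamic power together. It ignores leakage power and says nothing about actual degradation rates.

```python
# Hypothetical operating points: (label, core clock in MHz, core voltage in volts).
# Illustrative values only; real chips have individually fused V/f curves.
OPERATING_POINTS = [
    ("stock boost", 6000, 1.45),
    ("mild downclock", 5500, 1.35),
    ("heavier downclock", 5000, 1.25),
]


def relative_dynamic_power(v: float, f_mhz: float, v_ref: float, f_ref_mhz: float) -> float:
    """Dynamic power ratio from P ~ C * V^2 * f; the capacitance C cancels out."""
    return (v / v_ref) ** 2 * (f_mhz / f_ref_mhz)


if __name__ == "__main__":
    _, f_ref, v_ref = OPERATING_POINTS[0]  # compare everything against stock boost
    for label, f, v in OPERATING_POINTS:
        ratio = relative_dynamic_power(v, f, v_ref, f_ref)
        print(f"{label:>18}: {f} MHz @ {v:.2f} V -> {ratio:.0%} of stock dynamic power")
```

Because voltage enters squared, the voltage drop that comes with a lower clock does more of the work than the clock reduction itself.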
I bought their Linkbuds S wireless earbuds in 2022, only for them to develop a battery discharge issue a week after my two-year warranty period ended (both go from 100% to empty within 15-30 minutes instead of the usual ~8 hours).
Reading reviews and comments online there are many people facing the same issue with both the Linkbuds S and WF-1000 XM4 models produced and bought in 2022.
Yet Sony has not even acknowledged the issue or provided any replacements, because in their eyes the warranty period for both products (for 2022 buyers at least) has ended, and thus they feel they don't have to do anything.
A product failing a week after warranty ended feels like planned obsolescence...
I’m not sure what a recall would even look like. What do they replace them with? Instead they’ll hope to ride it out through warranty service which informally does the same thing without some uncomfortable stories about failed chips being recalled. Even a post-mortem class action is the lesser of two evils, since those take years to play out. By then they can have something else out and show how they’ve reformed.
Combine that with - "Hey let's build a server with these" and you get ... "WHY ARE ALL THESE SERVERS CRASHING AFTER A FEW MONTHS!?!?"