Tuesday, July 23rd 2024

Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon

Updated by

Jul 23rd, 2024 02:04 Updated: Jul 23rd, 2024 07:24

Long-term reliability issues continue to plague Intel's 13th Gen and 14th Gen Core desktop processors based on the "Raptor Lake" microarchitecture, with users complaining that their processors have become unstable under heavy processing workloads, such as games. This includes chips with only minor levels of performance tuning or overclocking. Intel had earlier traced many of these stability issues to faulty CPU core frequency boosting algorithms, which it addressed through processor microcode updates that motherboard and prebuilt-PC manufacturers distributed as UEFI firmware updates. The company has now published new findings on what could be causing these issues.

In a statement Intel posted on its website on Monday (July 22), the company said that it has been investigating the processors returned to it by users under warranty claims (which it has been replacing under the terms of its warranty). It found that faulty processor microcode has been causing the processors to operate at excessive core voltages, leading to their structural degradation over time. "We have determined that elevated operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of returned processors confirms that the elevated operating voltage is stemming from a microcode algorithm resulting in incorrect voltage requests to the processor."

Modern processor power management runs on an intricate clockwork of collaboration between software, firmware, and hardware: the software constantly tells the hardware what level of performance it wants, and the hardware manages its power and thermal budgets by rapidly adjusting the voltages and clock speeds of its various components, such as the CPU cores, caches, fabric, and other on-die components. A fault in the collaboration between any of these three layers can break this clockwork, as has happened in this case.

Intel is releasing yet another microcode update for its 13th and 14th Gen Core processors, which will address not just the faulty boosting algorithm issue the company unearthed in June, but also the faulty voltage management it has now discovered. This new microcode should be released to partners (motherboard manufacturers and PC OEMs) sometime around mid-August; they will then need to validate it on their machines before passing it along to end-users as UEFI firmware updates.

"Intel is delivering a microcode patch which addresses the root cause of exposure to elevated voltages. We are continuing validation to ensure that scenarios of instability reported to Intel regarding its Core 13th/14th Gen desktop processors are addressed. Intel is currently targeting mid-August for patch release to partners following full validation. Intel is committed to making this right with our customers, and we continue asking any customers currently experiencing instability issues on their Intel Core 13th/14th Gen desktop processors reach out to Intel Customer Support for further assistance," the company stated.

It's important to note that the microcode update won't fix processors that are already experiencing instability; it can only prevent the issue on chips that aren't. The instability is caused by irreversible physical degradation of the chip. Affected chips will, of course, be covered under warranty.

Meanwhile, an interesting issue has come to light: some of Intel's processors built on the Intel 7 node are experiencing chemical oxidation of the die as they age. Intel responded, stating that it had discovered the oxidation manufacturing issue in 2023 and addressed it. The company also stated that the die oxidation is not related to the stability issues it is currently grappling with.

"We can confirm that the via Oxidation manufacturing issue affected some early Intel Core 13th Gen desktop processors. However, the issue was root caused and addressed with manufacturing improvements and screens in 2023. We have also looked at it from the instability reports on Intel Core 13th Gen desktop processors and the analysis to-date has determined that only a small number of instability reports can be connected to the manufacturing issue," the company stated.

If you feel your chip might be affected, you can file for an RMA.

Sources: Intel Community, Intel (Reddit)


387 Comments on Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon

#251

thesmokingman

Quoting TumbleGeorge: "Modern farm" Unreal Engine division: 50% failed Intel 13/14 gen. Wow.

Soon all the devs are gonna be moving on to redder pastures. Good job Intel.

For those curious: at work, our failure rate for our 13900K and 14900K machines is about 50% so far. Any new machine builds are going to be 9950Xs; production environments need reliability.

#252

OneMoar


Nice try on the explain-it-away there, Intel.
But nobody is that stupid: you pushed your silicon beyond its reasonable limits to compete with AMD, and it came back to bite you.
Now you have sold a bunch of CPUs to people that are suddenly going to get a lot slower.

I hope you enjoy class actions, because there is one headed your way.

#253

Tomorrow

Quoting Vincero: The 5800X didn't really offer a lot over the 5800 (less than 2% difference in single-threaded performance and nearly 8% in multi-threaded), but with a 60% higher TDP... and the same was true further down the product stack.

5800X: Nov 5th, 2020, Retail
5800: Jan 12th, 2021, OEM

The 5800 was released about two months later as a 65 W OEM-exclusive model.

#254

phanbuey

Quoting OneMoar: Nice try on the explain-it-away there, Intel.
But nobody is that stupid: you pushed your silicon beyond its reasonable limits to compete with AMD, and it came back to bite you.
Now you have sold a bunch of CPUs to people that are suddenly going to get a lot slower.

I hope you enjoy class actions, because there is one headed your way.

I'm surprised this isn't already happening.

#255

trparky

Quoting OneMoar: But nobody is that stupid: you pushed your silicon beyond its reasonable limits to compete with AMD, and it came back to bite you.

That's how I feel as well. Intel is well past the point where it should have gone back to the drawing board and designed a whole new microarchitecture from the ground up. If AMD can do it, why can't Intel? Seriously, Intel makes far more money than AMD does.

#256

Nhonho

Intel CPU users: just buy a new PC with a 15th generation Intel CPU and the problem will be solved.

When problems arise with 15th generation Intel CPUs, just buy a new PC with a 16th or 17th generation Intel CPU and the problem will be solved.

Do it exactly this way and you won't have any problems.

#257

ratirt

Quoting mkppo: GN just posted results from the analysis lab and denied RMAs. Edit: They're still awaiting results from the lab, but it changes nothing.

So, uh... sell dodgy chips to server farms, deny RMAs even though they knew about the oxidation issue at the time, then release a half-assed statement two years after the chips launched saying the chips have an issue. Sorry, wait, multiple issues.

So if Intel apparently 'fixed' this oxidation issue, which apparently plagued early 13th Gen batches, obviously they knew about it. And then they did... nothing? For years? They release a statement about it right when third-party analysts start mentioning it? Also, they conveniently fail to mention which batches were affected. Still trying to figure that out after years, eh?

Something smells funny.

Edit 2: Just putting this out there as well: he rambles for a whole hour, but the first two minutes are pretty informative, lol.

GN didn't leave Intel anything to hold onto, I suppose. It's hard to believe how fed up they are.
The worst part is, Intel knew about it and still went with it. I only feel sorry for those who have not received RMA replacements even though they deserved them. Not informing partners about the issue is, literally speaking, disgusting.
I'm gonna go out on a limb and say "to be like Intel" means "very dishonest." Not a shocker, either.

#258

Zubasa

Quoting ratirt: GN didn't leave Intel anything to hold onto, I suppose. It's hard to believe how fed up they are.
The worst part is, Intel knew about it and still went with it. I only feel sorry for those who have not received RMA replacements even though they deserved them. Not informing partners about the issue is, literally speaking, disgusting.
I'm gonna go out on a limb and say "to be like Intel" means "very dishonest." Not a shocker, either.

Worse, some users actually received RMA units that were also faulty and promptly died or became unstable.
I am not even sure if Intel truly knows which units are safe.

#259

ratirt

Quoting Zubasa: Worse, some users actually received RMA units that were also faulty and promptly died or became unstable.
I am not even sure if Intel truly knows which units are safe.

Oh boy. That is below the belt, to be honest.
I'm sure they know, but that's beside the point, because maybe all of them are bad and it's just hard to admit it. Sometimes it's better to play dumb while looking for a solution to the mess you've made. The microcode reasoning is just ridiculous here; they're beating around the bush to buy more time. Or, honestly, I really don't know what they're doing. I also don't care. I lost interest in Intel a long time ago.

#260

trparky

Quoting ratirt: Lost interest in Intel a long time ago.

Me too, at least... ever since Ryzen.

#261

Jism

Quoting trparky: Me too, at least... ever since Ryzen.

There are some weird choices Intel makes with its high-end offerings:

Temperature limit of 110 °C for the hotspot or cores.
Boosting beyond 6 GHz, right at the edge of what the silicon can do.
NICs and NUCs (Atom) failing, with the same degradation happening.
Power targets of well over 244 W, up to 350 W for the higher-end models.

AMD, on the other hand, seems to trust the products it brings out, and PBO, for example, is going to last you years.

I still have a 2700X, running PBO for years and slightly undervolted. No issues at all.

#262

LittleBro

Seriously... what have y'all been expecting?
So, Intel... eating insane amounts of power (300+ W) in benchmarks just to break every record and beat AMD by a few percent... Was it all worth it? Now you have the answer.

After years of paper launches and postponed product launches, Intel said it would accelerate the arrival of upcoming generations. It said it would deliver on time, maybe even two generations per year. Now you have it. A lack of time for proper product development and testing is what caused these desperate, voltage-hungry, benchmark-record-breaking, unreliable CPUs to exist in the first place. It's about goddamn time Intel realized that the proper way to make progress (in terms of increasing IPC) is to modify (improve) the architecture. But you can't do that without enough time, right? Reworking the architecture was AMD's approach with Zen, Zen 2, Zen 3 and Zen 5 (excluding Zen 4). [Many don't see it, but Zen 5 is not a minor architectural improvement over Zen 4.]

This headless approach of dropping HT so they can push core clocks (and voltages, of course) even higher than before was pointless in the first place. Now they're planning to drop the E-cores, after claiming that the current E-core has IPC comparable to a Raptor Lake P-core? Why the hell would you want to drop such a good core? Perhaps to free up resources for another round of brute-force frequency/voltage pushing?

They invented HT, and they invented E-cores for the desktop. Think of how much money was put into developing these... They managed to win the battle with core-scheduling problems and the battle with E-cores being much less powerful than P-cores. Now Arrow Lake will be stripped of HT, and Bartlett Lake will be stripped of E-cores. This kind of mess looks like a trial-and-error approach to me.

Admitting to the oxidation issue more than a year after the 13th Gen products were released (without having told anyone anything) really needs to go to court. They need to be fined an amount that is a considerable loss to them, like 10% of their annual revenue or so.

#263

mkppo

Quoting phanbuey: They put desktop CPUs in older blades that weren't designed for those chips -- it was a custom hack, with Intel's datacenter team, but a custom hack nonetheless. You're making it sound like this is a common thing -- it's not. We build our racks and servers as well, and "Blades of 14900Ks with 128gbs of ECC ram" are not a normal setup. Server chips are usually Xeons or Pentiums and they run at 2.1-3.2 GHz max - and they sit there for 20 years doing it.

Of course these are not designed to burn out in a few months - but if you have blades burning out your 14900Ks ... why... put... more... 14900Ks... in those blades? Not saying that the chip is good, but when you're putting a yolked 14900K into a blade to save money this is kind of exactly the downside.

I'm aware that it's not common; it probably represents 1% of total servers. But among game-hosting servers, it's not as rare to use desktop CPUs/sockets, because they want good single-threaded performance rather than a lot of threads, and they don't need to run the systems for 20 years, or even 10. These systems also have a much smaller blast radius. These guys just want the CPU to do its job until the next upgrade cycle, at which point they just upgrade the CPUs, all for a fraction of the cost of Xeon/EPYC.

It's precisely why AMD released their EPYC 4004 on AM5.

Also, it's not like they just kept putting more 14900Ks in new racks. Some devs initially built racks with 13900K/14900K CPUs, but many have failed twice, and the second time around they were running underclocked, because the devs thought stock clocks might have been why the first batch of CPUs degraded. That makes sense, because they're getting the replacement CPUs through RMA, so it's 'free' apart from the downtime and the time lost to debugging. It has finally come to the point where they're just replacing all the racks with AMD because, well, enough is enough.

Edit: autocorrect on mobile sucks sorry

#264

ratirt

Quoting Jism: There are some weird choices Intel makes with its high-end offerings:

Temperature limit of 110 °C for the hotspot or cores.
Boosting beyond 6 GHz, right at the edge of what the silicon can do.
NICs and NUCs (Atom) failing, with the same degradation happening.
Power targets of well over 244 W, up to 350 W for the higher-end models.

AMD, on the other hand, seems to trust the products it brings out, and PBO, for example, is going to last you years.

I still have a 2700X, running PBO for years and slightly undervolted. No issues at all.

I had a 2700X. I jumped to the 5800X I currently use when I bought a 6900 XT.
No problems with my setup, and the old one, the 2700X, still serves my brother to this day with a 5600 XT.
There are too many unknowns with Intel at this point for me, and my concerns extend to the current state of things at the company.

#265

iameatingjam

Random thought... I wonder if 12th Gen is going to start going up in price if this new fix isn't all it's cracked up to be. People will be looking for a way to have a working computer without having to swap out motherboards...

#266

Dorek

Quoting Jism: There are some weird choices Intel makes with its high-end offerings:

Temperature limit of 110 °C for the hotspot or cores.
Boosting beyond 6 GHz, right at the edge of what the silicon can do.
NICs and NUCs (Atom) failing, with the same degradation happening.
Power targets of well over 244 W, up to 350 W for the higher-end models.

AMD, on the other hand, seems to trust the products it brings out, and PBO, for example, is going to last you years.

I still have a 2700X, running PBO for years and slightly undervolted. No issues at all.

I also had a 2700X, but sadly it wasn't stable at all with PBO.

#267

BoggledBeagle

I wonder if all the different theories about what is going on are just superfluous, and the real culprit is simply the frequency itself. If the real safe frequency for long-term reliability under intensive 24/7 workloads on this manufacturing process is, say, 4.6 GHz, it's no wonder things are breaking when you run the CPUs 1 GHz faster.

Every technology has its limits, and the problematic Intel 10 nm process, even after all the improvements, may not be able to handle high frequencies due to a higher tendency to degrade at high temperatures (or high current density, or both).

#268

R0H1T

There's no fixed frequency or exact upper limit up to which processors can operate; it's a range, hence the ~50% failure rate rather than 100% or so. As for the safe frequency (range), Intel probably knew it and yet gambled on the chips surviving under heavy loads! I don't buy any theory that doesn't lay the blame at Intel's feet; it's their decisions, not just a bad batch of chips.

#269

MikeSnow

Quoting phanbuey: Of course these are not designed to burn out in a few months - but if you have blades burning out your 14900Ks ... why... put... more... 14900Ks... in those blades? Not saying that the chip is good, but when you're putting a yolked 14900K into a blade to save money this is kind of exactly the downside.

It's not the blades that are burning the CPUs. The CPUs are burning themselves. And the main reason they are doing it is not to save money, but to get high single thread performance.

#270

R0H1T

Regardless of how much I call Intel names, this one is on the dumb people putting OCed chips, way past their limits, in servers! The reason server chips have conservative clocks should be fairly obvious, as should why running desktop chips at 6 GHz 24/7 is a bad idea.

#271

Rabit

Quoting BoggledBeagle: I wonder if all the different theories about what is going on are just superfluous, and the real culprit is simply the frequency itself. If the real safe frequency for long-term reliability under intensive 24/7 workloads on this manufacturing process is, say, 4.6 GHz, it's no wonder things are breaking when you run the CPUs 1 GHz faster.

Every technology has its limits, and the problematic Intel 10 nm process, even after all the improvements, may not be able to handle high frequencies due to a higher tendency to degrade at high temperatures (or high current density, or both).

First the 12900Ks started dying in game servers; it may simply be that the Intel node used for these CPUs gives them a short lifespan.

#272

BoggledBeagle

Quoting R0H1T: There's no fixed frequency or exact upper limit up to which processors can operate; it's a range, hence the ~50% failure rate rather than 100% or so.

Yes, there is a particular fixed frequency, as the answer to a question: what is the maximum frequency at which you can run this load for this many hours a day at this temperature, if you want only this percentage of CPUs to fail within a given period of time?

It seems Intel threw all caution and responsibility out the window and just cranked the frequency to the maximum the CPUs would survive in reviewers' hands.

#273

MikeSnow

Quoting R0H1T: Regardless of how much I call Intel names, this one is on the dumb people putting OCed chips, way past their limits, in servers! The reason server chips have conservative clocks should be fairly obvious, as should why running desktop chips at 6 GHz 24/7 is a bad idea.

Why does Intel sell chips OCed way past their limits?

#274

R0H1T

Well, I'm not Intel, so I can't answer this on their behalf, although I also wouldn't mind them being sued for $100 billion over it. They got off pretty lightly for their 2004-06(?) OEM BS, and ideally this time should be different.

#275

Sunny and 75

Intel needs to learn and improve, or there will be stagnation all over again.
