Friday, June 14th 2024

Intel Isolates Root Cause of Raptor Lake Stability Issues to a Faulty eTVB Microcode Algorithm

Intel has identified the root cause of the stability issues observed with certain high-end 13th and 14th Gen Core "Raptor Lake" processor models, which were causing games and other compute-intensive applications to crash randomly. When the issues were first identified, Intel recommended a workaround that reduced core voltages and restricted the boost headroom of these processors, at the cost of performance. The company has apparently discovered the root cause of the problem, as Igor's Lab learned from confidential documents.

The documents say that Intel isolated the problem to a faulty value in the microcode's end of the eTVB (enhanced thermal velocity boost) algorithm. "Root cause is an incorrect value in a microcode algorithm associated with the eTVB feature. Implication: Increased frequency and corresponding voltage at high temperature may reduce processor reliability. Observed: Found internally," the document says, mentioning "Raptor Lake-S" (13th Gen) and "Raptor Lake Refresh-S" (14th Gen) as the affected products.
The company goes on to elaborate on the issue in its Failure Analysis (FA) document:
Failure Analysis (FA) of 13th and 14th Generation K SKU processors indicates a shift in minimum operating voltage on affected processors resulting from cumulative exposure to elevated core voltages. Intel analysis has determined a confirmed contributing factor for this issue is elevated voltage input to the processor due to previous BIOS settings which allow the processor to operate at turbo frequencies and voltages even while the processor is at a high temperature. Previous generations of Intel K SKU processors were less sensitive to these type of settings due to lower default operating voltage and frequency.
Identifying the root cause of the problem isn't the only good news; Intel also has a new microcode ready for 13th Gen and 14th Gen Core processors (version: 0x125), for motherboard manufacturers and PC OEMs to encapsulate into UEFI firmware updates. This new microcode corrects the issue, which should restore the stability of these processors at their normal performance. Be on the lookout for UEFI firmware (BIOS) updates from your motherboard vendor or prebuilt OEM.
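For readers on Linux who want to see which microcode revision their system is currently running, the following is a minimal sketch, not an official Intel tool. It assumes an x86 Linux system where /proc/cpuinfo exposes a "microcode" field, and it assumes that a revision of at least 0x125 (the version named above) indicates the eTVB correction is present. That comparison is only meaningful on the affected Raptor Lake desktop parts, since microcode revisions are specific to each CPU model. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Revision named in the article. Treating ">= 0x125" as "patched" is an
# assumption of this sketch and only makes sense for the affected CPUs.
FIXED_REVISION = 0x125


def parse_microcode_revision(cpuinfo_text: str) -> Optional[int]:
    """Return the first 'microcode' revision in /proc/cpuinfo-style text, or None."""
    match = re.search(
        r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.MULTILINE
    )
    return int(match.group(1), 16) if match else None


def has_etvb_fix(
    cpuinfo_text: str, fixed_revision: int = FIXED_REVISION
) -> Optional[bool]:
    """True/False if a revision is reported, None if no revision is found."""
    revision = parse_microcode_revision(cpuinfo_text)
    if revision is None:
        return None
    return revision >= fixed_revision


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        text = cpuinfo.read_text()
        revision = parse_microcode_revision(text)
        if revision is None:
            print("No microcode revision reported on this system.")
        else:
            print(f"Running microcode {revision:#x}; "
                  f"at or above {FIXED_REVISION:#x}: {has_etvb_fix(text)}")
```

On Windows the running revision is usually shown by the BIOS setup screen or hardware monitoring tools instead; either way, the revision only changes once a UEFI update (or an OS-level microcode package) carrying the new microcode is installed.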
Source: Igor's Lab

107 Comments on Intel Isolates Root Cause of Raptor Lake Stability Issues to a Faulty eTVB Microcode Algorithm

#26
Airbrushkid
So how many of the CPUs have been affected by this? How many of these CPUs have been sold?
#27
Dragokar
It is only part of the issue, as Intel itself says... the future microcode update will tell us more. As long as vendors such as Gigabyte keep publishing the wrong values in the UEFI, or we don't see widespread updates for the "old" 6xx series, it is not done anyway, well, not for me at least.
#28
chrcoluk
Airbrushkid wrote: So how many of the cpu's have been affected by this? How many of these cpu's have been sold?
The microcode kind of confirms what's found in the reports: it's limited to CPUs that have Thermal Velocity Boost, from Raptor Lake onwards.
#29
iameatingjam
Well, I for one am currently a happy camper. I was afraid my second 14700KF was dying. It started with an occasional bluescreen (the same way it started the first time). And weird things were happening: I'd wake the computer up from sleep and all of a sudden all the windows would fly around the screen. Fast forward a couple of days and my computer couldn't even get into Windows unless I turned C-states off, which basically means a fixed voltage and 40 W idle power consumption. Also, one time I was trying to log into Windows and on the splash screen it was like somebody was doodling all over it with Windows Paint. Something was very wrong.

So I reseated the processor, noticed there were a couple of microscopic pieces of debris on it, hoped that was the reason, and then I finally updated my BIOS (I am always hesitant to do this, since they take things away as often as they give things). It upgraded my ucode to 123, which means no undervolting, but if it means a working computer, I'll take it. Besides, undervolting is really only helpful for benchmarks.

Now, before I jinx myself: I've only been using my computer for about an hour since this happened, so I don't know for sure, but I've run some stress tests, and surely by now something would have happened, a blue screen, a random power off, the screen randomly going interlaced (wtf is with that?). Hoping and praying I don't have to do another RMA.

And yeah, it's pretty much settled in my mind now: my next CPU will be AMD unless there's a drastic change to the environment.
#30
trsttte
Denver wrote: Let's get into conspiracy theory mode. I find it quite convenient to release a fix that degrades RPL performance now months before the release of Arrow Lake. Now I understand the +/- 10% margin of error in intel's slides.

Pat is a criminal genius. lol
It's just like all the Spectre and Meltdown mitigations: reviews and benchmarks were done months ago, so who cares if they now get a 10% or more performance drop after the mitigations.

And realistically speaking it doesn't matter all that much. It's fun and games, but in the end it's just fun and games; if you want stability and mission-critical reliability, you'll drop that another 10% further down and use a city workhorse instead of a racecar that's always in the redline.
#31
trparky
According to a report over at Techspot.com, Intel still doesn't know what's going on with the Core i9. My thoughts are that this is simply a symptom of Intel pushing a 15-year-old microarchitecture way past the breaking point.

At this point, I think Intel needs to recall every single last Core i9 ever sold and to issue refunds for selling what is a defective product.

Intel still doesn't know what is causing its i9 desktop chips to crash | TechSpot
Reports that Intel had found a fix originated with the German blog Igor's Lab. It claimed it obtained an internal document under NDA that said the instability's underlying cause was "an incorrect value in microcode algorithm associated with the eTVB (enhanced Thermal Velocity Boost) feature."

For a brief moment, Raptor Lake users believed Intel had a fix in the queue. Unfortunately, Intel said reports identifying the fault were incorrect. It is still trying to determine the cause.

"Contrary to recent media reports, Intel has not confirmed root cause and is continuing, with its partners, to investigate user reports regarding instability issues on unlocked Intel Core 13th and 14th generation (K/KF/KS) desktop processors," it said in a statement. "The microcode patch referenced in press reports fixes an eTVB bug discovered by Intel while investigating the instability reports. While this issue is potentially contributing to instability, it is not the root cause."
#32
InVasMani
trsttte wrote: It's just like all the specter and meltdown mitigations, reviews and benchmarks were done months ago, who cares if now they get a 10% or more performance drop after the mitigations.

And realistically speaking it doesn't matter all that much, it's fun and games but in the end it's just fun and games, if you want stability and mission critical you'll drop that another 10% further down and use a city workhorse instead of a racecar always in the redline.
I mean, people who buy hardware care when the reviewers don't spot these sorts of problems, and when companies like Intel or AMD don't either. This issue shouldn't have slipped through the cracks as much as it did, past as many people as it did. That said, we've seen a lot of undetected bugs rear their ugly heads much later. They need to hurry up and pinpoint the source of the problem.
#33
Darmok N Jalad
InVasMani wrote: I mean people that buy hardware care when the reviewers don't spot these sorts of problems and when companies like Intel or AMD don't likewise. This issue shouldn't have slipped thru the cracks as much as it did by as many as it did. That said we've seen a lot of bugs undetected rear it's ugly head much later. They need to hurry up and pinpoint the source of the problem.
Makes me wonder. I think part of the problem is that these companies know the pressure reviewers face, and they just might be playing that to their advantage. Review sites want to meet the deadline, which just so happens to line up with the NDA lifting and the products going to retail. Companies can just tell the reviewers that these are pre-release samples and the bugs they find will get worked out with BIOS revisions. Any issues reviewers run into can just get chalked up to pre-release hardware and so, unless it's egregious, it's not worth much of a mention or it gets glossed over by the readers. If there are bugs and issues--especially those that push hardware to the edge of stability, well, the end user gets to endure them. Any firmware revisions that improve stability at the cost of performance will largely get overlooked, since these hardware reviews require multiple hours of testing and benchmarking. It would take a lot of time and effort to re-review RLR once this gets fixed, and by now the biggest part of the sales window has closed. All it really does is give Intel a bit of a black eye, but they can just tell us the next product they have around the corner will be better. So basically, we get these pre-release benchmarks, and the bugs and stuff "will be worked out later." It doesn't help that readers expect a day one review, so we might just be doing this to ourselves. Maybe these companies aren't playing fast and loose here, but it will certainly work out for Intel this time.
#34
matar
I just bought an i9-12900KS and skipped the 13900K. I really wanted the 13900K, but the issues turned me off, and I got the 12900KS for $200, new and sealed, so it was a great deal with no known issues. I hope it's a nice upgrade from my beloved i9-10900KF.
#35
InVasMani
matar wrote: I just bought an i9-12900KS and skipped the 13900k and i really wanted the 13900k but the issues turned me off and also i bought the 12900KS for $200 even NEW Sealed, so it was a great Deal and no known issues i hope its a nice upgrade from my beloved i9-10900KF .
It should be, a bit, especially for MT. The 12900KS is still pretty decent; you didn't have as much fine control over the E-cores, I guess, but they don't offer much in reality anyway. Great chip at that price though, bargain value.
#36
trparky
matar wrote: I just bought an i9-12900KS and skipped the 13900k and i really wanted the 13900k but the issues turned me off and also i bought the 12900KS for $200 even NEW Sealed, so it was a great Deal and no known issues i hope its a nice upgrade from my beloved i9-10900KF .
What? No love for AMD?
#37
AusWolf
matar wrote: I just bought an i9-12900KS and skipped the 13900k and i really wanted the 13900k but the issues turned me off and also i bought the 12900KS for $200 even NEW Sealed, so it was a great Deal and no known issues i hope its a nice upgrade from my beloved i9-10900KF .
I can't imagine why you would even look at 13th or 14th gen with a 12900K already in your system.
#38
A_macholl
I have an i7-12700K and I was seriously thinking about changing to a 14700K, looking at about 50% better performance. I was always a die-hard Intel fan, but yeah, I'm changing to the AMD platform soon. Putting the CPU on the motherboard with a crowbar or jimmy, CPU bending, and now stability problems with Intel saying "It's not our fault, just the world around us is guilty". Is Intel a girl or what??
#39
chrcoluk
trparky wrote: According to a report over at Techspot.com, Intel still doesn't know what's going on with the Core i9. My thoughts are that this is simply a symptom of Intel pushing a 15-year-old microarchitecture way past the breaking point.

At this point, I think Intel needs to recall every single last Core i9 ever sold and to issue refunds for selling what is a defective product.

Intel still doesn't know what is causing its i9 desktop chips to crash | TechSpot
I suspected this, which is why in my first reply I put something along the lines of it might only reduce the incidents rather than wipe them out. It's also why I still consider running boards at the stock power configuration (AC/DC, PL1, PL2, ICCMax) to be a fix. Bear in mind, though, that the microcode adjustment is similar to what you're saying: it makes it so TVB doesn't kick in as often, which is that push on the CPU you're talking about. It's an extra turbo mode that uses extreme voltages to grab a few hundred more MHz. I didn't like the idea of TVB, and it's one of the things that leaned me towards an i7 chip.

So I suspect it will be a microcode update, users update their BIOS, the issue goes away, and everyone moves on, albeit with some performance loss on the chips.
#40
AusWolf
trparky wrote: According to a report over at Techspot.com, Intel still doesn't know what's going on with the Core i9. My thoughts are that this is simply a symptom of Intel pushing a 15-year-old microarchitecture way past the breaking point.

At this point, I think Intel needs to recall every single last Core i9 ever sold and to issue refunds for selling what is a defective product.

Intel still doesn't know what is causing its i9 desktop chips to crash | TechSpot
Somebody correct me if I'm wrong, but Intel and AMD seem to be polar opposites when it comes to treating their platforms.
Intel starts the platform with something good, then pushes it to the breaking point by the end.
AMD releases whatever they can, and then refines it to perfection by the platform's end.

To illustrate, AMD started AM4 with Zen, which was okay, but it really matured with Zen 3. Now, they started AM5 with a let-down for many, we'll see how Zen 5 and 6 catch up.
Intel had LGA-1151 with Skylake which is the pinnacle of the 4-core era, if you don't count Kaby Lake which needed a new chipset for some reason, despite being on the same socket.
Then, LGA-1200 had Comet Lake with 10 cores, and then Intel shifted back a gear with Rocket Lake with 8 cores.
Now, we have LGA-1700 with Alder Lake, which was okay based on what I heard about it, and now Raptor Lake Refresh with all these problems.
#41
chrcoluk
AusWolf wrote: Somebody prove me if I'm wrong, but Intel and AMD seem to be polar opposites when it comes to treating their platforms.
Intel starts the platform with something good, then pushes it to the breaking point by the end.
AMD releases whatever they can, and then refine it to perfection by the platform's end.

To illustrate, AMD started AM4 with Zen, which was okay, but it really matured with Zen 3. Now, they started AM5 with a let-down for many, we'll see how Zen 5 and 6 catch up.
Intel had LGA-1151 with Skylake which is the pinnacle of the 4-core era, if you don't count Kaby Lake which needed a new chipset for some reason, despite being on the same socket.
Then, LGA-1200 had Comet Lake with 10 cores, and then Intel shifted back a gear with Rocket Lake with 8 cores.
Now, we have LGA-1700 with Alder Lake, which was okay based on what I heard about it, and now Raptor Lake refresh with all these problems.
I remember saying sometime in the past that we've got HW vendors releasing new generations of products for the sake of releasing. Instead of waiting 3, 4, 5 years, whatever is required until there is a proper replacement product, they now seem to release on a "schedule", I suppose to keep something new on the market. This means if something is not ready, they risk either releasing a buggy product or pushing an older product with an aggressive factory overclock.

Think back to Sandy Bridge: they could have kept that as the latest for at least a few more years, then maybe skipped straight from that to Skylake or something (whatever the first DDR4 platform was). No need to release Ivy Bridge and Haswell in between.

Alder Lake probably should have remained the latest chip on the current chipset, but again there's that marketing pressure to release "something".

AMD's issue, in reverse, would suggest they are releasing products before they are ready; issues like very long POST times in my opinion shouldn't be in a released product. We know there is life left in AM4, so AM5 perhaps could have been delayed, so my view is that right now we should probably have something like the 5800X3D against something like the 12700K. If the 9000 series chips fix the issues that the 7000 series had, then I guess the jump would be from the 5000 series to the 9000 series chips, and from Alder Lake to Arrow Lake, so both sides would have a much longer period of manufacturing and a slower release cycle.
#42
AusWolf
chrcoluk wrote: I remember saying sometime in the past we got HW vendors releasing new gen of products for sake of releasing, instead of waiting 3,4,5 years whatever is required when there is a proper replacement product, they seem to now release on a "schedule" I suppose to keep something new to market. This means if something is not ready they then risk either releasing a buggy product or pushing an older product with aggressive factory overclock.

Think back to sandy bridge, they could have kept that as latest for at least a few more years then maybe skip straight from that to skylake or something (whatever the first DDR4 platform was). No need to release ivy bridge and haswell in between.

Alder lake probably should have remained the latest chip out of the current chipset, but again that marketing pressure, to release "something".

AMD's issue with it in reverse would suggest they are releasing products before they are ready, the issues with things like very long post times in my opinion shouldnt be in a released product. AM4 we know there is life left in it, so AM5 perhaps could have been delayed, so my view is right we should probably have something like 5800X3D against something like the 12700k. If 9000 series chips fix the issues that the 70000 had then it would be I guess the jump would be from 5000 series to what will be the 9000 series chips and jump from Alder Lake to Arrow lake, so both sides having a much longer period of manufacturing and slower release cycle.
Totally agreed. Let's also not forget people who upgrade every generation despite the fact that they don't need to at all. "5% improvement, whoaoaoaoa!!!" Marketing works, I guess.
#43
Daven
trparky wrote: What? No love for AMD?
I also wonder why some will never consider buying AMD. It’s pretty easy to use review sites to know what’s best at any given time.

I went from Pentium II and III to K7 and K8 to Haswell and Coffee Lake and finally Zen 4. I completely skipped Netburst, Bulldozer and P/E core hybrids. Again pretty easy stuff to figure out.
#44
Crackong
Translation: Sorry, our CPUs just can't keep up without super aggressive on-the-edge tuning that breaks the CPUs IRL.
#45
matar
AusWolf wrote: I can't imagine why you would even look at 13th or 14th gen with a 12900K already in your system.
I just upgraded a few days ago. Yes, I was looking for a 13900K, not the 14900K, because it's the same as the 13900K; the best I found was $400 for the 13900K. But when I saw the 12900KS, new and factory sealed, for $200 out the door, and it was just 1 mile away for pickup, I said OK, that's the way I will go.
#46
InVasMani
The 12900KS is fairly on par with a 14600K or 13700K, and for $200 that's really not bad value. Anyway, Intel definitely has been pushing its chips to the edge for a while, and it seems like it's caught up to them. It really doesn't help overall stability when everything gets pushed to its knees simultaneously and heat buildup permeates. That's a bigger issue on lesser quality boards and weaker cooling setups as well.

I suspect it's not all just one root cause or factor really, but rather a mixture of things contributing to instability across different systems. I've still never shied away from pointing out that it certainly appears Intel's been pushing its chips too far on heat and power relative to what's ideal. They've been trying to play catch-up, but it feels like the red is on them now in the mess they might've created here. Intel unfortunately suffered from far too much complacency after Bulldozer, cornering the CPU market for around a decade with no competition in its sights.

I hope they can fix the hardware issue with a reasonable solution, but the jury is out on that one. I'll definitely have to take into strong consideration how they handle this before considering Battlemage or not. AMD's next GPU is starting to look pretty interesting as well, and with them placing a stronger focus on the lower end and mid-range, I view that as a positive for those segments of graphics chip upgrades. I'm happy to see a somewhat stronger competitive push at that end of the GPU market for consumers, given how lacking it's been.
#47
Airbrushkid
chrcoluk wrote: The microcode kind of confirm whats found in the reports its limited to CPUs that have thermal velocity boost from Raptor Lake onwards.
So every 14900 they sold is affected? So roughly how many were sold to consumers? I'm a little lost here. I was thinking of buying one. K, KF, KS? I don't overclock, ever.
#48
InVasMani
Doesn't it also apply to all or most of the Raptor Lake lineup, not just the 14900? I could be wrong on that, though; I thought I heard something like that earlier on about the situation. The 14900 series is just the worst impacted by it, and given it's pushed hardest to begin with, it seems to make sense why that is. It's not simply that frequency is being pushed harder on the 14900 series; it's the overall heat soak of operating that many cores within the scope of the overall design. Then mix in different cooling setups, OC memory adding additional stress, plus MB quality and BIOS, and you've got a bit of a mess.
#49
chrcoluk
InVasMani wrote: Doesn't it also apply to all or most of Raptor Lake lineup not just the 14900, but could be wrong on that though thought I heard something like that earlier on about the situation. The 14900 series is just the worst impacted by it and given it's pushed hardest to begin with seems to make sense why that is. It's not simply frequency pushed that it's being pushed harder on 14900 series it's the overall heat soak permeation of operating that many cores within the scope of the overall design then mix in different cooling setups and OC memory adding additional stress plus MB quality and bios and you've got a bit of a mess.
Only the i9s have Thermal Velocity Boost. i7s and below have the standard Turbo Boost.
Airbrushkid wrote: So every 14900 they sold is affected? So how many where sold to consumers roughly? I'm a little lost here. I was thinking of buying one. K, KF, KS? I don't over clock ever.
Depends what you mean by affected. The microcode is a thing on every CPU, but the microcode update should effectively "patch" the problem.

Does every single 14900 show the symptom? Don't know; I think the silicon lottery may have a bearing on it.

From where I sit, if you were to buy a 14900(K), you have a few options if you're paranoid about it.
1 - Disable TVB in the BIOS. You lose some potential peak performance, but will still have standard Turbo Boost.
2 - Apply Intel stock settings. You may lose performance in heavily threaded loads and hit limits much more easily.
3 - Update to the latest microcode. You may lose some peak performance, but not as much as by disabling TVB.
#50
InVasMani
chrcoluk wrote: Only the i9's have thermal velocity boost. i7's and below have the standard turbo boost.


Depends what you mean by affected, the microcode is a thing on every CPU, but the microcode update should effectively "patch" the problem.

Does every single 14900 show the symptom? Dont know, I think silicon lottery may have a bearing on it.

From where I sit if you was to buy a 14900(k) you have a few options if you paranoid about it.
1 - Disable TVB in the bios. You lose some potential peak performance. Will still have standard turbo boost.
2 - Apply intel stock settings, May lose performance in heavy threaded loads, hit limits much easier.
3 - Update to the latest microcode, May lose some peak performance but not as much as disabling TVB.
I thought I had seen i7s mentioned as well when the subject first surfaced, but maybe not. If the problem is TVB, that's a bit of a bitter pill to swallow for i9 owners. It's better overall for Intel if it's isolated to the i9s, I suppose, though those are also the most expensive chips they sell and the hardest to produce, in the event of a recall rather than a patch.