Friday, June 14th 2024
Intel Isolates Root Cause of Raptor Lake Stability Issues to a Faulty eTVB Microcode Algorithm
Intel has identified the root cause of stability issues observed with certain high-end 13th and 14th Gen Core "Raptor Lake" processor models, which were causing games and other compute-intensive applications to crash randomly. When the issues were first identified, Intel recommended a workaround that reduced core voltages and restricted the boost headroom of these processors, at the cost of performance. The company has apparently now discovered the root cause of the problem, as Igor's Lab learned from confidential documents.
The documents say that Intel isolated the problem to a faulty value in the microcode algorithm behind eTVB (Enhanced Thermal Velocity Boost). "Root cause is an incorrect value in a microcode algorithm associated with the eTVB feature. Implication: Increased frequency and corresponding voltage at high temperature may reduce processor reliability. Observed: Found internally," the document says, listing "Raptor Lake-S" (13th Gen) and "Raptor Lake Refresh-S" (14th Gen) as the affected products. The company goes on to elaborate on the issue in its Failure Analysis (FA) document:
Failure Analysis (FA) of 13th and 14th Generation K SKU processors indicates a shift in minimum operating voltage on affected processors resulting from cumulative exposure to elevated core voltages. Intel analysis has determined a confirmed contributing factor for this issue is elevated voltage input to the processor due to previous BIOS settings which allow the processor to operate at turbo frequencies and voltages even while the processor is at a high temperature. Previous generations of Intel K SKU processors were less sensitive to these type of settings due to lower default operating voltage and frequency.
Identifying the root cause isn't the only good news: Intel also has a new microcode (version 0x125) ready for 13th and 14th Gen Core processors, which motherboard manufacturers and PC OEMs can package into UEFI firmware updates. This new microcode corrects the issue, which should restore the stability of these processors at their normal performance. Be on the lookout for UEFI firmware (BIOS) updates from your motherboard vendor or prebuilt PC OEM.
Source:
Igor's Lab
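The Failure Analysis text describes a mechanism: boost behaviour that should be withdrawn when the chip is hot stays active, so the processor keeps running turbo frequencies, and the voltages they need, at high temperature. The leaked documents don't say which value was wrong, so the Python sketch below assumes, purely for illustration, that it is the temperature cutoff of a TVB-style policy. `BoostPolicy`, `target_frequency`, `required_voltage` and every number in it are hypothetical; this is a toy model, not Intel's eTVB algorithm.

```python
from dataclasses import dataclass

# Toy model of a temperature-gated boost policy in the spirit of (e)TVB.
# All numbers are hypothetical and chosen only for illustration; they are
# not Intel's real frequency bins, voltages, or temperature limits.


@dataclass(frozen=True)
class BoostPolicy:
    turbo_mhz: int           # boost frequency allowed at any temperature
    tvb_bonus_mhz: int       # extra frequency granted only while the CPU is cool
    tvb_temp_limit_c: float  # above this temperature the bonus should be withdrawn


def target_frequency(policy: BoostPolicy, temp_c: float) -> int:
    """Frequency a TVB-style governor would request at a given temperature."""
    if temp_c <= policy.tvb_temp_limit_c:
        return policy.turbo_mhz + policy.tvb_bonus_mhz
    return policy.turbo_mhz


def required_voltage(freq_mhz: int) -> float:
    """Toy V/F curve: 1.10 V up to 5500 MHz, plus 0.05 V per extra 100 MHz."""
    extra_steps = max(0, freq_mhz - 5500) / 100
    return round(1.10 + 0.05 * extra_steps, 3)


# Same hardware, two policies: one with a sensible cutoff, and one where the
# cutoff "value" is wrong, so the bonus bins stay active when the chip is hot.
intended = BoostPolicy(turbo_mhz=5700, tvb_bonus_mhz=300, tvb_temp_limit_c=70.0)
faulty = BoostPolicy(turbo_mhz=5700, tvb_bonus_mhz=300, tvb_temp_limit_c=100.0)

if __name__ == "__main__":
    for temp in (60, 85, 95):
        for name, policy in (("intended", intended), ("faulty", faulty)):
            freq = target_frequency(policy, temp)
            print(f"{temp} C  {name:8s}  {freq} MHz  {required_voltage(freq):.3f} V")
```

With these toy numbers, at 85 °C the intended policy falls back to 5700 MHz at 1.200 V, while the faulty one keeps requesting 6000 MHz at 1.350 V: the "increased frequency and corresponding voltage at high temperature" pattern the document warns about.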
107 Comments on Intel Isolates Root Cause of Raptor Lake Stability Issues to a Faulty eTVB Microcode Algorithm
So I reseated the processor and noticed a couple of microscopic pieces of debris on it, hoping that was the reason, and then I finally updated my BIOS (I'm always hesitant to do this, since updates take things away as often as they add them). It upgraded my microcode to 123, which means no undervolting, but if it means a working computer, I'll take it. Besides, undervolting is really only helpful for benchmarks.
Now, before I jinx myself: I've only been using my computer for about an hour since then, so I don't know for sure yet, but I've run some stress tests, and surely by now something would have happened: a blue screen, a random power-off, or the screen randomly going interlaced (wtf is with that?). Hoping and praying I don't have to do another RMA.
And yeah, it's pretty much settled in my mind now: my next CPU will be AMD unless there's a drastic change to the environment.
And realistically speaking, it doesn't matter all that much. It's fun and games, but in the end it's just fun and games; if you want mission-critical stability, you'll drop performance another 10% and use a city workhorse instead of a race car always at the redline.
At this point, I think Intel needs to recall every single last Core i9 ever sold and to issue refunds for selling what is a defective product.
Intel still doesn't know what is causing its i9 desktop chips to crash | TechSpot
So I suspect it will be a microcode update: users update their BIOS, the issue goes away, and everyone moves on, albeit with some performance loss on the chips.
Intel starts the platform with something good, then pushes it to the breaking point by the end.
AMD releases whatever they can, and then refines it to perfection by the platform's end.
To illustrate, AMD started AM4 with Zen, which was okay, but it really matured with Zen 3. Now AM5 has started with a letdown for many; we'll see how Zen 5 and 6 catch up.
Intel had LGA-1151 with Skylake, which was the pinnacle of the 4-core era, if you don't count Kaby Lake, which needed a new chipset for some reason despite being on the same socket.
Then, LGA-1200 had Comet Lake with 10 cores, and then Intel shifted back a gear with Rocket Lake with 8 cores.
Now we have LGA-1700 with Alder Lake, which was okay from what I've heard, and now Raptor Lake Refresh with all these problems.
Think back to Sandy Bridge: they could have kept that as the latest for at least a few more years, then maybe skipped straight to Skylake or something (whatever the first DDR4 platform was). There was no need to release Ivy Bridge and Haswell in between.
Alder Lake probably should have remained the latest chip for the current chipset, but again, there's that marketing pressure to release "something".
AMD's issue, the same thing in reverse, would suggest they are releasing products before they're ready; issues like very long POST times shouldn't, in my opinion, be in a released product. We know AM4 has life left in it, so AM5 perhaps could have been delayed. In my view, right now we should probably have something like the 5800X3D against something like the 12700K. If the 9000 series chips fix the issues the 7000 series had, then I guess the jump would be from the 5000 series to the 9000 series, and from Alder Lake to Arrow Lake, so both sides would have a much longer manufacturing period and a slower release cycle.
I went from Pentium II and III to K7 and K8, to Haswell and Coffee Lake, and finally to Zen 4. I completely skipped NetBurst, Bulldozer and P/E core hybrids. Again, pretty easy stuff to figure out.
I suspect it's not really just one root cause or factor, but rather a mixture of things contributing to instability across different systems. I've never shied away from pointing out that it certainly appears Intel has been pushing its chips too far on heat and power relative to what's ideal. They've been trying to play catch-up, but it feels like the onus is on them now for the mess they might've created here. Intel unfortunately suffered far too much complacency after Bulldozer, having cornered the CPU market for around a decade with no competition in its sights.
I hope they can fix the hardware issue with a reasonable solution, but the jury is out on that one. I'll definitely have to weigh how they handle this before deciding whether to consider Battlemage. AMD's next GPU is starting to look pretty interesting as well, and with them placing a stronger focus on the low end and midrange, I view that as a positive for those segments of graphics upgrades. I'm happy to see a stronger competitive push at that end of the GPU market for consumers, given how lacking it's been.
Does every single 14900 show the symptom? Don't know; I think the silicon lottery may have a bearing on it.
From where I sit, if you were to buy a 14900(K), you have a few options if you're paranoid about it:
1 - Disable TVB in the BIOS. You lose some potential peak performance, but you still have standard Turbo Boost.
2 - Apply Intel stock settings. You may lose performance in heavily threaded loads, since you hit the limits much more easily.
3 - Update to the latest microcode. You may lose some peak performance, but not as much as by disabling TVB.
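Option 3 depends on knowing which microcode is actually loaded. On Linux, the kernel reports the revision currently in use on each logical CPU in `/proc/cpuinfo` (a line such as `microcode : 0x123`), whether it came from the firmware or was loaded by the OS. The Python sketch below compares that against 0x125, the revision named in the article as the fix. The helper names are hypothetical, and the comparison assumes a 13th or 14th Gen desktop chip, since microcode revision numbers aren't comparable across CPU families.

```python
import os
import re

# Microcode revision the article names as carrying the eTVB fix for
# 13th/14th Gen desktop parts. Revision numbers are only comparable within
# the same CPU family, so this check assumes a Raptor Lake desktop chip.
FIXED_REVISION = 0x125

# Matches lines like "microcode\t: 0x123" as printed in /proc/cpuinfo.
_MICROCODE_LINE = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$")


def microcode_revisions(cpuinfo_text: str) -> set[int]:
    """Collect the distinct microcode revisions reported across logical CPUs."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        match = _MICROCODE_LINE.match(line)
        if match:
            revisions.add(int(match.group(1), 16))
    return revisions


def has_etvb_fix(cpuinfo_text: str, fixed: int = FIXED_REVISION) -> bool:
    """True only if every logical CPU reports a revision at or above `fixed`."""
    revisions = microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) >= fixed


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        text = f.read()
    print(sorted(hex(r) for r in microcode_revisions(text)), has_etvb_fix(text))
```

Taking the minimum across all logical CPUs is deliberately conservative: if any core still reports an older revision, the check fails rather than passing on a partial update.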