Monday, April 29th 2024
Intel Statement on Stability Issues: "Motherboard Makers to Blame"
A couple of weeks ago, we reported on NVIDIA directing users of Intel's 13th Generation Raptor Lake and 14th Generation Raptor Lake Refresh CPUs to consult Intel for any issues with system stability. Motherboard makers, by default, often run the CPU outside of Intel's recommended specifications, overvolting the CPU through modifying voltage curves, automatic overclocks, and removing power limits.
Today, we learned that Igor's Lab has obtained a statement that Intel prepared for motherboard OEMs regarding the stability issues multiple users have reported. Intel CPUs come pre-programmed with a stock voltage curve. When motherboard makers remove power limits and automatically adjust voltage curves and frequency targets, the CPU can be pushed outside its safe operating range, possibly causing system instability. Intel has set up a dedicated website where users can report their issues and get support. Manufacturers like GIGABYTE have already issued BIOS updates aimed at maximum stability, although recent user reports indicate these still fall outside Intel spec, setting PL2 to 188 W, loadlines to 1.7/1.7, and the current limit to 249 A. MSI has published a blog post tutorial on stability, and ASUS has released updated BIOS versions for its motherboards that reflect the Intel baseline spec as well. Surprisingly, not all of the revised values in these vendor BIOS updates match the Intel Baseline Profile spec. You can read the statement from Intel in the quote below.
Source:
Igor's Lab
Intel has observed that this issue may be related to out of specification operating conditions resulting in sustained high voltage and frequency during periods of elevated heat.
Analysis of affected processors shows some parts experience shifts in minimum operating voltages which may be related to operation outside of Intel specified operating conditions.
While the root cause has not yet been identified, Intel has observed the majority of reports of this issue are from users with unlocked/overclock capable motherboards.
Intel has observed 600/700 Series chipset boards often set BIOS defaults to disable thermal and power delivery safeguards designed to limit processor exposure to sustained periods of high voltage and frequency, for example:
- Disabling Current Excursion Protection (CEP)
- Enabling the IccMax Unlimited bit
- Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
- Additional settings which may increase the risk of system instability:
  - Disabling C-states
  - Using Windows Ultimate Performance mode
  - Increasing PL1 and PL2 beyond Intel recommended limits
Intel requests system and motherboard manufacturers to provide end users with a default BIOS profile that matches Intel recommended settings.
Intel strongly recommends customer's default BIOS settings should ensure operation within Intel's recommended settings.
In addition, Intel strongly recommends motherboard manufacturers to implement warnings for end users alerting them to any unlocked or overclocking feature usage.
Intel is continuing to actively investigate this issue to determine the root cause and will provide additional updates as relevant information becomes available.
Intel will be publishing a public statement regarding issue status and Intel recommended BIOS setting recommendations targeted for May 2024.
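As a rough illustration of the kind of default-profile check Intel's statement asks for, the on/off settings it lists can be expressed as a small lookup. This is only a sketch: the setting names below are hypothetical and do not correspond to any real BIOS export format, and the example board is made up. The power limits and the Windows power plan mentioned in the statement are left out because they are not simple on/off flags.

```python
# States flagged in Intel's statement as removing safeguards or raising
# instability risk. Key names are hypothetical, chosen for illustration only.
RISKY_DEFAULTS = {
    "current_excursion_protection": False,     # CEP disabled
    "iccmax_unlimited": True,                  # IccMax Unlimited bit enabled
    "thermal_velocity_boost": False,           # TVB disabled
    "enhanced_thermal_velocity_boost": False,  # eTVB disabled
    "c_states": False,                         # C-states disabled
}


def find_risky_settings(bios_settings):
    """Return the names of settings whose value matches a flagged state."""
    return sorted(
        name
        for name, risky_value in RISKY_DEFAULTS.items()
        if name in bios_settings and bios_settings[name] == risky_value
    )


# A made-up board default: CEP off and IccMax unlimited, everything else stock.
example_board = {
    "current_excursion_protection": False,
    "iccmax_unlimited": True,
    "thermal_velocity_boost": True,
    "enhanced_thermal_velocity_boost": True,
    "c_states": True,
}

print(find_risky_settings(example_board))
# prints ['current_excursion_protection', 'iccmax_unlimited']
```

A check like this only reports which flagged states are present; it says nothing about whether a given CPU will actually become unstable.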
272 Comments on Intel Statement on Stability Issues: "Motherboard Makers to Blame"
150/320
320/320
All these numbers were mentioned in your posted materials /datasheet/whatever
Just pick one, or give us a number that you somehow 'understand' from all the convoluted Intel spec.
We will see if it is right, or just your 'speculation'.
Is that all you wanted?
Could also include an Eco mode ala Ryzen with a, say, 125W overall limit.
There's nothing inherently wrong with having multiple sets of values for different performance/efficiency targets.
The issue is board partners not using these sets of values, doing their own thing, and then Intel picking up the tab when there's instability.
It seems that even the motherboard manufacturers have a hard time figuring these out, so Asus would have PL1 = PL2 as baseline, while Gigabyte had a 188 W baseline with 1.7v voltage loadline calibration. It was my mistake, thanks for pointing it out.
I agree that the best thing motherboard manufacturers could do would simply be to have a few default profiles directly using the values off the Intel Datasheet, which has various baseline, normal and extreme presets already dialled in (and validated), despite some people seeming to think that this endeavour would be too difficult for manufacturers to figure out.
The extra "AI OC" or whatever marketing wants to call fiddling with settings and overclocks that Intel hasn't validated for every CPU bin of the SKU should still be an option, but not the default, and with a UI warning as Intel is suggesting in their memo.
We'll see how many reviewers publish and advertise an update to all the reviews made when the parts were launched at least to flag the situation even if no number correction. Not really seeing this news on too many front pages today but it's also a good litmus test for my personal future reading preferences. Will keep an eye out even if I'm generally very behind the times so I almost never buy current generation. But I was still burned by super optimistic day-1 reviews which were never updated to account for the real life performance losses as the day-1 "optimizations" aimed at getting flashy numbers had to be turned off in the real life.
P.S. And I'm still not entirely sure this is just a matter of "staying standard", I'm fairly certain there are a lot more hidden changes under the hood that contribute to this situation, beyond just the one power topic.
I think AMD's ECO mode isn't for maximum stability, but for energy efficiency.
On the other hand, 'Intel Baseline Spec' is advertised to be the 'Safest & most Stable' profile, not for energy efficiency.
I don’t accept your interpretation that Intel did not approve of these default settings. Intel says nothing about past compliance in their statement. They just give guidance going forward using words like requests and recommends (all present tense). So you speculated that Intel told these manufacturers NOT to do this in the past and they disregarded. This is not based on any facts and is just your opinion of how a company like Intel ought to act.
From ASUS' patch notes.
Interesting that ASUS is referring only to the Intel Baseline Profile spec as the factory default, when there are several other standard and "extreme" profiles too.
When did I do this?
Once upon a time, the Intel branded motherboards (Foxconn OEM), while lacking the "bells and whistles" of the other MB vendors, worked out of the box and were the definition of stability. Now both AMD and Intel have made the supervision so loose, and the QA of MB manufacturers is so bad, that they both have to pay with reputational damage. They're just trying to sell the snake oil ASAP, and at all costs. How else could they get money, if the rival CPUs do the same job at half the energy usage?
Eventually, the Core i is an established brand, and the Core Ultra may introduce some uncertainty. So that's why they are so desperate.
Just thoughts aloud.
All we've got now is Gigabyte and Asus
The Intel Baseline Profile is just one of several options, none of which seem to be used by default out of the box. Even after the "Intel baseline profile" BIOS updates vendors have made, there are still deviations and made-up numbers.
Since the 14900KS had a PL1/PL2 = 150/320, which differs from the regular 14900K's 125/253,
if they had the same baseline profile, it would render them basically the same SKU.
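The per-SKU comparison in the comment above can be sketched as a small lookup. The PL1/PL2 pairs are the ones quoted in this thread (125/253 W for the 14900K, 150/320 W for the 14900KS); treat them as illustrative rather than as a verified copy of Intel's datasheet. The function name and the example board values are hypothetical.

```python
# PL1/PL2 pairs in watts as quoted in this thread; not verified against
# Intel's datasheet, so treat them as illustrative.
QUOTED_LIMITS_W = {
    "14900K": (125, 253),
    "14900KS": (150, 320),
}


def limits_exceeded(sku, pl1_w, pl2_w):
    """List which of PL1/PL2 exceed the quoted limits for the given SKU."""
    ref_pl1, ref_pl2 = QUOTED_LIMITS_W[sku]
    exceeded = []
    if pl1_w > ref_pl1:
        exceeded.append(f"PL1 {pl1_w} W > {ref_pl1} W")
    if pl2_w > ref_pl2:
        exceeded.append(f"PL2 {pl2_w} W > {ref_pl2} W")
    return exceeded


# A hypothetical board default that raises PL1 to match PL2.
print(limits_exceeded("14900K", 253, 253))  # prints ['PL1 253 W > 125 W']
# The same SKU with the quoted values applied unchanged.
print(limits_exceeded("14900K", 125, 253))  # prints []
```

Keeping the limits per SKU, rather than one shared table, is what keeps the 14900K and 14900KS distinct in a check like this.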
It's the job of the managers, systems engineers/coders etc. at these companies to understand these things. Skimming through the datasheet provided by Intel, it's not that difficult for an end user to plug in these values to their BIOS, so why is it difficult for a huge international company to copy and paste values? We've seen that even with the "Intel baseline profile" BIOS updates, values still do not line up with the first party Intel specification, which is nicely summarized in a few tables. Explicitly explained with references and full details in a comprehensive document. What more do partners need to adhere to spec?
I still think there is a lot of jumping on this topic to attack Intel, and not enough people criticising the fact that board partners who should know better are possibly comically incompetent to the point of not being able to copy and paste several numbers from a datasheet, or potentially still trying to gain competitive advantage by using the wrong values.
After reading the documentation posted here, next time I reboot I am changing pl2 to 125w.
Buildzoid checked the specs in his video and concluded sustained power was 125 W as well. He is also not convinced that the baseline mode on his Gigabyte board came from Intel rather than being something Gigabyte whipped together, and he thinks we will probably never know.
Basically: if you want a stable platform nowadays, just don't buy latest-generation gear. "Settle" for like, a Zen 3 or Rocket Lake platform with a fully updated BIOS. The 320 W setting is considered to be an "Extreme Power Profile" that is exclusive to the Core i9-12900KS, 13900KS and 14900KS SKUs, iirc. Otherwise you're correct.
My view is it's unlikely they run things by Intel.
We also currently have a baseline mode on Asus that keeps 253 W set. I think that wasn't run by Intel.
Asus was also setting voltages that were blowing up AMD chips; I don't think that was run by AMD.
Most board vendors have just 1 or 2 BIOS devs, as revealed by the guy who used to work for EVGA. It's not a large professional operation.
ASUS just happened to have aggressive enough tuning that the problem was further exacerbated on some of their boards.
IIRC the problem was automatic voltage algorithms in the AGESA that linked memory voltage and memory controller voltage. So engaging EXPO would push internal chip voltages past safe limits.
This was an issue particularly with X3D chips due to lower voltage tolerances, but also impacted standard Zen 4 chips.
What leaves a bitter taste in everyone's mouth is that the CPUs aren't blowing up, so there is no RMA, yet there is confirmed decreased performance and no clear solution has been provided. It is Intel's CPU; it is their job to make sure the motherboard vendors have a correct 'Default' profile so it works 100% of the time.
This lack of communication alone is a big issue and is one of Intel's biggest faults.
And if your CPUs are this fragile, measures should be taken to 'prevent' the partners from messing it up further.
Like AMD with their X3D voltage issue: they forced new voltage settings very quickly and RMA'd every affected case.
Like Nvidia: Nvidia does a great job making sure the AIBs cannot mess up their GPUs and, when something's up like the 12VHPWR issue, Nvidia took responsibility and took care of every affected case.
Please note that in the above-mentioned cases,
although customers did blame the AIB partners,
AMD/Nvidia themselves didn't actively place the blame on their partners.
They just went in, solved the problem, and got out ASAP.
If they can do it, why not Intel?