Monday, April 29th 2024
Intel Statement on Stability Issues: "Motherboard Makers to Blame"
A couple of weeks ago, we reported on NVIDIA directing users of Intel's 13th Generation Raptor Lake and 14th Generation Raptor Lake Refresh CPUs to consult Intel for any issues with system stability. By default, motherboard makers often run the CPU outside of Intel's recommended specifications, overvolting it by modifying voltage curves, applying automatic overclocks, and removing power limits.
Today, we learned that Igor's Lab has obtained a statement that Intel prepared for motherboard OEMs regarding the issues multiple users have reported. Intel CPUs come pre-programmed with a stock voltage curve. When motherboard makers remove power limits and automatically adjust voltage curves and frequency targets, the CPU can be pushed outside its safe operating range, possibly causing system instability. Intel has set up a dedicated website where users can report their issues and get support. Manufacturers like GIGABYTE have already issued new BIOS updates intended to give users maximum stability, although recent user reports indicate that these still fall outside Intel spec, setting PL2 to 188 W, loadlines to 1.7/1.7, and the current limit to 249 A. MSI has published a blog post tutorial on achieving stability, and ASUS has released updated BIOS versions for its motherboards that reflect the Intel baseline spec as well. Surprisingly, not all of the revised BIOS values from the different vendors match the Intel Baseline Profile spec. You can read the statement from Intel in the quote below.
Source:
Igor's Lab
Intel has observed that this issue may be related to out of specification operating conditions resulting in sustained high voltage and frequency during periods of elevated heat.
Analysis of affected processors shows some parts experience shifts in minimum operating voltages which may be related to operation outside of Intel specified operating conditions.
While the root cause has not yet been identified, Intel has observed the majority of reports of this issue are from users with unlocked/overclock capable motherboards.
Intel has observed 600/700 Series chipset boards often set BIOS defaults to disable thermal and power delivery safeguards designed to limit processor exposure to sustained periods of high voltage and frequency, for example:
- Disabling Current Excursion Protection (CEP)
- Enabling the IccMax Unlimited bit
- Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
- Additional settings which may increase the risk of system instability:
  - Disabling C-states
  - Using Windows Ultimate Performance mode
  - Increasing PL1 and PL2 beyond Intel recommended limits
Intel requests system and motherboard manufacturers to provide end users with a default BIOS profile that matches Intel recommended settings.
Intel strongly recommends customer's default BIOS settings should ensure operation within Intel's recommended settings.
In addition, Intel strongly recommends motherboard manufacturers to implement warnings for end users alerting them to any unlocked or overclocking feature usage.
Intel is continuing to actively investigate this issue to determine the root cause and will provide additional updates as relevant information becomes available.
Intel will be publishing a public statement regarding issue status and Intel recommended BIOS setting recommendations targeted for May 2024.
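The safeguards named in Intel's statement can be read as a simple checklist. The sketch below is only an illustration of that idea: the setting names are hypothetical (real BIOS menus label these options differently per vendor), and the example values are made up rather than taken from any actual board.

```python
# Hypothetical setting names; each value is the state Intel's statement
# calls out as risky (a safeguard disabled, or a limit removed).
RISKY_WHEN = {
    "current_excursion_protection": False,     # CEP disabled
    "iccmax_unlimited": True,                  # IccMax Unlimited bit enabled
    "thermal_velocity_boost": False,           # TVB disabled
    "enhanced_thermal_velocity_boost": False,  # eTVB disabled
    "c_states": False,                         # C-states disabled
}


def flag_risky_settings(bios):
    """Return the names of settings that match a condition listed in the statement."""
    return sorted(
        name
        for name, risky_value in RISKY_WHEN.items()
        if name in bios and bios[name] == risky_value
    )


# Made-up example of board defaults, for illustration only.
example = {
    "current_excursion_protection": False,
    "iccmax_unlimited": True,
    "thermal_velocity_boost": True,
    "enhanced_thermal_velocity_boost": True,
    "c_states": True,
}

print(flag_risky_settings(example))
# ['current_excursion_protection', 'iccmax_unlimited']
```

PL1 and PL2 are left out on purpose, since the statement refers to "Intel recommended limits" without giving the numbers.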
272 Comments on Intel Statement on Stability Issues: "Motherboard Makers to Blame"
Why did everyone suddenly scramble to introduce BIOS updates with "Intel Baseline Profile"? Isn't that something that should be there by default when buyers purchase motherboards?
board partners are doing this because it sells boards. How is MSI going to sell you an $1100 MEG if it performs the same as a z760? Do you really want Nahimic audio? No.
if you want to get on your “intel bad” conspiracy soap box for mobo makers being negligent wait until you stumble across these voltage increases over time steadily selling more coolers.
you people preach intel is big bad but the SAME mobo manufacturers that are bumping up the voltages were welding IHSs to AMD coolers months ago.
make up your mind.
At least AMD didn't stoop that low. Intel reeks of desperation now and I guess the defensive attitude stems from that. How far they've fallen..
Anyway, enough of the digression: why do all the Intel owners want to change the subject and dodge the issue at hand?
Anyway, motherboards having silly default settings isn't a new thing. A few years ago I had an Aorus B560 motherboard mated to an i9-11900 that wouldn't be stable under semi-default settings (with just power limits increased over the 65W default. Not much else to tweak on a B-series motherboard and non-K CPU), causing errors or bluescreens during intensive benchmarks. At first I thought it was the CPU, but eventually I found it was just a matter of voltage sagging too much under load. It turned out that the motherboard had rather weak default loadline and LLC settings, that while on one hand mitigated thermal throttling (to some extent), on another caused stability issues under load.
As the PC World debate on YouTube put it: "Mummy and daddy need to come in to clean this mess once and for all".
So unless this Intel Baseline bios update also fixes the VF curve to not slam high voltages in at idle...the new profile wouldn't have prevented my 14900K from degrading even if it existed from day 1. What Intel seems to be fixing is the scenario of "I set everything to Auto and my CPU overheats".
I must say, judging from the open Intel spec document I've shown at #157,
only maximum values were specified by Intel.
'Capping maximum values' sounds okay,
but not all values are equal: certain things get ugly when they are 'too low', such as current / resistance / load line calibration.
The problem (random crash in testing/UE5) we are facing right now seems to be caused by 'Not enough current / voltage during heavy workload'
Since the Intel document only specified the maximum current/voltage, which is 307A / 1.72V for a normal K SKU
No minimum values, nor typical values were supplied by Intel.
So the range is 0-307A / 0-1.72V
'Being too low' is actually...well.. 'Within Spec' . True.
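To illustrate the commenter's point, here is a minimal sketch of a check that only knows maximums. The 307 A and 1.72 V limits are the figures quoted in the comment above and are not verified against Intel's datasheet; the function name and the sample operating points are hypothetical.

```python
# Maximums as quoted by the commenter for a K SKU (unverified).
MAX_CURRENT_A = 307.0
MAX_VOLTAGE_V = 1.72


def within_max_only_spec(current_a, voltage_v):
    """True if both values are at or below the maximums; there is no lower bound."""
    return 0.0 <= current_a <= MAX_CURRENT_A and 0.0 <= voltage_v <= MAX_VOLTAGE_V


print(within_max_only_spec(249.0, 1.30))  # True
print(within_max_only_spec(100.0, 0.60))  # True: an implausibly low voltage still passes
print(within_max_only_spec(350.0, 1.30))  # False: over the current maximum
```

A check like this cannot tell a sensible setting from one that starves the CPU, which is the gap the comment describes.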
Things get dirty when everyone wants the last bit of juice getting squeezed out-of-box.
Really missed those days when 20-30% even 50% overclocking headroom was possible.
But note the words 'Individually calibrated'.
It means each CPU has its own voltage/frequency table.
That means a 'low bin' CPU could have its voltage shoot to the moon to achieve the said frequency.
It is quite unfair to ask the MB vendors to predict how a 'low bin' CPU would behave,
given the lack of information supplied by Intel: they have to fill in all those blank values by trial and error, with Intel encouraging them to make LLC lower (lower voltage).
When low LLC meets a 'low bin' CPU, bad things happen.
Of course, the existence of high/low bins implies that some CPUs will be able to boost longer before thermal throttling, some less. However, it isn't the motherboard manufacturer's job to make things "fair" for unlucky customers by lowering voltages (at the cost of instabilities).
Then it all comes back to the paragraph I pointed out at #157
Intel themselves encouraged the MB vendors to use 'Superior board design with lower LLC to achieve better performance'
That's why I said it is a 50/50 blame on Intel/MB vendor.
It is quite obvious that Intel wants the MB vendor to use lower LLC and show better performance.
And the MB vendors get more sales when CPU performed better on their boards.
Technically they all act within-spec, until some low bin CPUs come in......
Maybe MB vendors could have tested a few hundred more CPUs to cook up a better LLC table.
Maybe Intel themselves shouldn't have allowed such low bin CPUs to be branded as i9 in the first place.
In principle it could be configured so that the voltage supplied under load is exactly the one requested by the CPU, but there's apparently barely any testing going on by motherboard manufacturers, with the same settings being used across different models having different voltage regulators and electrical characteristics.
The Intel spec document says that MB vendors should measure and set their own values (since no such value is provided by Intel).
Some 'Superior' / 'Improve' buzzwords are also present to encourage MB vendors to use a 'Shallower AC load line' design for more performance.
So naturally, MB vendors would favor a low AC load line default setting.
Since there is no reference value provided by Intel,
MB vendors had to cook up their own values by testing the ES CPUs provided by Intel.
IDK how many they've tested.
But judging from the reality, it doesn't cover the whole silicon lottery spectrum.
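The loadline arithmetic behind this exchange can be sketched roughly as follows. This is a simplified model and an assumption, not Intel's documented behaviour: the CPU is taken to raise its voltage request by current × AC loadline to pre-compensate for droop, and the board's real loadline then drops current × resistance on the way to the die. The function name and all sample numbers are hypothetical.

```python
def voltage_at_die(target_v, current_a, ac_loadline_mohm, actual_loadline_mohm):
    """Estimate the voltage the CPU actually receives under load (simplified model)."""
    # The CPU pre-compensates for the droop it expects from the configured AC loadline.
    requested_v = target_v + current_a * ac_loadline_mohm / 1000.0
    # The board's real loadline then drops voltage in proportion to current.
    return requested_v - current_a * actual_loadline_mohm / 1000.0


# AC loadline set to match the board: the CPU gets the voltage it wanted.
matched = voltage_at_die(1.30, 200.0, 1.1, 1.1)

# AC loadline set lower than the board's real loadline: the CPU is undervolted under load.
too_low = voltage_at_die(1.30, 200.0, 0.5, 1.1)

print(f"{matched:.3f} V")  # 1.300 V
print(f"{too_low:.3f} V")  # 1.180 V
```

In this model the shortfall grows with current, so a configured value that is too low would matter most under heavy load, in line with the crashes under load that commenters report.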
That was a fun one trying to troubleshoot. Eventually I locked it to 5.7 or 5.5 max regardless of core active count and called it a day, and it didn't crash after that. And then it degraded and had to get RMA'd anyways.
So I basically said "f this noise" and put a non-K in. No more TVB to go to unrealistic clocks or weird all core frequency stuff that is beyond specs. And seems to idle at a sane voltage so probably won't degrade. I have the replacement 14900K here so I'm tempted to unseal it and try the new Asus bios to see if they fixed that, but I've done so many CPU remounts and swaps and stuff fixing problems that I'm kinda over it.