Monday, April 29th 2024
Intel Statement on Stability Issues: "Motherboard Makers to Blame"
A couple of weeks ago, we reported on NVIDIA directing users of Intel's 13th Generation Raptor Lake and 14th Generation Raptor Lake Refresh CPUs to consult Intel for any issues with system stability. Motherboard makers, by default, often run the CPU outside of Intel's recommended specifications, overvolting it by modifying voltage curves, applying automatic overclocks, and removing power limits.
Today, we learned that Igor's Lab has obtained a statement that Intel prepared for motherboard OEMs regarding the issues multiple users have reported. Intel CPUs come pre-programmed with a stock voltage curve. When motherboard makers remove power limits and automatically adjust voltage curves and frequency targets, the CPU can be pushed outside its safe operating range, possibly causing system instability. Intel has set up a dedicated website where users can report their issues and get support. Manufacturers like GIGABYTE have already issued new BIOS updates intended to maximize stability, although recent user reports indicate these still fall outside Intel spec, setting PL2 to 188 W, loadlines to 1.7/1.7, and the current limit to 249 A. MSI, meanwhile, has published a blog post tutorial on achieving stability, and ASUS has released updated BIOSes for its motherboards that reflect this Intel baseline spec as well. Surprisingly, not all of the revised values in these new BIOS updates from different vendors match the Intel Baseline Profile spec. You can read the statement from Intel in the quote below.
Source:
Igor's Lab
Intel has observed that this issue may be related to out of specification operating conditions resulting in sustained high voltage and frequency during periods of elevated heat.
Analysis of affected processors shows some parts experience shifts in minimum operating voltages which may be related to operation outside of Intel specified operating conditions.
While the root cause has not yet been identified, Intel has observed the majority of reports of this issue are from users with unlocked/overclock capable motherboards.
Intel has observed 600/700 Series chipset boards often set BIOS defaults to disable thermal and power delivery safeguards designed to limit processor exposure to sustained periods of high voltage and frequency, for example:
- Disabling Current Excursion Protection (CEP)
- Enabling the IccMax Unlimited bit
- Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
- Additional settings which may increase the risk of system instability:
  - Disabling C-states
  - Using Windows Ultimate Performance mode
  - Increasing PL1 and PL2 beyond Intel recommended limits
Intel requests system and motherboard manufacturers to provide end users with a default BIOS profile that matches Intel recommended settings.
Intel strongly recommends customer's default BIOS settings should ensure operation within Intel's recommended settings.
In addition, Intel strongly recommends motherboard manufacturers to implement warnings for end users alerting them to any unlocked or overclocking feature usage.
Intel is continuing to actively investigate this issue to determine the root cause and will provide additional updates as relevant information becomes available.
Intel will be publishing a public statement regarding issue status and Intel recommended BIOS setting recommendations targeted for May 2024.
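To make Intel's list concrete, here is a minimal Python sketch that audits a set of BIOS options against the safeguards and risk factors named in the statement. The `BiosSettings` field names and the `audit` function are hypothetical illustrations, not any vendor's actual BIOS interface, and the PL1/PL2 limits are passed in rather than assumed.

```python
from dataclasses import dataclass


@dataclass
class BiosSettings:
    # Hypothetical field names; real BIOS menus label these differently per vendor.
    cep_enabled: bool = True                   # Current Excursion Protection
    iccmax_unlimited: bool = False             # IccMax Unlimited bit
    tvb_enabled: bool = True                   # Thermal Velocity Boost
    etvb_enabled: bool = True                  # Enhanced Thermal Velocity Boost
    c_states_enabled: bool = True
    windows_ultimate_performance: bool = False
    pl1_w: int = 125
    pl2_w: int = 253


def audit(settings: BiosSettings, pl1_limit_w: int, pl2_limit_w: int) -> list[str]:
    """Return the items from Intel's list that this configuration trips."""
    flagged = []
    # Safeguards Intel says boards often disable by default.
    if not settings.cep_enabled:
        flagged.append("CEP disabled")
    if settings.iccmax_unlimited:
        flagged.append("IccMax Unlimited bit enabled")
    if not (settings.tvb_enabled and settings.etvb_enabled):
        flagged.append("TVB and/or eTVB disabled")
    # "Additional settings which may increase the risk of system instability".
    if not settings.c_states_enabled:
        flagged.append("C-states disabled")
    if settings.windows_ultimate_performance:
        flagged.append("Windows Ultimate Performance mode")
    if settings.pl1_w > pl1_limit_w or settings.pl2_w > pl2_limit_w:
        flagged.append("PL1/PL2 above recommended limits")
    return flagged
```

For example, `audit(BiosSettings(cep_enabled=False, pl2_w=4095), pl1_limit_w=125, pl2_limit_w=253)` flags both the disabled CEP and the raised PL2. The 125 W/253 W limits are the figures commenters quote for the 125 W parts; the datasheet is the authority for any given SKU.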
272 Comments on Intel Statement on Stability Issues: "Motherboard Makers to Blame"
The spec values for PL1/PL2 are in the Datasheet.
14900K Extreme profile and 14900K non-Extreme are the same. Extreme profile is not and should not be applicable for Base Profile. Oh sweet summer child. What makes you think only Intel's power delivery has many knobs and dials? :D
I have good reason to believe that I was affected by these stability problems (at least with the last 1F BIOS, and there's some likelihood it was my fault as well). Since updating to the 1G BIOS, I have triple-checked every one of my settings and memory subtimings, and I have not experienced any BSODs since. Something funny, though: whenever my computer was about to crash, the scrambled-graphics bug in the Nvidia drivers that affected Chromium would trigger something fierce. Ultimately, this is my sole problem with Intel's response: they should be far more proactive and shield themselves less from any potential blame. But then again, if you look at who's running Intel's PR at the moment, you'll understand this stance. It's the same one AMD took until very, very recently ;)
Ian Cutress: One of the things we’ve seen with the parts that we review is that we’re taking consumer or workstation level motherboards from the likes of ASUS, ASRock, and such, and they are implementing their own values for that PL2 limit and also the turbo window – they might be pushing these values up until the maximum they can go, such as a (maximum) limit of 999 W for 4096 seconds. From your opinion, does this distort how we do reviews because it necessarily means that they are running out of Intel defined spec?
Guy Therien: Even with those values, you're not running out of spec, I want to make very clear – you’re running in spec, but you are getting higher turbo duration.
We’re going to be very crisp in our definition of what the difference between in-spec and out-of-spec is. There is an overclocking 'bit'/flag on our processors. Any change that requires you to set that overclocking bit to enable overclocking is considered out-of-spec operation. So if the motherboard manufacturer leaves a processor with its regular turbo values, but states that the power limit is 999W, that does not require a change in the overclocking bit, so it is in-spec.
Source: www.anandtech.com/show/14582/talking-tdp-turbo-and-overclocking-an-interview-with-intel-fellow-guy-therien
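How PL1, PL2, and the turbo window (Tau) interact can be illustrated with a simplified model. This sketch assumes PL1 is enforced against an exponentially weighted moving average of package power with time constant Tau, while PL2 caps power directly; real firmware also involves PL3/PL4, current, and thermal limits, so treat it as an illustration rather than Intel's algorithm. `simulate_turbo` is a hypothetical helper.

```python
import math


def simulate_turbo(demand_w: float, pl1_w: float, pl2_w: float, tau_s: float,
                   duration_s: float, dt_s: float = 0.1) -> list[float]:
    """Simplified PL1/PL2/Tau model: return the package power granted each step.

    Assumption: PL1 is checked against an exponentially weighted moving
    average (EWMA) of package power with time constant tau_s; PL2 is a hard cap.
    """
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # EWMA weight for one time step
    avg_w = 0.0                            # start from an idle package
    granted = []
    for _ in range(round(duration_s / dt_s)):
        # While the running average is below PL1 the CPU may draw up to PL2;
        # once it reaches PL1, power is clamped back to PL1.
        cap_w = pl2_w if avg_w < pl1_w else pl1_w
        power_w = min(demand_w, cap_w)
        granted.append(power_w)
        avg_w += alpha * (power_w - avg_w)
    return granted
```

With an illustrative Tau of 56 s, `simulate_turbo(300, 125, 253, 56, 120)` holds 253 W for roughly the first 38 seconds and then drops to 125 W. With board-style values like the 999 W limit and 4096 s window from the interview, the average never reaches PL1 within a realistic run, so the CPU simply draws whatever the load demands.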
The issue is that the board partners also disable many of Intel's boost algorithms, C-states, etc. This one change doesn't alter the voltage curve much; I've tested it on many builds I have done, hence it falls under the "additional settings that may increase the risk" category.
These settings are quite significant -
- Disabling Current Excursion Protection (CEP)
- Enabling the IccMax Unlimited bit
- Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
The crashes are from voltage going too high or too low. This can also be exacerbated by VRMs failing to keep up with boost clock changes and over/undershooting voltage targets. Modifying the target frequency (MCE, all-core boost, and whatever other names vendors use for this), voltage LLC, and other settings will push the voltage curve outside of spec.
Two minutes in, and Buildzoid has already shown something interesting: the optimized default of the ASUS APEX board on BIOS 1202 was PL1/PL2 = 253/4095.
That's like... insane, and it consumes 360 W in R15 and crashes R15.
The baseline default on the APEX seems a lot more reasonable than Gigabyte's: Buildzoid's 14900K runs 10% faster than with Gigabyte's baseline default.
i9 CPUs listed with a base power of 150 W (which seem to be the KS models) have 253/253 as the performance spec and 320/320 as the Extreme config.
The 8+8 and 8+16 models, which have 125 W listed as base power, have 253/125 as the performance spec, but also have an Extreme config of 253/253.
Baseline spec seems to be 188/125.
Document here, originally posted by @dgianstefani
www.intel.com/content/www/us/en/content-details/743844/13th-generation-intel-core-and-intel-core-14th-generation-processors-datasheet-volume-1-of-2.html
Good on Hardware Unboxed for laying it down on Intel; they absolutely deserve it. It's a fact that they knew exactly what the board makers were doing and did nothing about it, but rather encouraged the behaviour. I remember some of the senior reviewers a few years back being pretty confused about the whole power limits thing, the ambiguity, and the lack of action on Intel's part. I know for a fact Ian was, and there were some podcasts where they spoke about it in more detail; people even reached out to Intel, but Intel had absolutely zero issues with the board partners, and everything was 'within spec'.
You can't really defend Intel here and just say "oh, they should have enforced the board manufacturers more, that's their only fault". That's really not the only fault, is it? When people started commenting on other issues, you admitted that PL1=PL2 was not defined properly earlier and has recently resurfaced. There are many more cracks in Intel's spec, and lots of ambiguity and looseness in their 'guide', which is an abhorrent mess. Most of it is covered by HWU.
Just FYI, there's no speculation on Steve's part and I 100% agree with him.
He also raises a point that is bad news: previously, if you had a lottery-loser chip that wasn't stable out of the box, you would probably get a new one via RMA; now they might refuse a swap if it's stable on baseline.
I simply don't believe Intel didn't know this was happening all along. They sure seem to have opened the door for this and just left themselves an out for when this finally came home to roost. They got a few generations of better-looking benchmarks out of the deal.
Intel just needs to take ownership of not enforcing proper defaults, but that will only cost them in the long run if they continue down their current design path of allowing insane power limits. I remember when this all started, when Intel CPUs lacked performance at default settings but took off once you unlocked the power limits. It seems it only took one generation for those defaults to start getting ignored.
You RMA the CPU, and the tester tests it on a Gigabyte board...
Now that user is stuck with a 'No problem' CPU that doesn't work on the motherboard they have.
2. The profiles in the doc are:
2a. pl1=125 W & pl2=253 W & iccmax=307 A & ac_ll=1.1 mΩ
2b. pl1=pl2=253 W & iccmax=400 A & ac_ll=1.1 mΩ (this is called the Extreme profile)
So both profiles are Intel spec, and both are OK to use.
3. Actually, the same Intel doc has gone through many revisions, and in rev002, released in late 2022, the same 125 W TDP CPUs like the 13900K had PL1 at 125 W and PL2 at 188 W.
So Intel created three profiles, but the 125/188 one seems to have been cancelled.
4. What Igor leaked in 2021 about the Raptor Lake-S 125 W CPUs turns out to be almost the same spec as what Gigabyte has now implemented (pl1=125 W & pl2=188 W & iccmax=249 A & ac_ll=1.7 mΩ).
5. What Igor leaked in 2021 about the Raptor Lake-S 125 W CPUs was called "baseline".
6. If you search for "baseline" in the latest Intel doc, there are no results related to power, voltage, or current.
7. As for why Gigabyte set things like that in its baseline profile: maybe the engineer was told to make an Intel baseline profile but didn't know what the fuck that was. He looked in Intel's latest doc and found nothing about BASELINE, so he had to dig through earlier material, and eventually found the Intel baseline that matched what Igor leaked back in 2021. Gigabyte copied that, tested it, found that a 200 A IccMax was way too low, and eventually decided to raise IccMax on its own.
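The profiles listed in the comment above can be compared against a board's settings with a small lookup. This Python sketch copies the values exactly as the commenter quotes them (not re-verified against Intel's datasheet); the dictionary keys and both function names are hypothetical.

```python
from typing import Optional

# Values as quoted in the comment above, not re-verified against the datasheet.
PROFILES: dict[str, dict[str, float]] = {
    "performance": {"pl1_w": 125, "pl2_w": 253, "iccmax_a": 307, "ac_ll_mohm": 1.1},
    "extreme": {"pl1_w": 253, "pl2_w": 253, "iccmax_a": 400, "ac_ll_mohm": 1.1},
    "gigabyte_baseline": {"pl1_w": 125, "pl2_w": 188, "iccmax_a": 249, "ac_ll_mohm": 1.7},
}


def deviations(bios: dict[str, float],
               profile: str) -> dict[str, tuple[Optional[float], float]]:
    """Map each field that differs from the profile to (bios_value, profile_value)."""
    spec = PROFILES[profile]
    return {key: (bios.get(key), want)
            for key, want in spec.items()
            if bios.get(key) != want}


def matching_profiles(bios: dict[str, float]) -> list[str]:
    """Names of the profiles whose every field matches the BIOS settings exactly."""
    return [name for name in PROFILES if not deviations(bios, name)]
```

For Gigabyte-style settings (`{"pl1_w": 125, "pl2_w": 188, "iccmax_a": 249, "ac_ll_mohm": 1.7}`), `matching_profiles` returns `['gigabyte_baseline']`, and `deviations(bios, "performance")` reports the lower PL2 and IccMax and the higher AC loadline.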
Let me repeat in case it's not clear ~ Intel can force their "board partners" to adhere to their specs in a second, if they wanted to! No pipsqueak would try going against that after Intel's forced their hand.
And we all know the reason they didn't till now :shadedshu:
They consider OC to be changes in the multiplier. So if motherboard vendors keep the multiplier intact, the CPU technically operates within spec, no matter how much power they throw at it. This utter mess is the sole responsibility of Intel, which has been unwilling to define clear power boundaries.
Intel's website clearly states that 253 W is not set in stone, as Maximum Turbo Power is configurable by the OEM:
"The maximum sustained (>1s) power dissipation of the processor as limited by current and/or temperature controls. Instantaneous power may exceed Maximum Turbo Power for short durations (<=10ms). Note: Maximum Turbo Power is configurable by system vendor and can be system specific."
www.intel.com/content/www/us/en/products/sku/236773/intel-core-i9-processor-14900k-36m-cache-up-to-6-00-ghz/specifications.html
- Sustained TCase (IHS temperature) must be < TCaseMax, which for 13/14-gen 125W processors is 61.9 °C.
- TJunction must be always < TJmax (100 °C)
- TCase has a certain thermal inertia. The processor is allowed to exceed base power as long as TCase remains below the spec value. PL2 is intended to take advantage of that thermal inertia to provide a short-term performance boost.
- If the cooling is good enough that Tcase is always below TCaseMax, then PL1 can be increased, or even made equal to PL2.
- The official specs are validated for Intel's own standardized "thermal solution". The minimum spec is that CPUs must be able to sustain base power (e.g. 125W) indefinitely without exceeding TCaseMax.
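The two temperature rules in the list above can be checked against logged sensor readings. This is a minimal sketch assuming evenly spaced samples and approximating "sustained" as a sliding-window mean; the window length and the `within_thermal_spec` helper are hypothetical, and the 61.9 °C and 100 °C limits are the figures quoted in the comment.

```python
from statistics import fmean

TCASE_MAX_C = 61.9  # TCaseMax quoted above for 13th/14th-gen 125 W parts
TJ_MAX_C = 100.0    # TJmax quoted above


def within_thermal_spec(tcase_c: list[float], tj_c: list[float], window: int = 10) -> bool:
    """Check evenly spaced temperature samples against the two rules above.

    Assumption: "sustained" TCase is approximated as the mean over a sliding
    window of `window` samples; this window length is hypothetical.
    """
    # TJunction must stay below TJmax at every sample.
    if any(t >= TJ_MAX_C for t in tj_c):
        return False
    if not tcase_c:
        return True
    # Short spikes are tolerated (thermal inertia), but no window mean may
    # reach TCaseMax. With fewer samples than `window`, the whole series is
    # treated as a single window.
    for start in range(max(1, len(tcase_c) - window + 1)):
        if fmean(tcase_c[start:start + window]) >= TCASE_MAX_C:
            return False
    return True
```

A single 70 °C reading among 60 °C samples passes, because no 10-sample mean reaches 61.9 °C, while a steady 65 °C fails. This mirrors the commenter's point that PL2 leans on thermal inertia for short bursts only.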
I wonder if every review will need to be updated to reflect the actual chips that thousands of consumers got, which wouldn't run out of the box.
Being complicit in OEM practices during good times and then slapping them when push comes to shove sounds like washing your hands of it, no?
www.intel.com.br/content/www/br/pt/content-details/743844/13th-generation-intel-core-and-intel-core-14th-generation-processors-datasheet-volume-1-of-2.html
For example, the i9 KS chips:
S-Processor 8+16 150 W: PL1 253 W, PL2 253 W
S-Processor 8+16 150 W, Extreme Config: PL1 320 W, PL2 320 W
Or the i5 chips:
S-Processor 6+8 125 W: PL1 125 W, PL2 181 W
The data sheet is concise and complete regarding Tau length, recommended current and wattage for all models and specifications, you just need to know how to correlate the subtype with the marketed name. S-Processor 150 W means i9 KS, S-Processor 125 W means i9 K, etc.
Again, it must be stressed that the 13th and 14th Generation Core and the Xeon E-2400 series CPUs share the exact same designation and stepping, "Raptor Lake-S", with no hardware differences whatsoever between them. If you compare an i9-13900K and an i9-14900KS, the only differences are their clock tables and silicon quality; functionally and at a technical level, they are the same processor, unchanged. There will be no further "generations" in the Core i processor line. The next will be Core Ultra series 2, which should be radically different from the existing Raptor Lake chips.