Monday, April 29th 2024

Intel Statement on Stability Issues: "Motherboard Makers to Blame"

A couple of weeks ago, we reported on NVIDIA directing users of Intel's 13th Generation Raptor Lake and 14th Generation Raptor Lake Refresh CPUs to consult Intel for any issues with system stability. Motherboard makers, by default, often run the CPU outside of Intel's recommended specifications, overvolting the CPU through modifying voltage curves, automatic overclocks, and removing power limits.

Today, we learned that Igor's Lab has obtained a statement that Intel prepared for motherboard OEMs regarding the issues multiple users have reported. Intel CPUs come pre-programmed with a stock voltage curve. When motherboard makers remove power limits and automatically adjust voltage curves and frequency targets, the CPU can be pushed outside its safe operating range, possibly causing system instability. Intel has set up a dedicated website where users can report their issues and get support. Manufacturers like GIGABYTE have already issued BIOS updates intended to maximize stability, although recent user reports indicate that these still fall outside Intel's spec, setting PL2 to 188 W, loadlines to 1.7/1.7, and the current limit to 249 A. MSI has published a blog post tutorial on achieving stability, and ASUS has released updated BIOS versions for its motherboards to reflect the Intel baseline spec as well. Surprisingly, not all of the revised values in these BIOS updates from different vendors match the Intel Baseline Profile spec. You can read the statement from Intel in the quote below.
Intel has observed that this issue may be related to out of specification operating conditions resulting in sustained high voltage and frequency during periods of elevated heat.

Analysis of affected processors shows some parts experience shifts in minimum operating voltages which may be related to operation outside of Intel specified operating conditions.

While the root cause has not yet been identified, Intel has observed the majority of reports of this issue are from users with unlocked/overclock capable motherboards.

Intel has observed 600/700 Series chipset boards often set BIOS defaults to disable thermal and power delivery safeguards designed to limit processor exposure to sustained periods of high voltage and frequency, for example:
  • Disabling Current Excursion Protection (CEP)
  • Enabling the IccMax Unlimited bit
  • Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
  • Additional settings which may increase the risk of system instability:
      • Disabling C-states
      • Using Windows Ultimate Performance mode
      • Increasing PL1 and PL2 beyond Intel recommended limits
Intel requests system and motherboard manufacturers to provide end users with a default BIOS profile that matches Intel recommended settings.

Intel strongly recommends customer's default BIOS settings should ensure operation within Intel's recommended settings.

In addition, Intel strongly recommends motherboard manufacturers to implement warnings for end users alerting them to any unlocked or overclocking feature usage.

Intel is continuing to actively investigate this issue to determine the root cause and will provide additional updates as relevant information becomes available.

Intel will be publishing a public statement regarding issue status and Intel recommended BIOS setting recommendations targeted for May 2024.
Source: Igor's Lab
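
The settings Intel lists in its statement can be read as a simple checklist. The following is a minimal sketch of such an audit, not a tool from Intel or any board vendor: the `BiosDefaults` fields and the `audit` function are hypothetical names chosen for illustration, and the recommended PL1/PL2 values must be supplied by the caller from Intel's documentation. Windows Ultimate Performance mode is left out because it is an operating system setting rather than a BIOS one.

```python
from dataclasses import dataclass


@dataclass
class BiosDefaults:
    """Hypothetical snapshot of the BIOS options named in Intel's statement."""
    cep_enabled: bool        # Current Excursion Protection
    iccmax_unlimited: bool   # IccMax Unlimited bit
    tvb_enabled: bool        # Thermal Velocity Boost
    etvb_enabled: bool       # Enhanced Thermal Velocity Boost
    c_states_enabled: bool
    pl1_watts: float
    pl2_watts: float


def audit(bios: BiosDefaults, rec_pl1: float, rec_pl2: float) -> list[str]:
    """Return one finding per setting that Intel's statement flags."""
    findings = []
    if not bios.cep_enabled:
        findings.append("safeguard: CEP disabled")
    if bios.iccmax_unlimited:
        findings.append("safeguard: IccMax Unlimited bit enabled")
    if not (bios.tvb_enabled and bios.etvb_enabled):
        findings.append("safeguard: TVB and/or eTVB disabled")
    if not bios.c_states_enabled:
        findings.append("additional risk: C-states disabled")
    if bios.pl1_watts > rec_pl1 or bios.pl2_watts > rec_pl2:
        findings.append("additional risk: PL1/PL2 above recommended limits")
    return findings


# Example values only; the recommended limits here are placeholders.
board = BiosDefaults(cep_enabled=False, iccmax_unlimited=True,
                     tvb_enabled=True, etvb_enabled=True,
                     c_states_enabled=True, pl1_watts=253, pl2_watts=4095)
for finding in audit(board, rec_pl1=125, rec_pl2=253):
    print(finding)
```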

272 Comments on Intel Statement on Stability Issues: "Motherboard Makers to Blame"

#126
BoggledBeagle
BTW the fact itself that Intel acknowledged the problem and is preparing something is a proof, that they are getting a lot of degraded / unstable chips back. They would happily ignore the problem if it was not biting them badly.
Posted on Reply
#127
londiste
Crackong: Are you sure?

Since the 14900KS has PL1/PL2 = 150/320, which differs from the regular 14900K's 125/253,
if they had the same baseline profile, it would render them basically the same SKU.
This is a bit of spec bullshit from Intel. ARK page for CPUs has Processor Base Power which is not PL1. Maximum Turbo Power is reasonably enough PL2.
The spec values for PL1/PL2 are in the Datasheet.
14900K Extreme profile and 14900K non-Extreme are the same. Extreme profile is not and should not be applicable for Base Profile.
Assimilator: The fact that Intel's CPU power delivery specification is so convoluted, with so many knobs and dials, would reasonably suggest a pressing need for Intel to carefully validate any firmware that board partners release, in order to prevent blown up CPUs.
Oh sweet summer child. What makes you think only Intel's power delivery has many knobs and dials? :D
Posted on Reply
#128
Dr. Dro
BoggledBeagle: BTW the fact itself that Intel acknowledged the problem and is preparing something is a proof, that they are getting a lot of degraded / unstable chips back. They would happily ignore the problem if it was not biting them badly.
The 13900K itself is barely over a year old. I have owned my 13900KS for a little over 370 days myself, I find it rather unlikely that these chips would degrade so fast, unless subjected to 110+C and 350+W constantly and even then. Sensationalist "news" articles tend to cause panic and exacerbate situations, which tend to reflect on general consumer mood and propensity to return "faulty" systems.

I have good reason to believe that I was affected by said stability problems (at least with the last 1F BIOS, and there's some likelihood it was my fault as well). Since updating to the 1G BIOS, I made sure I triple-checked every single one of my settings and memory subtimings, and I have not experienced any BSODs since. Something funny though: whenever my computer was about to crash, the scrambled graphics bug on the Nvidia drivers that affected Chromium would trigger something fierce.
Crackong: It is Intel's CPU, it is their job to make sure the motherboard vendors have a correct 'Default' profile so it works 100% of the time.
This lack of communication alone is a big issue and is one of Intel's biggest faults.

And, if your CPUs are this fragile, measures should be taken to 'prevent' the partners from further messing it up.
Like AMD, with their X3D voltage issue, they forced new voltage settings very quickly and RMA'd every affected case.
Like Nvidia, Nvidia does a great job making sure the AIBs cannot mess up their GPUs and, if something's up like the 12VHPWR issue, Nvidia took responsibility and took care of every affected case.

Please note that in the above-mentioned cases,
although customers do blame the AIB partners,
AMD/Nvidia themselves didn't actively place the blame on their partners.
They just went in, solved the problem, and got out ASAP.

If they can do it, why not Intel?
Ultimately this is my sole problem with Intel's response. They should be far more proactive and shield themselves less from any potential blame. But then again, if you look at who's running Intel's PR at the moment, you're going to understand this stance. It's the same one AMD took until very, very recently ;)
Posted on Reply
#129
Robin Seina
Compare current statement with Intel interview with Anandtech in 2019:

Ian Cutress: One of the things we’ve seen with the parts that we review is that we’re taking consumer or workstation level motherboards from the likes of ASUS, ASRock, and such, and they are implementing their own values for that PL2 limit and also the turbo window – they might be pushing these values up until the maximum they can go, such as a (maximum) limit of 999 W for 4096 seconds. From your opinion, does this distort how we do reviews because it necessarily means that they are running out of Intel defined spec?


Guy Therien:
Even with those values, you're not running out of spec, I want to make very clear – you’re running in spec, but you are getting higher turbo duration.
We’re going to be very crisp in our definition of what the difference between in-spec and out-of-spec is. There is an overclocking 'bit'/flag on our processors. Any change that requires you to set that overclocking bit to enable overclocking is considered out-of-spec operation. So if the motherboard manufacturer leaves a processor with its regular turbo values, but states that the power limit is 999W, that does not require a change in the overclocking bit, so it is in-spec.

Source: www.anandtech.com/show/14582/talking-tdp-turbo-and-overclocking-an-interview-with-intel-fellow-guy-therien
Posted on Reply
#130
londiste
Dr. Dro: The 13900K itself is barely over a year old. I have owned my 13900KS for a little over 370 days myself, I find it rather unlikely that these chips would degrade so fast, unless subjected to 110+C and 350+W constantly and even then. Sensationalist "news" articles tend to cause panic and exacerbate situations, which tend to reflect on general consumer mood and propensity to return "faulty" systems.
I do not think the chips degrade. There is a (relatively) new use case that is very picky about CPU stability. I got the same errors - like running out of VRAM in Unreal Engine games when there was clearly enough VRAM - but on a 7800X3D. Took a moment and the news about these being caused by CPUs before I went and changed the Curve Optimizer to a bit less than -30 on all cores. Nothing else before this had been failing or even given any indications of failing.
Posted on Reply
#131
dgianstefani
TPU Proofreader
Robin Seina: Compare current statement with Intel interview with Anandtech in 2019:

Ian Cutress: One of the things we’ve seen with the parts that we review is that we’re taking consumer or workstation level motherboards from the likes of ASUS, ASRock, and such, and they are implementing their own values for that PL2 limit and also the turbo window – they might be pushing these values up until the maximum they can go, such as a (maximum) limit of 999 W for 4096 seconds. From your opinion, does this distort how we do reviews because it necessarily means that they are running out of Intel defined spec?


Guy Therien:
Even with those values, you're not running out of spec, I want to make very clear – you’re running in spec, but you are getting higher turbo duration.
We’re going to be very crisp in our definition of what the difference between in-spec and out-of-spec is. There is an overclocking 'bit'/flag on our processors. Any change that requires you to set that overclocking bit to enable overclocking is considered out-of-spec operation. So if the motherboard manufacturer leaves a processor with its regular turbo values, but states that the power limit is 999W, that does not require a change in the overclocking bit, so it is in-spec.

Source: www.anandtech.com/show/14582/talking-tdp-turbo-and-overclocking-an-interview-with-intel-fellow-guy-therien
If all mobo makers did was change PL values, the CPU would still have all of the safeguards and algorithms to ensure stability.

The issue is that the board partners also disable many of the Intel boost algorithms, C states etc.
Intel has observed 600/700 Series chipset boards often set BIOS defaults to disable thermal and power delivery safeguards designed to limit processor exposure to sustained periods of high voltage and frequency, for example:
  • Disabling Current Excursion Protection (CEP)
  • Enabling the IccMax Unlimited bit
  • Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
  • Additional settings which may increase the risk of system instability:
      • Disabling C-states
      • Using Windows Ultimate Performance mode
      • Increasing PL1 and PL2 beyond Intel recommended limits
This one change doesn't change the voltage curve much, i've tested it on many builds I have done, hence falling under the "additional settings that may increase the risk" category.
These settings are quite significant -
  • Disabling Current Excursion Protection (CEP)
  • Enabling the IccMax Unlimited bit
  • Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
The crashes are from voltage going too high or too low. This can also be exacerbated by VRMs failing to keep up with boost clock changes, and over/undershooting voltage targets.

Modifying the target frequency (MCE all core boost etc and other names for this), voltage LLC, and other settings will adjust the voltage curve outside of spec.
Posted on Reply
#132
Robin Seina
dgianstefani: If all mobo makers did was change PL values, the CPU would still have all of the safeguards and algorithms to ensure stability.

The issue is that the board partners also disable many of the Intel boost algorithms, C states etc.


This one change doesn't change the voltage curve much, i've tested it on many builds I have done, hence falling under the "additional settings that may increase the risk" category.
These settings are quite significant -
  • Disabling Current Excursion Protection (CEP)
  • Enabling the IccMax Unlimited bit
  • Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
The crashes are from voltage going too high or too low. This can also be exacerbated by VRMs failing to keep up with boost clock changes, and over/undershooting voltage targets.
Well, Intel certainly knew about and encouraged this MB vendor behaviour for at least the last 6 years. It gave Intel a competitive and marketing advantage, after all. Now, the company is shifting all the blame onto said MB vendors.
Posted on Reply
#133
chrcoluk
Part 2 of the debacle, Asus baseline.

Posted on Reply
#134
Crackong
chrcoluk: Part 2 of the debacle, Asus baseline.

Thanks, watching right now.

2 minutes in and Buildzoid has already shown something interesting: the optimized default of the ASUS APEX board BIOS 1202 was PL1/PL2 = 253/4095.
That's like... insane, and it consumes 360 W in R15 and crashes R15.

The baseline default on the APEX seems a lot more reasonable than Gigabyte's; Buildzoid's 14900K runs 10% faster than with Gigabyte's baseline default.
Posted on Reply
#135
chrcoluk
So summary of my thoughts based on both documents.

i9 CPUs listed as baseline power of 150w (which seems to be the KS models), have 253/253 as perf spec. Extreme config of 320/320.
8+8 and 8+16 models which have 125w listed as base power have 253/125 as perf spec. But also have an extreme config as 253/253.
Baseline spec seems to be 188/125.

Document here, originally posted by @dgianstefani

www.intel.com/content/www/us/en/content-details/743844/13th-generation-intel-core-and-intel-core-14th-generation-processors-datasheet-volume-1-of-2.html
Posted on Reply
#136
mkppo
dgianstefani: They do just say it.

Have a look at the datasheet.

What other companies do with Intel products is up to them.

If I buy a car and tune the engine until it explodes, is this the fault of the car manufacturer?
No, but if the car manufacturer says this car tuned to 500BHP with a stage one turbo kit is "within spec" and it later dies slowly, it's absolutely on the manufacturer. That's exactly what happened here, as evidenced by the response they provided to Ian Cutress when talking about consumer and professional boards (not just workstation mind you, please read it carefully or see the clip).

Good on Hardware Unboxed to lay it down on intel and they absolutely deserve it. It's a fact that they knew exactly what the board makers were doing, and did nothing about it but rather encouraged the behaviour. I remember some of the senior reviewers a few years back were pretty confused about the whole power limits thing, the ambiguity and lack of action on intel's part. I know for a fact Ian was, and there were some podcasts where they spoke about it in more detail and people even reached out to intel but they had absolutely zero issues with board partners and everything was 'within spec'.

You can't really defend intel here and just say "oh they should have enforced the board manufacturers more that's their only fault". That's really not the only fault there is it. When people started commenting on other issues, you admitted that the PL1=PL2 was not defined properly earlier and recently resurfaced. There are many more cracks in Intel's spec, and lots of ambiguity and looseness in their 'guide' which is an abhorrent mess. Most of it is covered by HWU.

Just FYI, there's no speculation on Steve's part and I 100% agree with him.
Posted on Reply
#137
chrcoluk
Crackong: Thanks, watching right now.

2 minutes in and Buildzoid has already shown something interesting: the optimized default of the ASUS APEX board BIOS 1202 was PL1/PL2 = 253/4095.
That's like... insane, and it consumes 360 W in R15 and crashes R15.

The baseline default on the APEX seems a lot more reasonable than Gigabyte's; Buildzoid's 14900K runs 10% faster than with Gigabyte's baseline default.
Yep, it's a mess.

He also raises a point which is bad news: previously, if you had a lottery loser chip that wasn't stable out of the box, you would probably get a new one with RMA; now they might refuse the swap if it's stable on baseline.
Posted on Reply
#138
Darmok N Jalad
I see it rather simply. Intel should have a safe baseline as the platform default, and board partners should be held to that as the first boot default, or no license. If board partners want to run wild on power specs, fine, but issue the appropriate warning before it gets enabled. It’s what any system builder should expect, honestly.

I simply don’t believe that Intel didn’t know this was happening all along. They sure seem to have opened the door for this and just left themselves an out for when this finally came home to roost. They got a few generations of better-looking benchmarks out of the deal.

Intel just needs to take ownership of not enforcing proper defaults, but that will only cost them in the long run if they continue down their current design path of allowing for insane power limits. I remember when this all started, where Intel CPUs lacked in performance at default settings, but took off once you unlocked the power limits. It seems it only took one generation for those defaults to start getting ignored.
Posted on Reply
#139
Crackong
chrcoluk: now they might refuse swap if its stable on baseline.
Adding insult to injury, even if a user finds it unstable at baseline, it could come down to the difference between vendors' baseline profiles.
RMA the CPU and the tester tests it on a Gigabyte board.....
Now that user gets stuck with a 'No problem' CPU which doesn't work on the MB he/she has on hand...
Posted on Reply
#140
asdkj1740
1. the intel document does state there are two profiles for the 13900k/13900kf/14900k/14900kf, which are rated at tdp 125w.
2. the profiles in the doc are,
2a. pl1=125w & pl2=253w & iccmax=307a & ac_ll=1.1mohm
2b. pl1=253w=pl2 & iccmax=400a & ac_ll=1.1mohm (this is called extreme profile)
so both profiles are spec'd by intel, both are OK to be used.
3. actually the same intel doc has had lots of revisions, and in rev002, which was released in late 2022, for the same tdp=125w cpu like the 13900k, the pl1 was 125w and the pl2 was 188w.
so, three profiles were created by intel, but the 125/188 one seems to have been cancelled.
4. what igor leaked in 2021 about the raptor lake s 125w cpu, that spec turns out to be almost the same as what gigabyte has now implemented (pl1=125w & pl2=188w & iccmax=249a & ac_ll=1.7)
5. what igor leaked in 2021 about the raptor lake s 125w cpu, that spec was called "baseline"
6. if you search "baseline" in the latest intel doc, there is no result that is related to power/voltages/current.
7. why gigabyte set things like that in the baseline profile: maybe the engineer was told to make an intel baseline profile but he didn't know what the fuck that was, so he looked up the latest intel doc but found nothing about BASELINE, then he had to look up even earlier ones, and at last he did find the intel baseline, which was the same as what igor leaked back in 2021. so gigabyte copied that and tested that and found out 200a iccmax was way too low, then gigabyte eventually decided to increase the iccmax by themselves.
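
To make the comparison in this post concrete, here is a minimal sketch that checks a set of BIOS values against the three profiles described above. The numbers are taken from the post as written and are not verified against the datasheet; `PowerProfile`, `PROFILES` and `match_profile` are hypothetical names, and the AC loadline unit is assumed to be milliohms.

```python
from typing import NamedTuple, Optional


class PowerProfile(NamedTuple):
    pl1_w: int
    pl2_w: int
    iccmax_a: int
    ac_ll_mohm: float  # unit assumed to be milliohms


# Values as quoted in the post above (items 2a, 2b and 4), unverified.
PROFILES = {
    "2a": PowerProfile(125, 253, 307, 1.1),
    "2b (extreme)": PowerProfile(253, 253, 400, 1.1),
    "4 (Gigabyte baseline)": PowerProfile(125, 188, 249, 1.7),
}


def match_profile(settings: PowerProfile) -> Optional[str]:
    """Return the name of the profile that matches exactly, or None."""
    for name, profile in PROFILES.items():
        if profile == settings:
            return name
    return None


print(match_profile(PowerProfile(125, 188, 249, 1.7)))
print(match_profile(PowerProfile(253, 4095, 400, 1.1)))
```

An exact match is used on purpose: a board that changes even one of the four values falls outside every listed profile, which is the situation the thread is arguing about.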

Posted on Reply
#141
R0H1T
Dr. Dro: Ultimately this is my sole problem with Intel's response. They should be far more proactive and shield themselves less from any potential blame. But then again, if you look at who's running Intel's PR at the moment, you're going to understand this stance. It's the same one AMD took until very, very recently ;)
I mean I can't stress this enough ~ Intel has pushed BIOS updates & ucode updates blocking free OCing over at least 5 gen of motherboards, going as far as a year(?) down the line after the products were sold to block them off! Anyone defending Intel over this is either totally ignorant of this fact or just (short of) a paid shill :rolleyes:

Let me repeat in case it's not clear ~ Intel can force their "board partners" to adhere to their specs in a second, if they wanted to! No pipsqueak would try going against that after Intel's forced their hand.

And we all know the reason they didn't till now :shadedshu:
Posted on Reply
#142
Tek-Check
dgianstefani: The processors are stable if configured as advertised (PL1=PL2), the rest to spec.
Everything else is mobo maker deviation.
It's way more messy. Massive power unlocking is within the spec, according to what Intel officially said in the interview with Dr Cutress and according to additional information on their website, one click away under 253W.

They consider OC to be changes in the multiplier. So, if motherboard vendors keep the multiplier intact, technically a CPU operates within the spec, no matter how much power they throw at it. This utter mess is a sole responsibility of Intel not willing to define clear power boundaries.

Intel clearly states on their website that 253W is not set in stone, as Maximum Turbo Power is configurable by the OEM.
"The maximum sustained (>1s) power dissipation of the processor as limited by current and/or temperature controls. Instantaneous power may exceed Maximum Turbo Power for short durations (<=10ms). Note: Maximum Turbo Power is configurable by system vendor and can be system specific."

www.intel.com/content/www/us/en/products/sku/236773/intel-core-i9-processor-14900k-36m-cache-up-to-6-00-ghz/specifications.html

Posted on Reply
#143
Solid State Brain
Tek-Check: They consider OC to be changes in the multiplier. So, if motherboard vendors keep the multiplier intact, technically a CPU operates within the spec, no matter how much power they throw at it. This utter mess is a sole responsibility of Intel not willing to define clear power boundaries.
The specs are more or less clear.
- Sustained TCase (IHS temperature) must be < TCaseMax, which for 13/14-gen 125W processors is 61.9 °C.
- TJunction must be always < TJmax (100 °C)
- TCase has a certain thermal inertia. The processor is allowed to exceed base power as long as TCase remains below the spec value. PL2 is intended to take advantage of that thermal inertia to provide a short-term performance boost.
- If the cooling is good enough that Tcase is always below TCaseMax, then PL1 can be increased, or even made equal to PL2.
- The official specs are validated for Intel's own standardized "thermal solution". The minimum spec is that CPUs must be able to sustain base power (e.g. 125W) indefinitely without exceeding TCaseMax.
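
The rules in this list reduce to two temperature checks. Below is a rough sketch using only the limits quoted in the post (TCaseMax of 61.9 °C for the 125 W parts and TJmax of 100 °C); the function names are made up for illustration, and this is a reading of the post rather than Intel's validation procedure.

```python
TCASE_MAX_C = 61.9  # TCaseMax for 13/14-gen 125 W processors, as quoted above
TJ_MAX_C = 100.0    # TJmax, as quoted above


def within_thermal_spec(tcase_c: float, tjunction_c: float) -> bool:
    """Both limits from the post must hold: TCase < TCaseMax and TJ < TJmax."""
    return tcase_c < TCASE_MAX_C and tjunction_c < TJ_MAX_C


def pl1_may_be_raised(tcase_samples_c: list[float]) -> bool:
    """Per the post, PL1 can go up (even to PL2) only if the cooling keeps
    TCase below TCaseMax at all times. No samples means no evidence."""
    return bool(tcase_samples_c) and all(t < TCASE_MAX_C for t in tcase_samples_c)


print(within_thermal_spec(55.0, 92.0))
print(pl1_may_be_raised([50.0, 58.3, 62.4]))
```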
Posted on Reply
#144
Dr. Dro
Tek-Check: Intel clearly states on their website that 253W is not set in stone, as Maximum Turbo Power is configurable by the OEM.
"The maximum sustained (>1s) power dissipation of the processor as limited by current and/or temperature controls. Instantaneous power may exceed Maximum Turbo Power for short durations (<=10ms). Note: Maximum Turbo Power is configurable by system vendor and can be system specific."

www.intel.com/content/www/us/en/products/sku/236773/intel-core-i9-processor-14900k-36m-cache-up-to-6-00-ghz/specifications.html

That just means it's unlocked. And there goes with the "sensationalism" i was going on about. Guess even Hardware Unboxed needs their views every now and then.
Posted on Reply
#145
remixedcat
R0H1T: I mean I can't stress this enough ~ Intel has pushed BIOS updates & ucode updates blocking free OCing over at least 5 gen of motherboards, going as far as a year(?) down the line after the products were sold to block them off! Anyone defending Intel over this is either totally ignorant of this fact or just (short of) a paid shill :rolleyes:

Let me repeat in case it's not clear ~ Intel can force their "board partners" to adhere to their specs in a second, if they wanted to! No pipsqueak would try going against that after Intel's forced their hand.

And we all know the reason they didn't till now :shadedshu:
didn't they also block under/overvolting on core ultra?? It seems they are, indeed getting more restrictive. You need that for the thin n light laptops.
Posted on Reply
#146
Steevo
So, all that sweet sweet 1-3% on some games for 3X the wattage was out of spec? How very sus of Intel, imagine how much heat AMD would get for BIOS issues.....


I wonder if every review will need updated to reflect actual chips that thousands of consumers got that wouldn't run out of the box.
Posted on Reply
#147
Tek-Check
Dr. Dro: That just means it's unlocked. And there goes with the "sensationalism" i was going on about. Guess even Hardware Unboxed needs their views every now and then.
It's interesting, isn't it. Intel allows OEMs to run unlocked profiles, everything is good when benchmarks are high and then blames them in an official preliminary statement when CPUs degrade over time and cause instability.

Being complicit in OEM practices during good times and then slapping them when push gets to shove sounds like washing hands, no?
Posted on Reply
#148
dir_d
Clearly there is a problem if there is 6 pages of knowledgeable people bickering about what is the "Baseline". All we can do is wait for Intel's "Official" document regarding this in May. It is fun to try to figure this out ourselves though.
Posted on Reply
#149
matar
I am glad I did not go this route. Just a month ago I was planning to buy a Z790 with a 13900KF for a great deal, but I said no, I will pass because of the heating issues, and then I would need a better AIO and a better case, so it wasn't worth it. I am waiting on the new 15th gen and maybe until 16th gen, we will see. I am very happy with my i9-10900KF 24/7 @5.1 GHz all core and ring @4.6 GHz @1.28 V.
Posted on Reply
#150
Dr. Dro
dir_d: Clearly there is a problem if there is 6 pages of knowledgeable people bickering about what is the "Baseline". All we can do is wait for Intel's "Official" document regarding this in May. It is fun to try to figure this out ourselves though.
The baseline is already documented in the whitepaper. The Intel-recommended values are stated in section 4.4, page 98 of the data sheet.

www.intel.com.br/content/www/br/pt/content-details/743844/13th-generation-intel-core-and-intel-core-14th-generation-processors-datasheet-volume-1-of-2.html

For example, the i9 KS chips:

S-Processor 8+16 150 W: PL1 253 W, PL2 253 W
S-Processor 8+16 150 W, Extreme Config: PL1 320 W, PL2 320 W

Or the i5 chips:

S-Processor 6+8 125 W: PL1 125 W, PL2 181 W

The data sheet is concise and complete regarding Tau length, recommended current and wattage for all models and specifications, you just need to know how to correlate the subtype with the marketed name. S-Processor 150 W means i9 KS, S-Processor 125 W means i9 K, etc.
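
For readers unsure how PL1, PL2 and Tau fit together, here is a simplified sketch: PL2 caps power at any instant, while a running average of package power, with Tau as its time constant, is held at or below PL1. This is a toy model for illustration, not Intel's actual algorithm; `simulate` is a made-up name and the 56-second Tau in the example is an assumed value.

```python
import math


def simulate(demand_w, pl1_w, pl2_w, tau_s, dt_s=1.0):
    """Return the power granted at each time step under a simplified
    PL1/PL2/Tau model based on an exponentially weighted moving average."""
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # EWMA weight for one time step
    avg_w = 0.0                            # running average, starting from idle
    granted = []
    for want in demand_w:
        # Largest power that keeps the updated average at or below PL1.
        budget = (pl1_w - (1.0 - alpha) * avg_w) / alpha
        power = max(0.0, min(want, pl2_w, budget))
        avg_w = (1.0 - alpha) * avg_w + alpha * power
        granted.append(power)
    return granted


# A sustained 300 W request against PL1 = 125 W, PL2 = 253 W, Tau = 56 s (assumed).
trace = simulate([300.0] * 600, pl1_w=125.0, pl2_w=253.0, tau_s=56.0)
print(round(trace[0]), round(trace[-1]))
```

In this model the chip runs at PL2 until the average catches up with PL1 and then settles at PL1. Setting PL1 equal to PL2, as in the 253/253 and 320/320 rows above, means the average can never exceed the limit, so the boost is effectively sustained.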

Again, it must be stressed that 13th and 14th Generation Core as well as the Xeon E-2400 series CPUs have the exact same denomination and stepping: "Raptor Lake-S", and they do not have any differences whatsoever between them. They have the exact same stepping and hardware revision, if you compare a i9-13900K and a i9-14900KS the sole difference between them is their clock table and silicon quality, functionally and at a technical level, they are the exact same processor unchanged.
matar: I am glad I did not go this route and just a month ago i was planning to buy a z790 with a 13900kf for a great deal but i said no i will pass because of the heating issues and then i will need a better AIO and a better case so wasn't worth it so i am waiting on the new 15gen and maybe till 16gen will see i am very happy with my i9-10900KF 24/7 @5.1GHZ all core and Ring @4.6ghz @1.28v
There will be no further "generations" to the Core i processor line. The next will be Core Ultra series 2, and it should be radically different compared to the existing Raptor Lake chips.
Posted on Reply