Monday, April 29th 2024

Intel Statement on Stability Issues: "Motherboard Makers to Blame"

A couple of weeks ago, we reported on NVIDIA directing users of Intel's 13th Generation Raptor Lake and 14th Generation Raptor Lake Refresh CPUs to consult Intel about any system stability issues. Motherboard makers often run the CPU outside of Intel's recommended specifications by default, overvolting it by modifying voltage curves, applying automatic overclocks, and removing power limits.

Today, we learned that Igor's Lab has obtained a statement that Intel prepared for motherboard OEMs regarding the issues multiple users have reported. Intel CPUs come pre-programmed with a stock voltage curve. When motherboard makers remove power limits and automatically adjust voltage curves and frequency targets, the CPU can be pushed outside its safe operating range, possibly causing system instability. Intel has set up a dedicated website where users can report their issues and get support. Manufacturers like GIGABYTE have already issued BIOS updates intended to give users maximum stability, although recent user reports say GIGABYTE's update is still outside Intel spec, setting PL2 to 188 W, loadlines to 1.7/1.7, and the current limit to 249 A. MSI has provided a blog post tutorial for stability, and ASUS has published updated BIOS versions for its motherboards that reflect the Intel baseline spec as well. Surprisingly, not all of the revised BIOS values in these updates from different vendors match the Intel Baseline Profile spec. You can read the statement from Intel in the quote below.
Intel has observed that this issue may be related to out of specification operating conditions resulting in sustained high voltage and frequency during periods of elevated heat.

Analysis of affected processors shows some parts experience shifts in minimum operating voltages which may be related to operation outside of Intel specified operating conditions.

While the root cause has not yet been identified, Intel has observed the majority of reports of this issue are from users with unlocked/overclock capable motherboards.

Intel has observed 600/700 Series chipset boards often set BIOS defaults to disable thermal and power delivery safeguards designed to limit processor exposure to sustained periods of high voltage and frequency, for example:
  • Disabling Current Excursion Protection (CEP)
  • Enabling the IccMax Unlimited bit
  • Disabling Thermal Velocity Boost (TVB) and/or Enhanced Thermal Velocity Boost (eTVB)
  • Additional settings which may increase the risk of system instability:
      • Disabling C-states
      • Using Windows Ultimate Performance mode
      • Increasing PL1 and PL2 beyond Intel recommended limits
Intel requests system and motherboard manufacturers to provide end users with a default BIOS profile that matches Intel recommended settings.

Intel strongly recommends customer's default BIOS settings should ensure operation within Intel's recommended settings.

In addition, Intel strongly recommends motherboard manufacturers to implement warnings for end users alerting them to any unlocked or overclocking feature usage.

Intel is continuing to actively investigate this issue to determine the root cause and will provide additional updates as relevant information becomes available.

Intel will be publishing a public statement regarding issue status and Intel recommended BIOS setting recommendations targeted for May 2024.
Source: Igor's Lab

272 Comments on Intel Statement on Stability Issues: "Motherboard Makers to Blame"

#201
Solid State Brain
Crackong said: In Intel spec document it is described that MB vendor should measure and set their own values. (Since no such value is provided by Intel) [...]
It is so because it depends on the impedance of the motherboard's VRM, i.e. on how droopy the voltage regulators are in that specific case. Intel specifies there that the VRM impedance for 125 W CPUs must not be higher than 1.1 mOhm, meaning in practice that the voltage must not drop more than 110 mV for every 100 A of current into the CPU. Less droopy VRMs are in general desirable. The motherboard manufacturer is supposed to know (measure) the electrical characteristics of the VRM it is using and configure that in the firmware (as DC Loadline) so that correct voltage readings are obtained. This has never been the case on any of the relatively recent Intel motherboards I've had so far (10th, 11th, 12th gen from three different manufacturers).

AC Loadline compensates for the Vdroop. If the VRMs have a (maximum Intel spec) impedance of 1.1 mOhm (which, again, should be configured in the DC loadline), then, at least in theory, setting AC Loadline to 1.1 mOhm means that voltage will be corrected upward by 110 mV for every 100 A into the CPU (practice might differ, possibly due to motherboard/firmware quirks). Again, motherboard manufacturers seemingly tend to use more-or-less random (or at best, one-size-fits-all) values for this, and end-users pay the price for it.
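
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes a deliberately idealized model taken only from the description in this post: the AC Loadline raises the requested voltage in proportion to current, and the real VRM impedance drops it in proportion to current. The function names and the 1.30 V / 1.7 mOhm figures in the usage lines are illustrative choices, not values from any Intel document, and real boards may behave differently.

```python
def vdroop_mv(loadline_mohm: float, current_a: float) -> float:
    """Voltage change in millivolts for a loadline (mOhm) at a given current (A)."""
    # mOhm * A = mV, so no unit conversion is needed here.
    return loadline_mohm * current_a


def voltage_at_cpu(requested_v: float, current_a: float,
                   ac_ll_mohm: float, vrm_mohm: float) -> float:
    """Idealized voltage reaching the CPU, in volts.

    Assumption for this sketch: the AC Loadline adds a correction to the
    requested voltage, and the actual VRM impedance subtracts the droop.
    """
    corrected = requested_v + vdroop_mv(ac_ll_mohm, current_a) / 1000.0
    return corrected - vdroop_mv(vrm_mohm, current_a) / 1000.0


if __name__ == "__main__":
    # 1.1 mOhm at 100 A: the droop figure quoted in the post.
    print(f"{vdroop_mv(1.1, 100):.0f} mV")  # 110 mV

    # AC Loadline matched to the VRM impedance: correction and droop cancel.
    print(f"{voltage_at_cpu(1.30, 100, 1.1, 1.1):.3f} V")

    # AC Loadline set higher than the real VRM impedance: the CPU ends up
    # with more voltage than it asked for under load.
    print(f"{voltage_at_cpu(1.30, 100, 1.7, 1.1):.3f} V")
```

In this simplified model, a mismatch between the configured AC Loadline and the real VRM impedance translates directly into extra (or missing) voltage under load, which is why a one-size-fits-all value can overvolt some boards.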
#202
Crackong
AnonymousGuy767 said: My 14900K wasn't stable at 6 GHz turbo boost. Didn't touch any frequencies or voltages manually, and it would crash at idle when it happened to clock up to 6 GHz because of some background process.

That was a fun one trying to troubleshoot. Eventually I locked it to 5.7 or 5.5 max regardless of core active count and called it a day, and it didn't crash after that. And then it degraded and had to get RMA'd anyways.

So I basically said "f this noise" and put a non-K in. No more TVB to go to unrealistic clocks or weird all core frequency stuff that is beyond specs. And seems to idle at a sane voltage so probably won't degrade. I have the replacement 14900K here so I'm tempted to unseal it and try the new Asus bios to see if they fixed that, but I've done so many CPU remounts and swaps and stuff fixing problems that I'm kinda over it.
Sorry to hear that. That is a lot of trouble, going back and forth over a few hundred MHz and still needing an RMA after a relatively short period of time.
Hope you got a better-binned CPU and don't have to deal with this anymore.
Solid State Brain said: Again, motherboard manufacturers seemingly tend to use more-or-less random (or at best, one-size-fits-all) values for this, and end-users pay the price for it.
Agreed.
And from the 'Intel baseline' profiles available right now, we can see that it is still not regulated as it should be (like the constant >1.6 V from the Gigabyte 'baseline' profile).
I think Intel should just step in and force a default AC/DC loadline behaviour in the guidelines (maybe a forced 1:1 ratio).
Then leave the rest of the playground for MB vendors to do their own custom profiles.

But then it could be a double-edged sword, and MB vendors will be more reluctant to tune aggressively in future generations.
And future generations of CPUs might get bumpy reviews, because MB vendors would now ship their boards with the 'default-default' profile.
IDK if Intel really wants it or not.
#203
Dr. Dro
AnonymousGuy767 said: My 14900K wasn't stable at 6 GHz turbo boost. Didn't touch any frequencies or voltages manually, and it would crash at idle when it happened to clock up to 6 GHz because of some background process.

That was a fun one trying to troubleshoot. Eventually I locked it to 5.7 or 5.5 max regardless of core active count and called it a day, and it didn't crash after that. And then it degraded and had to get RMA'd anyways.

So I basically said "f this noise" and put a non-K in. No more TVB to go to unrealistic clocks or weird all core frequency stuff that is beyond specs. And seems to idle at a sane voltage so probably won't degrade. I have the replacement 14900K here so I'm tempted to unseal it and try the new Asus bios to see if they fixed that, but I've done so many CPU remounts and swaps and stuff fixing problems that I'm kinda over it.
Interesting, you must have gotten one heck of a dud sample. I've always expected a few 14900K samples to be bad, considering it's really just slapping 13900KS clocks onto a 13900K chip and calling it a day. It relies too much on its binning grade, and we all know not every processor that rolls off the line will meet the same standard of quality. I think installing this new 14900K would work, but it's perhaps not worth your time if you've got a non-K 14900 installed already. It's kind of why I didn't bother buying a 14900KS either; I don't think it's worth the effort to seek one out and spend the resources to purchase it.

Thankfully I didn't share in your experience, things have been great with my 13900KS.
#204
Zubasa
Dr. Dro said: Interesting, you must have gotten one heck of a dud sample. I've always expected a few 14900K samples to be bad, considering it's really just slapping 13900KS clocks onto a 13900K chip and calling it a day. It relies too much on its binning grade, and we all know not every processor that rolls off the line will meet the same standard of quality. I think installing this new 14900K would work, but it's perhaps not worth your time if you've got a non-K 14900 installed already. It's kind of why I didn't bother buying a 14900KS either; I don't think it's worth the effort to seek one out and spend the resources to purchase it.

Thankfully I didn't share in your experience, things have been great with my 13900KS.
You know, maybe it is because Intel pushed 14th gen, which is the exact same silicon, even further than 13th gen; hence all the problems.
#205
:D:D
Solaris17 said: This is wild. They literally post these specs on the Intel ARK pages. lol.
It's also embedded in a CPU register, for those dissing 125 W. If you can't read the register directly, you can use something like CPU-Z.

valid.x86.fr/cache/screenshot/c6mjl1.png


And IIRC (maybe not), since Sandy Bridge some processors have allowed essentially unlimited power (TDP) settings, and many used that without problems.
Crackong said: Since there is no reference value provided by Intel.
MB vendors had to cook up their own values, by testing the ES CPUs provided by Intel.
IDK how many they've tested.
But judging from the reality, it doesn't cover the whole silicon lottery spectrum.
Doesn't Intel provide POR?

ES CPUs? The VRTT is a tool that sits in the socket instead of a CPU. Presumably it tests and analyses the mainboard voltage regulator, board traces / power planes and components to get specific values that are dependent on the manufacturer's board, so no, Intel cannot provide those details as they are specific to the mainboard being used. From these values, presumably, custom values could be used by the mainboard manufacturer via BIOS to improve performance.

Would Intel locking down TDP stop people running beyond those limits? No, on its own there are ways to circumvent it.

If Intel allow changes that cause problems then perhaps it's partly their problem as well as manufacturers and even users who decide to take components and build their own system rather than buying a prebuilt PC with warranty. JM2c
dgianstefani said: Would not be surprised to see them coming out swinging this year.
Not sure if you mean on the end of a rope or not?
#206
Dr. Dro
Zubasa said: You know, maybe it is because Intel pushed 14th gen which are the exact same silicon even further than 13th gen, thus all the problems.
They're the same chips, and the "issues" are affecting the regular 13900K too. That one doesn't exactly clock into the 6 GHz realm. Also, the 13900KS and 14900K have pretty much the same clock targets. Some of the turbo domains are 100MHz higher on 14900K, clock targets are the same, ergo, 6 GHz TVB.
#207
Crackong
:D:D said: Doesn't Intel provide POR?
Maybe I missed something, but I can't find a point of reference in that document mentioning a 'Default' behaviour of the AC/DC loadline.
:D:D said: ES CPUs? The VRTT is a tool that sits in the socket instead of a CPU. Presumably it tests and analyses the mainboard voltage regulator, board traces / power planes and components to get specific values that are dependent on the manufacturer's board, so no, Intel cannot provide those details as they are specific to the mainboard being used. From these values, presumably, custom values could be used by the mainboard manufacturer via BIOS to improve performance.
Yes, it varies from board to board.
But I think Intel should at least have a specification of a 'Default' behaviour, maybe something like 'In Default, the motherboard BIOS should always have the AC/DC loadline calibrated to supply the voltage (+- 20 mV max) requested by the CPU'.
They do have a specification for ripple, but nothing regulating the deviation of the flat-out supplied voltage.

This seems to be absent right now, so every MB vendor has their own trick in their 'Default' profile.
:D:D said: Would Intel locking down TDP stop people running beyond those limits? No, on its own there are ways to circumvent it.

If Intel allow changes that cause problems then perhaps it's partly their problem as well as manufacturers and even users who decide to take components and build their own system rather than buying a prebuilt PC with warranty. JM2c
Totally.
#208
stimpy88
Just for the record, W1zzard deliberately disables the thermal and current protection for these tests
Then the benchmark is invalid, IMO.

I also notice that I never got an answer on whether W1zzard uses the BIOS settings Intel recommends not messing with...

According to Intel, these settings should be configured...

Enable Current Excursion Protection (CEP)
Disable IccMax Unlimited Bit
Enable Thermal Velocity Boost (TVB)
Enable Enhanced Thermal Velocity Boost (eTVB)
Enable C-States
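
As a rough illustration of checking a board's defaults against the list above, here is a small Python sketch. The setting names are invented for this example and do not correspond to any real BIOS menu, vendor tool, or Intel API; only the enabled/disabled states come from the list in this post.

```python
# Recommended states as listed in the post above. The dictionary keys are
# hypothetical names made up for this sketch.
RECOMMENDED = {
    "current_excursion_protection": True,     # Enable CEP
    "iccmax_unlimited_bit": False,            # Disable IccMax Unlimited Bit
    "thermal_velocity_boost": True,           # Enable TVB
    "enhanced_thermal_velocity_boost": True,  # Enable eTVB
    "c_states": True,                         # Enable C-States
}


def find_deviations(settings: dict) -> list:
    """Return the names of settings that are missing or differ from the list."""
    return [name for name, wanted in RECOMMENDED.items()
            if settings.get(name) != wanted]


if __name__ == "__main__":
    # Hypothetical "performance" defaults of the kind the Intel statement describes.
    board_defaults = {
        "current_excursion_protection": False,
        "iccmax_unlimited_bit": True,
        "thermal_velocity_boost": True,
        "enhanced_thermal_velocity_boost": True,
        "c_states": False,
    }
    for name in find_deviations(board_defaults):
        print(f"differs from the recommended state: {name}")
```

A setting that is absent from the input counts as a deviation here, on the assumption that an unknown state should be checked by hand rather than trusted.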
#209
Carlyle2020hs
I don't like TVB since it always tries to shoot for the moon.

Not that I trust the numbers in this pic, but the burst captured does produce a heat spike, so something is overdriven:

That's what TVB looks like for me with a 14600K
#210
Dr. Dro
Carlyle2020hs said: I don't like TVB since it always tries to shoot for the moon.

Not that I trust the numbers in this pic, but the burst captured does produce a heat spike, so something is overdriven:

That's what TVB looks like for me with a 14600K
The 14600K does not support TVB, this is exclusive to the Core i9, not even the i7 supports it.

9.6GHz is definitely a sensor bug. Something here doesn't check out.
#211
Carlyle2020hs
Yeah!

Got me pretty high scores though, that I can't take real credit for:

www.3dmark.com/cpu/1700909

I played around with setting TVB targets (+2 +3 +4) about 4 months ago, and stopped since I tuned a silent system and those bursts are annoying due to the fans ramping up.

So sorry, no real conclusion here except that parts of TVB work even with a 14600K.
Setting 5.7 GHz manually gets me close, but without those heat spikes at the beginning of the tests.
#212
Dr. Dro
Carlyle2020hs said: Yeah!

Got me pretty high scores though, that I can't take real credit for:

www.3dmark.com/cpu/1700909

I played around with setting TVB targets (+2 +3 +4) about 4 months ago, and stopped since I tuned a silent system and those bursts are annoying due to the fans ramping up.

So sorry, no real conclusion here except that parts of TVB work even with a 14600K.
Setting 5.7 GHz manually gets me close, but without those heat spikes at the beginning of the tests.
So, you have 16 GB of RAM but a 4090? Man. You can't even use that GPU to its full potential... and at 5.7 you're pushing your luck for a 14600K anyway. Seems like your rig's got more than a few problems.
#213
Carlyle2020hs
It's all about the use case.
Mine are old games that don't use much RAM.

And going silent will cost you.
So for that rig I used a dead silent DDR4 board, which my other DDR5 ones are not (in all scenarios).
I've combined that with an old 16 GB B-die kit that clocks higher than any 12th, 13th or 14th gen will let me.
And the 4090 can run my games without fans, which a 4080 could not.

Playing in 4K I lose about 10% going from a silent, AIO cooled 13900KS to a dead silent, air cooled 14600K.

Thank you for your worry about my rig.
But you don't have to, since it gets into top 10 benchmark charts without even trying.

And I technically hold the first place if you filter with 4090s.

No problems here.
Just something curious that I wanted to share, since I don't truly get what I see on that chart.
So I get a few more points playing around with TVB on a 14600K. But I can't measure the cost.
#214
AleXXX666
dgianstefani said: Amazing, thanks :laugh:


What?
no need to go "stable" 11gen, 12 gen is fkin stable and performs double 11gen... even without e-waste cores.
#215
chrcoluk
Since this didn't get many eyeballs in the other thread.

community.intel.com/t5/Processors/TjMAX-is-set-to-115-C-by-default/m-p/1430468

The quick version.
ASRock on a Z790 board set TJMAX to 115C as default (this also happened on my board).
Customer queried ASRock; they said it's intended behaviour to boost performance.
Customer asked Intel; they said it breaches warranty and is out of spec.
ASRock then later told the customer they were backing down and that future BIOS versions would revert to 100C.
#216
:D:D
chrcoluk said: ASRock on a Z790 board set TJMAX to 115C as default (this also happened on my board).
Well really I would put that one on Intel for allowing it. Started doing that on some chips from 4th/5th gen IIRC
#217
chrcoluk
Vya Domus said: It's a meme to you maybe, I haven't had any significant issues and I jumped on 7000 series pretty early on.

AMD also has the benefit of typically not lying and that helps a lot, so even if something goes wrong they don't get as much flak for it. Maybe Intel shouldn't have said in the past that it's actually totally within spec to have a gazillion watt power limit on their CPU; they should have kept their mouths shut, and then the narrative that it was the motherboard makers' fault would have been more believable.

Unfortunately they didn't keep their mouths shut and now if you have a brain it's hard to believe they weren't at fault for this.
See the link I posted in my previous reply. These claims of them saying it's in spec are largely from the media. Their tech support isn't telling customers it's in spec.

I wouldn't consider media reps as gospel either; their job is to sell stuff. What support staff and documents say has more meaning. The documents posted in this thread don't state unlimited power as in spec.

Intel is guilty of not policing things properly, but it has all gone wild when people are claiming what the board vendors have done is considered spec by Intel. I don't know where the motivation for this is coming from: the hatred for Intel I witness day to day in the tech community, or loyalty to the board vendors. Obviously the likes of HUB etc. are just feeding the frenzy with clickbait.
#218
Zubasa
chrcoluk said: See the link I posted in my previous reply. These claims of them saying it's in spec are largely from the media. Their tech support isn't telling customers it's in spec.

I wouldn't consider media reps as gospel either; their job is to sell stuff. What support staff and documents say has more meaning. The documents posted in this thread don't state unlimited power as in spec.

Intel is guilty of not policing things properly, but it has all gone wild when people are claiming what the board vendors have done is considered spec by Intel.
TBH customer / tech support can be hit and miss depending who you get as well.
GN did a video a couple of years back about Intel CS regarding things like XMP. One CS rep said it's OC and thus not covered by warranty; the other didn't even bother asking whether XMP was enabled or not.
Intel and AMD have many EULA / disclaimer clauses that are usually not enforced but are there so they can weasel out of legal trouble.
#219
chrcoluk
:D:D said: Well really I would put that one on Intel for allowing it. Started doing that on some chips from 4th/5th gen IIRC
This just isn't rational thinking. The board vendor writes the BIOS and provides the board; they even told the customer they did it for performance.
Zubasa said: TBH customer / tech support can be hit and miss depending who you get as well.
GN did a video a couple of years back about Intel CS regarding things like XMP. One CS rep said it's OC and thus not covered by warranty; the other didn't even bother asking whether XMP was enabled or not.
Intel and AMD have many EULA / disclaimer clauses that are usually not enforced but are there so they can weasel out of legal trouble.
Sometimes companies or operators give benefit of doubt or go beyond what is expected, like when WD accepted RMA from a failed HDD for me when it was out of warranty.

Did GN disclose the XMP voluntarily to the rep who didn't ask?
#220
:D:D
chrcoluk said: This just isn't rational thinking. The board vendor writes the BIOS and provides the board; they even told the customer they did it for performance.
Try to see the bigger picture.

What isn't rational thinking is saying Intel allowed changing Tjmax to 115C because they didn't want anyone using more than 100C!

Many Intel CPUs do not allow changing Tjmax (which is really Tjtarget); if anything, they allow only an offset so that throttling starts at a lower temp, up to about 15C lower.

Board vendors do not write the BIOS; BIOS companies usually do, with some Intel reference code and use of the "BIOS Writer's Guide" from Intel. The manufacturers might customize the BIOS provided or ask for customization.

And IIRC Gigabyte did a similar thing with IMO stupidly high Tjmax. Again, if Intel didn't want to allow this then it would have been locked down as usual.
#223
BoggledBeagle
kmdkai on the Chinese forum wrote that an unstable chip may fail only after a few days of his testing, and that his findings are not very relevant for a "casual user".

Spending money to test 13900K/14900K stability. Intel's BIOS baseline settings solve absolutely nothing - Computer Discussion (New) - Chiphell - Sharing and Exchanging User Experiences

I buy processors often because I run a large studio with dozens of machines, and I occasionally build systems and sell them. Starting with the 13th generation I used the 13900K, and after the 14900K was released I switched over to the 14900K in batches. So far, at least 100 14900K chips have passed through my hands. The step from 13900K to 14900K is hard to describe, but in one sentence: it was already enough, and it still had to be pushed further. I have also bought a lot of both boxed and loose (tray) chips, so I can share my impressions of both. The motherboards are mainly Z690, B660, Z790 and B760, from the 600 series to the 700 series. Not to mention the BIOS: I have to go to the ASUS official website once a week to see if a BIOS update solves the problem.

Before I summarize, let me state my standard. [Fully default and safe to use] means the productivity software runs continuously for a week without freezing, crashing, restarting, or reporting errors. (Productivity software refers to various numerical computing workloads, including but not limited to MATLAB, R, finite element analysis, VC, VS, large-scale office batch processing, etc.) Yes, it is this productivity stability requirement that has ruled out many CPUs.

First, I set the Z board to full AUTO defaults with no overclocking, at a power limit of 253 W. If the chip fails, it goes to a B board. If it still fails, it is judged unstable.

Let's summarize:

1. The probability that a 13900K can be run fully default and used safely is about 40% to 50% (at most 4 to 5 out of 10). Raising the anti-droop setting (load-line calibration) can increase the probability by about 10% to 20%. Setting the voltage to worst-case on the Z board, or moving to a B board, can increase the probability by 20%. The probability that a 13900KS can be used safely at default lies between that of the 13900K and the 14900K.
2. The probability that a 14900K can be run fully default and used safely is about 20% (at most 2 out of 10). Raising the anti-droop setting can increase the probability by less than 10%. One special thing about the 14900K is that it is basically unsafe to use on a Z board, even with the failsafe voltage mode set. So I mainly use B boards.
3. The power consumption and temperature of loose chips under the same load are generally about 10% to 20% higher than those of boxed chips, so they hit the power limit or the temperature limit more easily.
4. The stability of loose chips is slightly better than that of boxed ones (this conclusion only applies to brand new loose chips; second-hand loose chips have most likely already been cherry-picked, even if they show no socket marks). The slightly higher voltage of loose chips may be what saves them. But what use is that when they throttle so easily at 100 degrees (the most outrageous 13900K I have seen reaches 100 degrees and thermal throttling at 180 W)? 13900K loose chips are easier to stabilize, while 14900K loose chips are useless.
5. Intel recently added baseline settings. I tested them immediately, and the result was: completely useless. They are just equivalent to limiting you to the failsafe voltage (Z board), default CEP, the current limit, and the power limit. They don't help improve stability. A Z board with this setting effectively becomes a B board.
6. Before the baseline option was available, B boards were more stable than Z boards. With the baseline option enabled, however, a Z board is no different in stability from a B board.
7. I have only bought 3 14900KS chips so far. It is too much of an "IQ tax", so I will not comment on the 14900KS for the time being. I will make a separate post later.
8. In particular, I once accidentally set the power limit to 33 watts and found that the performance per watt was satisfactory. I strongly recommend trying a low power limit; there will be surprises.
9. Regarding the voltage, I have tried Intel failsafe on the Z board before. If you choose this voltage mode, the motherboard effectively becomes a B board.
10. Supplement: I have bought dozens of 13700K chips before, and all of them are rock solid, with no problem on any of them. So I personally think that if you want to close your eyes and use it with peace of mind, the 13700K is the most stable. The 14700K is not as cost-effective as the 13900K, so I have never used one and cannot comment on it.

Conclusion analysis: The above is just my personal experience as a productivity user, which is equivalent to me spending the money to test on everyone's behalf. Raising the voltage is definitely the right answer, but working out how far to raise it to reach stability, especially when productivity use requires long-term testing, takes too much effort. Sometimes a test may be fine for 2 days and then fail on the third, which is very time-consuming. So every time a chip fails testing, I resell it directly. If you are just an average user, there is no need to care about my extreme productivity conclusion. The standard can be relaxed a lot. After all, ordinary users can tolerate instability that rarely recurs. For example, my own personal computer, a 14900K on a Z board, at default only encounters a crash and restart [doge] once a month at most.
#224
Dr. Dro
BoggledBeagle said: kmdkai on the Chinese forum wrote that an unstable chip may fail after a few days of his testing, and that his findings are not very relevant for a "casual user".
It's certainly affecting gamers, though. I've noticed some correlation between the Nvidia driver acting up and the CPU being about to crash. Still, I haven't experienced issues since updating my motherboard's BIOS and triple checking all settings on my BIOS. We'll see, this seems to be a chronic/recurrent issue that will need Intel's intervention long-term.
#225
BoggledBeagle
He mentioned that all the 13700K chips he tested were stable. They have maximum frequencies of 5400/4200 MHz.

I think I may increase limits for my 14900K from 5200/4200 MHz to 5300/4200 without any fear.