Thursday, September 26th 2024
Intel Isolates "Raptor Lake" Vmin Shift Instability Root Cause, New Microcode Update Coming
Back in August, Intel started shipping its 0x129 microcode update for 13th/14th generation "Raptor Lake" and "Raptor Lake Refresh" processors. This update fixed incorrect voltage requests to the processor that were causing elevated operating voltage. Intel's analysis showed that the root cause of the stability problems is voltage levels that are too high during processor operation. These elevated voltages cause degradation that raises the minimum voltage required for stable operation, which Intel calls "Vmin." Today, the company announced that it has identified the root cause of this instability and informed users that a new microcode patch is on the way. According to Intel, the Vmin Shift instability stems from a clock tree circuit in the IA core. When exposed to high voltage and temperature, this circuit is vulnerable to reliability degradation. Intel's research has shown that these conditions can shift the duty cycle of the clocks, resulting in system instability.
There are four scenarios that can cause Vmin Shift: increased motherboard power delivery; the eTVB microcode algorithm running at higher-performance operating states even at higher temperatures; the microcode SVID algorithm requesting higher voltages at higher frequencies and for longer durations; and, finally, microcode and BIOS requesting elevated core voltages. For motherboard power settings, the mitigation is reverting to default settings. For the eTVB issue, the fix is the 0x125 microcode update. The 0x129 patch fixes the SVID algorithm, and the fourth condition, where microcode and BIOS request elevated core voltage, is addressed by the upcoming 0x12B microcode update. Intel is reportedly working with OEMs to begin rolling out the 0x12B update, with no apparent performance degradation. While the timeframe for shipping this update is unknown, we expect to see it soon. Additionally, Intel once again confirmed that the upcoming "Arrow Lake" CPUs are not affected by these issues.
Source: Intel
46 Comments on Intel Isolates "Raptor Lake" Vmin Shift Instability Root Cause, New Microcode Update Coming
However, the damage is done. Intel had a very good reputation for hardware longevity over the years, and some users may consider switching to AMD after this, if AMD is still competitive in a few years.
The real reason for the rapid degradation is missing from their list of causes: excessively high frequency, which is what drives the elevated temperature and voltage that result in high electric current density.
I am not convinced that even a brand-new CPU running the 0x12B microcode will work reliably for many years at those extreme frequencies.
You have no idea how many such users exist, they are known as "whales".
During the ETH mining craze, a US crypto miner mounted an array of 3090s and 3080s in the trunk of his $200K car and then posted the picture on Twitter just to spite people like you. That was back when there were 200-person queues at California Micro Center stores and people were camping out in Micro Center parking lots two days in advance for a shot at a 3080.
I can ramp up the frequency of my CPU to 7 GHz at the same voltage I run it at for 4.7 GHz, and maybe it will even boot, but then it will crash due to instability. That does not mean the chip got damaged.
The root cause of the mess is Intel executives who chose to ignore all the good industry practices and precautionary principles that exist to give customers a reliable long-term product.
That's the same quality Intel delivered with my Intel WLAN card under GNU/Linux (whataboutism, I know), which has been crashing since I bought it. To rule it out, I switched to USB tethering instead of the Intel WLAN chip, and there have been no crashes since then.
The best bet is to run a P-core clock that allows lower voltages and to avoid the crazy high voltage that results from the default profile's high AC LL (AC load line), unless your CPU is just not stable any other way.
Edit: Oh, and the simple answer presented by other commenters is the best one: Intel ran their chips at too high a frequency.
I also believe it is not fair to torture CPUs at higher-than-stock speeds with substandard cooling.