
Intel Statement on 13th and 14th Gen Core Instability: Faulty Microcode Causes Excessive Voltages, Fix Out Soon

Joined
Apr 15, 2020
Messages
409 (0.24/day)
System Name Old friend
Processor 3550 Ivy Bridge x 39.0 Multiplier
Memory 2x8GB 2400 RipjawsX
Video Card(s) 1070 Gaming X
Storage BX100 500GB
Display(s) 27" QHD VA Curved @120Hz
Power Supply Platinum 650W
Mouse Light² 200
Keyboard G610 Red
Intel needs to learn and improve, or there will be stagnation all over again.
 
Joined
Sep 24, 2020
Messages
145 (0.09/day)
System Name Room Heater Pro
Processor i9-13900KF
Motherboard ASUS ROG STRIX Z790-F GAMING WIFI
Cooling Corsair iCUE H170i ELITE CAPELLIX 420mm
Memory Corsair Vengeance Std PMIC, XMP 3.0 Black Heat spreader, 64GB (2x32GB), DDR5, 6600MT/s, CL 32, RGB
Video Card(s) Palit GeForce RTX 4090 GameRock OC 24GB
Storage Kingston FURY Renegade Gen.4, 4TB, NVMe, M.2.
Display(s) ASUS ROG Swift OLED PG48UQ, 47.5", 4K, OLED, 138Hz, 0.1 ms, G-SYNC
Case Thermaltake View 51 TG ARGB
Power Supply Asus ROG Thor, 1200W Platinum
Mouse Logitech Pro X Superlight 2
Keyboard Logitech G213 RGB
VR HMD Oculus Quest 2
Software Windows 11 23H2
Well I'm not Intel so I can't answer this on their behalf, although I also wouldn't mind them being sued for $100 billion for this.
What I was getting at is that those "dumb" people have trusted Intel to make a product that doesn't self-destruct when running within specs. Specs that don't include an asterisk: "don't run it at 6GHz for more than 5 minutes / day".
 
Joined
Apr 12, 2013
Messages
7,536 (1.77/day)
And that's why it was dumb ~ you're running a business & you don't know "safe" limits for these chips? Even if we do accept some of them running well within safe temps/clocks it was still a slight risk.

Remember Turbo specs/clocks aren't guaranteed & technically it's still OCing ~ Intel/AMD only guarantee base clocks!
 
Joined
Sep 24, 2020
Messages
145 (0.09/day)
And that's why it was dumb ~ you're running a business & you don't know "safe" limits for these chips? Even if we do accept some of them running well within safe temps/clocks it was still a slight risk.

Remember Turbo specs/clocks aren't guaranteed & technically it's still OCing ~ Intel/AMD only guarantee base clocks!
That's just wrong. The base frequency is only used to define the TDP. Going over the TDP is not overclocking, and you don't need to go over the TDP anyway to reach those clocks. Xeon chips can run over the base frequency too. For example, the E-2486 has a base frequency of 3.5GHz and can turbo boost up to 5.6GHz.

Speaking of Xeons, the Xeons with the highest clocks are the 6 core E-2486 and 8 core E-2488, both boosting up to 5.6GHz. And their recommended prices are $506.00 and $606.00 respectively. So not sure where some people got the idea that people are using 14900K on servers to save money, since the recommended price for a 14900K is $589.00-$599.00.
 
Joined
Feb 15, 2019
Messages
1,659 (0.78/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
And that's why it was dumb ~ you're running a business & you don't know "safe" limits for these chips? Even if we do accept some of them running well within safe temps/clocks it was still a slight risk.

Remember Turbo specs/clocks aren't guaranteed & technically it's still OCing ~ Intel/AMD only guarantee base clocks!

I think the "True" safe limits are shown in the P-cores only Xeon products.
Yea those with a Max boost of 4.8 GHz (w9-3495X)
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.64/day)
Location
Ex-usa | slava the trolls
Why does Intel sell chips OCed way past their limits?

Because of fear of losing the market to the superior Ryzens.

Well I'm not Intel so I can't answer this on their behalf, although I also wouldn't mind them being sued for $100 billion for it. They got away pretty lightly for their 2004-06(?) OEM BS & ideally this time should be different.
 
Joined
Jan 9, 2023
Messages
304 (0.44/day)
Regardless of how much I call Intel names this is on the dumb people putting OCed chips, way past their limits, in servers! The reason why server chips have conservative clocks should be fairly obvious & why running desktop chips at 6Ghz @24*7 is a bad idea.
Surely it has nothing to do with the fact that running at 6GHz is crazy inefficient, and efficiency is quite important for servers.
 
Joined
Apr 12, 2013
Messages
7,536 (1.77/day)
Like I said multiple times I'm not Intel & I'm not defending them selling unstable chips, but the dumb people putting 6Ghz chips into servers should be called out for penny pinching or not doing their homework!
 
Joined
Jan 9, 2023
Messages
304 (0.44/day)
Like I said multiple times I'm not Intel & I'm not defending them selling unstable chips, but the dumb people putting 6Ghz chips into servers should be called out for penny pinching or not doing their homework!
You can't just imply that Intel is willing to sell desktop users chips that go faster than they should because it wants the performance crown, yet doesn't do the same for servers because it knows this causes the now-known issues, and then not expect people to call you out on that.
That's a massive stretch and we both know it.
 
Joined
Apr 12, 2013
Messages
7,536 (1.77/day)
Pretty sure Intel didn't advertise these 14900k/s for servers. As for RPL (dedicated) server chips I don't have their full specs, any links?
 
Joined
Jul 24, 2024
Messages
242 (1.88/day)
System Name AM4_TimeKiller
Processor AMD Ryzen 5 5600X @ all-core 4.7 GHz
Motherboard ASUS ROG Strix B550-E Gaming
Cooling Arctic Freezer II 420 rev.7 (push-pull)
Memory G.Skill TridentZ RGB, 2x16 GB DDR4, B-Die, 3800 MHz @ CL14-15-14-29-43 1T, 53.2 ns
Video Card(s) ASRock Radeon RX 7800 XT Phantom Gaming
Storage Samsung 990 PRO 1 TB, Kingston KC3000 1 TB, Kingston KC3000 2 TB
Case Corsair 7000D Airflow
Audio Device(s) Creative Sound Blaster X-Fi Titanium
Power Supply Seasonic Prime TX-850
Mouse Logitech wireless mouse
Keyboard Logitech wireless keyboard
Regardless of how much I call Intel names this is on the dumb people putting OCed chips, way past their limits, in servers! The reason why server chips have conservative clocks should be fairly obvious & why running desktop chips at 6Ghz @24*7 is a bad idea.
Sorry but your point of view is not right.

Pretty sure Intel didn't advertise these 14900k/s for servers. As for RPL (dedicated) server chips I don't have their full specs, any links?
That is absolutely irrelevant. Nowhere is it written that it cannot be used for servers.

Be it a server or non-server processor, the processor itself must remain fully functional until the warranty ends. The CPU has to be configured by the manufacturer in a way that won't allow it to damage itself over time. That's what thermal throttling, TDP, and current limits are for. At work we have multiple desktop PCs equipped with 10th Gen i7 Ks that fill the roles of small camera servers, web servers, and a database server. We chose them as they are perfectly fine for this purpose in terms of performance and were much cheaper than a regular rack server. They have been working 24/7 for a few years now without any problems.

You don't need a Xeon, EPYC, or Threadripper just because those are primarily used for servers. Any other CPU is capable of handling the same tasks but may lack some instruction sets, so it may not be as effective. I do agree that server CPUs tend to be optimized for maximum efficiency. It also goes the other way around: you can enjoy playing games on server-grade CPUs without any problems, but you will get lower single-thread performance this way.

As I said, it doesn't matter whether it is an OCed or non-OCed CPU; it must be configured by the manufacturer to be fully operational at least until its warranty period ends.
Even running a desktop CPU at 6 GHz 24/7 must be perfectly safe and the CPU must not die, provided there is sufficient cooling.
So no, your point of view is not right, as in this case the manufacturer (Intel) let its products run outside of their safe limits for so long that it caused unrecoverable damage to them. It has long been general knowledge that 1.5V is way too much for modern chips. Personally, I wouldn't go past 1.3V for VCore and 1.2V for VSoC.

Btw Xeons are quite pricy for what they offer. Using Core CPU desktop as a server is nothing special and there's nothing wrong about it as long as your power and performance requirements are met.
 
Joined
Sep 24, 2020
Messages
145 (0.09/day)
It has nothing to do with TDP, Intel/AMD chips are rated i.e. guaranteed for base clocks. Turbo boost is OCing even if "sanctioned" by them.

Intel disagrees:

The processor base frequency is the operating point where TDP is defined.
Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload.

Their definitions are a bit circular, but basically Intel chooses a TDP for that particular CPU. Then they run the CPU at higher and higher frequencies with their benchmark until it reaches that TDP. Then they put those base frequencies in the specs.
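
The procedure described above can be sketched in a few lines of Python. This is only an illustration: the cubic power model, the `watts_per_core_at_1ghz` constant, and the 95 W budget are made-up assumptions, not Intel's actual workload or numbers.

```python
def package_power_watts(freq_ghz, cores, watts_per_core_at_1ghz=0.7):
    # Hypothetical model: per-core power grows with the cube of frequency.
    # Real silicon depends on voltage, leakage, and the workload.
    return cores * watts_per_core_at_1ghz * freq_ghz ** 3


def base_frequency_ghz(tdp_watts, cores, max_ghz=6.0, step_mhz=100):
    # Raise the all-core frequency step by step and keep the highest
    # value whose modeled power still fits inside the chosen TDP.
    best = 0.0
    for mhz in range(step_mhz, int(max_ghz * 1000) + 1, step_mhz):
        ghz = mhz / 1000
        if package_power_watts(ghz, cores) <= tdp_watts:
            best = ghz
        else:
            break
    return best


# Same illustrative TDP budget, different core counts.
six_core = base_frequency_ghz(95, cores=6)
eight_core = base_frequency_ghz(95, cores=8)
print(f"6 cores: {six_core} GHz, 8 cores: {eight_core} GHz")
```

Under this toy model, the part with fewer cores ends up with the higher base frequency because the same power budget is shared by fewer cores.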

In my particular case, with the 13900KF, going over that TDP is not only "sanctioned" by them, it is actually recommended. And about using the baseline power profile, which would enforce the TDP, they say this: "Intel does not recommend Baseline power delivery profiles for the 13th gen and 14th gen K Sku processors unless required for compatibility."

Furthermore, they say this: "Intel recommends using the 'Extreme' power delivery profile if supported by the voltage regulator (VR) and motherboard design". Also: "Intel strongly recommends these values to be applied as BIOS defaults". Which is exactly what ASUS does with their latest BIOS updates. It defaults to the Intel Extreme profile. I can as an option switch to the Intel Performance profile that lowers the current limits. And the Baseline profile, which is not recommended by Intel, is not even an option.

So no, they don't just "sanction" it, they "strongly recommend" the power profiles that allow you to go over the TDP, which will allow you to run all the cores over their base frequency in MT workloads.
 
Joined
Oct 22, 2014
Messages
14,105 (3.82/day)
Location
Sunshine Coast
System Name H7 Flow 2024
Processor AMD 5800X3D
Motherboard Asus X570 Tough Gaming
Cooling Custom liquid
Memory 32 GB DDR4
Video Card(s) Intel ARC A750
Storage Crucial P5 Plus 2TB.
Display(s) AOC 24" Freesync 1m.s. 75Hz
Mouse Lenovo
Keyboard Eweadn Mechanical
Software W11 Pro 64 bit
Random thought... I wonder if 12th gen is going to start going up in price if this new fix isn't all it's cracked up to be. People will be looking for a way to have a working computer without having to swap out motherboards....
This, and the used 13th and 14th Gens flooding the market will be worthless.
 
Joined
Apr 12, 2013
Messages
7,536 (1.77/day)
Their definitions are a bit circular, but basically Intel chooses a TDP for that particular CPU. Then they run the CPU at higher and higher frequencies with their benchmark until it reaches that TDP. Then they put those base frequencies in the specs.
The TDP is basically an arbitrary number, although not total BS if you will. But what is Intel guaranteeing with those numbers? A level of performance which can only be measured through fixed/base clocks. Remember AMD & their multiple TDP options on desktop? Yet they guarantee a base clock as well, because that's the level of performance you should expect.
 
Joined
Sep 24, 2020
Messages
145 (0.09/day)
Pretty sure Intel didn't advertise these 14900k/s for servers. As for RPL (dedicated) server chips I don't have their full specs, any links?
Here you go:



BTW, the base frequency, as you can see, has everything to do with TDP. The 6 core CPU has a higher base frequency because that TDP budget is split between fewer cores, so they can clock higher. The boost frequency is the same because it's not affected by the TDP.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
The server farms were using 13900Ks/14900Ks sold directly to them by Intel and installed on server motherboards from various vendors. Yes, they were designed to run 24/7 for years, not degrade in months. Intel directly contributed to them losing tons of money, and not all of them can afford that.
These are consumer grade chips, not server grade at all.
The entry level server chips are called the E-2400 series, e.g. Xeon E-2488, which is the Xeon part that closely resembles i9-13900K/14900K, except for it lacking aggressive boost and voltage. If they had gone for proper server grade parts, they likely would never have seen these issues.
I must add that several outlets call W680 boards "server grade"; they are not, they are workstation boards. Don't get me wrong, they are good boards, but they wouldn't stop the consumer CPUs from aging prematurely.

If we assume these i9s "age" 4-5x faster than expected due to too much voltage, then this would easily explain why people using these as "servers" would see them fail after ~3-6 months. This only tells us that they've gotten away with consumer grade hardware in the past, because CPUs from the past few generations have been very reliable.

So if intel apparently 'fixed' this oxidization issue which apparently plagued early 13th gen batches, obviously they knew about it. And then they did....nothing? For years?
We don't know what they did, and until we have evidence we shouldn't speculate.
What we should do instead is to encourage those with contacts within Intel to publicly address this more precisely;
- Which product ranges were affected?
- For how long did this problem happen?
- Was this limited to certain production lines or everything?

I'm aware that it's not common, it probably represents 1% of total servers. But among game hosting servers it's not as rare to use desktop CPUs/sockets, because they want good single-threaded performance, don't need a lot of threads, and don't need to run the systems for 20 years or even 10. These systems also have a much smaller blast radius. These guys just want the CPU to be able to do its job until the next upgrade cycle, at which point they just upgrade the CPUs, all for a fraction of the cost of Xeon/EPYC.
As mentioned, the Xeon E-2400 series offers similar performance, probably ~95% of the same performance for more sustained loads when loaded up with many threads.
As for the argument about "blast radius" which several mentioned: servers crashing regularly shouldn't be an issue like that, and is normally fairly unheard of, and the amount of management overhead could easily justify having fewer servers with more cores. What this sounds like to me is that some companies are cutting corners and being unprofessional. (Not that this in any way reduces the issues for Intel; it just exposes what practices some companies have.)
 
Joined
Oct 15, 2019
Messages
585 (0.31/day)
Of course these are not designed to burn out in a few months - but if you have blades burning out your 14900Ks ... why... put... more... 14900Ks... in those blades? Not saying that the chip is good, but when you're putting a yolked 14900K into a blade to save money this is kind of exactly the downside.
Well, all the previous intel chips have been solid in blade form factor servers. Suddenly it's supposed to be "CoMmoN KnoWLedGe" that K chips explode in a couple of months of real use. Come on.
Server chips are usually xeons or pentiums and they run at 2.1-3.2 ghz max - and they sit there for 20 years doing it.
Maybe in some workloads. For game server hosting such setups are pretty shit. Intel does make some high ST performance Xeons, but they are not really needed and are actually worse for the workload - as well as for price, as the upside of ECC memory is not something necessary for the task. One game failing out of 100 000 because of the lack of ECC is not a thing anyone cares about.

This only tells us that they've gotten away with consumer grade hardware in the past, because CPUs from the past few generations have been very reliable.
The 'past few generations' in this case means 'all previous generations'. And now it's supposed to be "CoMMoN KnoWlEDge" that their desktop parts are shit. Come on.
 
Joined
Sep 24, 2020
Messages
145 (0.09/day)
These are consumer grade chips, not server grade at all.
The entry level server chips are called the E-2400 series, e.g. Xeon E-2488, which is the Xeon part that closely resembles i9-13900K/14900K, except for it lacking aggressive boost and voltage. If they had gone for proper server grade parts, they likely would never have seen these issues.
I must add that several outlets call W680 boards "server grade"; they are not, they are workstation boards. Don't get me wrong, they are good boards, but they wouldn't stop the consumer CPUs from aging prematurely.

If we assume these i9s "age" 4-5x faster than expected due to too much voltage, then this would easily explain why people using these as "servers" would see them fail after ~3-6 months. This only tells us that they've gotten away with consumer grade hardware in the past, because CPUs from the past few generations have been very reliable.


We don't know what they did, and until we have evidence we shouldn't speculate.
What we should do instead is to encourage those with contacts within Intel to publicly address this more precisely;
- Which product ranges were affected?
- For how long did this problem happen?
- Was this limited to certain production lines or everything?


As mentioned, the Xeon E-2400 series offers similar performance, probably ~95% of the same performance for more sustained loads when loaded up with many threads.
As for the argument about "blast radius" which several mentioned: servers crashing regularly shouldn't be an issue like that, and is normally fairly unheard of, and the amount of management overhead could easily justify having fewer servers with more cores. What this sounds like to me is that some companies are cutting corners and being unprofessional. (Not that this in any way reduces the issues for Intel; it just exposes what practices some companies have.)

If the i9s age 4-5x faster, and that's why they fail after 3-6 months, then the Xeons would be expected to fail at similar rates after 12-30 months. Not sure that would be acceptable for a Xeon CPU, and I doubt that is the case.
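
For reference, the 12-30 month range is just the two ends of the estimate multiplied out. A minimal sketch, assuming wear scales linearly with the aging factor (an assumption of this sketch; the function name is made up for illustration):

```python
def expected_lifetime_months(observed_months, aging_factor):
    # Assumes linear scaling: a part aging N times slower
    # lasts N times longer before showing the same failures.
    return observed_months * aging_factor


low = expected_lifetime_months(3, 4)   # fastest failure, smallest factor
high = expected_lifetime_months(6, 5)  # slowest failure, largest factor
print(f"{low}-{high} months")  # prints "12-30 months"
```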

As for the similar performance, you mentioned many threads. But the companies doing this do not need many threads, so that's irrelevant. They wanted the highest clocks possible because their use case is limited by single thread performance.

Furthermore, when the Raptor Lake i9s were launched, those 2400 series Xeon CPUs were not even an option. The Raptor Lake i9s were released in Q4'22. The Xeons came one year later, in Q4'23. The only option before that was the Rocket Lake Xeons, which clocked only up to 5.30 GHz.

Yes, people using the i9s on servers thought they took just a slight risk to get more performance. But "a slight risk", in my opinion, would be to lose 1%, 2%, maybe up to 5% of the CPUs after a year. This is much worse.

And, even ignoring the server discussion, this is unacceptable for regular consumers. And you can't say that the consumers should have bought Xeons instead.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
The 'past few generations' in this case means 'all previous generations'. And now it's supposed to be "CoMMoN KnoWlEDge" that their desktop parts are shit. Come on.
You're actually wrong about this.
In fact, almost every other generation (major microarchitecture) has had some kind of major issue (and the same goes for the other team, but we're apparently not permitted to mention that), but most people have a fairly short memory. What's unusual here is that the issue(s) were discovered so long after the product release. That usually only happens with security vulnerabilities or bugs which are very hard to pin down. And for historical context, just a few examples: Sandy Bridge (desktop) was a nightmare in the beginning with CPU and chipset issues, and people predicted a disaster for Intel back then. On top of that, Sandy Bridge-E and server had bugs too. But all of those were resolved, and Sandy Bridge is now remembered as one of the great milestones in CPU history. :)

If the i9s age 4-5x faster, and that's why they fail after 3-6 months, then the Xeons would be expected to fail at similar rates after 12-30 months. Not sure that would be acceptable for a Xeon CPU, and I doubt that is the case.
Well, it's not linear, it depends on how aggressively the CPU applies too much voltage (which is probably why we hear about more i9-13900Ks than i5-13600s), and the Xeons don't have this aggressive voltage for extreme boosting. So there is presumably no issue there at all.

As for the similar performance, you mentioned many threads. But the companies doing this do not need many threads, so that's irrelevant. They wanted the highest clocks possible because their use case is limited by single thread performance.
It's not like they're hosting a single game on one server at a time ;)
 
Joined
Oct 15, 2019
Messages
585 (0.31/day)
You're actually wrong about this.
In fact, almost every other generation (major microarchitecture) has had some kind of major issue (and the same goes for the other team, but we're apparently not permitted to mention that), but most people have a fairly short memory. What's unusual here is that the issue(s) were discovered so long after the product release. That usually only happens with security vulnerabilities or bugs which are very hard to pin down. And for historical context, just a few examples: Sandy Bridge (desktop) was a nightmare in the beginning with CPU and chipset issues, and people predicted a disaster for Intel back then. On top of that, Sandy Bridge-E and server had bugs too. But all of those were resolved, and Sandy Bridge is now remembered as one of the great milestones in CPU history. :)
Ok, name one processor family with problems like this that did not also exist in the comparable Xeon parts.
 
Joined
Nov 13, 2007
Messages
10,771 (1.73/day)
Location
Austin Texas
System Name stress-less
Processor 9800X3D @ 5.42GHZ
Motherboard MSI PRO B650M-A Wifi
Cooling Thermalright Phantom Spirit EVO
Memory 64GB DDR5 6000 CL30-36-36-76
Video Card(s) RTX 4090 FE
Storage 2TB WD SN850, 4TB WD SN850X
Display(s) Alienware 32" 4k 240hz OLED
Case Jonsbo Z20
Audio Device(s) Yes
Power Supply Corsair SF750
Mouse DeathadderV2 X Hyperspeed
Keyboard 65% HE Keyboard
Software Windows 11
Benchmark Scores They're pretty good, nothing crazy.
Ok, name one processor family with problems like this that did not also exist in the comparable Xeon parts.
Comet Lake CPUs had unstable platforms, and Rocket Lake was also unstable and needed a BIOS fix that Xeons did not need. AMD AM4 had WHEA issues that EPYC did not have, and AM5 had IO and memory training/boot issues that Genoa EPYCs did not have.

A possible defect in 11th gen CPUs. (finally fixed with BIOS update 1601 11/24/22 M13H MB) | Overclock.net

You're right -- this is the worst one by a mile -- I'm not at all saying it's good. But consumer CPUs are not generally considered 'STABLE' for 'PRODUCTION' workloads.

The main issue is that it's taking Intel FOREVER to respond, and it feels like they're purposely dragging their feet in admitting what's actually wrong with the hardware -- which is not common for these types of issues. It feels like even Spectre/Meltdown was addressed faster than this... and that was a 10-25% hit in performance to a lot of existing processors at the time.

This is the 'Chernobyl' response strategy and it never ends well.
 
Joined
Oct 15, 2019
Messages
585 (0.31/day)
Comet Lake CPUs had unstable platforms, and Rocket Lake was also unstable and needed a BIOS fix that Xeons did not need. AMD AM4 had WHEA issues that EPYC did not have.

A possible defect in 11th gen CPUs. (finally fixed with BIOS update 1601 11/24/22 M13H MB) | Overclock.net

You're right -- this is the worst one by a mile -- I'm not at all saying it's good. But consumer CPUs are not generally considered 'STABLE' for 'PRODUCTION' workloads.
Are you confident in stating that those issues did not affect any Xeon parts? Or are you just assuming because you have not heard of similar things?
Of course, Intel used to turn desktop chips into Xeons only after a good while of selling them for general usage, so they could weed out some problems early on, which does explain some of it.

Anyway, none of the listed instability problems were of this scale: just a few percent of chips being bad from the first couple of batches, or, in the case of the temperature reading bug, a total non-issue for running PRODUCTION servers.

And just to reiterate, there are differences in PRODUCTION workloads. Some need the utmost data integrity and stability, while others DO NOT. A bank is very different from a game server hosting service. For the game hosting service, only the facilitating and load management servers need to be very reliable; for the others, you just spin up more if some fail. Nonetheless, they are PRODUCTION workloads.
 
Joined
Nov 13, 2007
Messages
10,771 (1.73/day)
Location
Austin Texas
Are you confident in stating that those issues did not affect any Xeon parts? Or are you just assuming because you have not heard of similar things?
Of course, Intel used to turn desktop chips into Xeons only after a good while of selling them for general usage, so they could weed out some problems early on, which does explain some of it.

Anyway, none of the listed instability problems were of this scale: just a few percent of chips being bad from the first couple of batches, or, in the case of the temperature reading bug, a total non-issue for running PRODUCTION servers.

And just to reiterate, there are differences in PRODUCTION workloads. Some need the utmost data integrity and stability, while others DO NOT. A bank is very different from a game server hosting service. For the game hosting service, only the facilitating and load management servers need to be very reliable; for the others, you just spin up more if some fail. Nonetheless, they are PRODUCTION workloads.
I'm very confident in telling you that the number of issues that EPYCs and Xeons have, and their respective platforms, is tiny compared to the number of issues consumer platforms have.

Production workloads, by definition, require stability and performance, but first and foremost stability. You can build a Mac mini farm, or a Raspberry Pi farm, or a farm of blades running game servers to run a specific workload and call it 'Production', but if your processor comes with marketing materials with the words 'Exxxtreme' or 'Overclocking' or 'Gamers', it is not a production class system.

Keep in mind I'm not disagreeing with you -- this is a total disaster for them on the prosumer side, and highlights a huge gap in their product line. But it's much more common to have problems when you repurpose consumer gear.
 
Joined
Oct 15, 2019
Messages
585 (0.31/day)
Production workloads, by definition, require stability and performance, but first and foremost stability.
No. Production just means that the environments are used by end users. TYPICALLY it has to be very stable and so forth, but it is not a requirement at all for some environment to be 'production'. Production workloads then are any workloads that are run in production environments.


Marketing jargon is its own thing, and has nothing to do with the topic. Saying something is "production class" is also just marketing.

I'm very confident in telling you that the number of issues that EPYCs and Xeons have, and their respective platforms, is tiny compared to the number of issues consumer platforms have.
Yes, smaller, but the failure rate has likely been something around 1% for consumer stuff and 0.1% for Xeon stuff - pretty small for both. For some production use it makes total sense to use non-Xeon parts, as they are faster and cheaper. Now suddenly the failure rates are 50%, and the message from some users here has been like "They should have known", which is total bullcrap.
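
To put those rough percentages in perspective, here is a minimal sketch of what they would mean for a fleet. The 1,000-machine fleet size is a made-up figure for illustration, and the rates are the estimates from the post above, not measured data.

```python
def expected_failures(fleet_size, failure_rate):
    # Expected number of failed CPUs for a given failure rate.
    return fleet_size * failure_rate


FLEET_SIZE = 1000  # hypothetical fleet size, for illustration only

# Rough rates quoted in the post above, not measured data.
for label, rate in (
    ("Xeon, typical", 0.001),
    ("consumer, typical", 0.01),
    ("consumer, now", 0.5),
):
    failures = expected_failures(FLEET_SIZE, rate)
    print(f"{label}: about {failures:.0f} of {FLEET_SIZE} CPUs")
```

At the typical rates, the difference is a handful of spare chips; at 50%, half the fleet needs replacing.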
 