These are consumer-grade chips, not server-grade at all.
The entry-level server chips are the Xeon E-2400 series, e.g. the Xeon E-2488, which is the Xeon part that most closely resembles the i9-13900K/14900K, except that it lacks their aggressive boost behavior and voltages. If they had gone with proper server-grade parts, they likely would never have seen these issues.
I must add that several outlets call W680 boards "server grade". They are not; they are workstation boards. Don't get me wrong, they are good boards, but they wouldn't stop consumer CPUs from aging prematurely.
If we assume these i9s "age" 4-5x faster than expected due to excessive voltage, that would easily explain why people using them as "servers" see failures after ~3-6 months. All this tells us is that they've gotten away with consumer-grade hardware in the past because CPUs from the last few generations have been very reliable.
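To make the arithmetic explicit, here is a minimal sketch assuming a simple linear model (my assumption, not anything Intel has stated) where ageing k times faster divides the time to failure by k. Working backwards from the numbers above gives the lifetime the same chips would have had at normal voltage under the same load. `implied_nominal_months` is just a hypothetical helper name.

```python
# Hypothetical linear ageing model (an assumption, not Intel data):
#   time_to_failure = nominal_lifetime / acceleration_factor
# so, working backwards:
#   nominal_lifetime = observed_time_to_failure * acceleration_factor

def implied_nominal_months(observed_months: float, factor: float) -> float:
    """Lifetime (months) the chip would have had if it aged at the normal rate."""
    return observed_months * factor

observed = (3, 6)  # months until failure, figures from the comment above
factors = (4, 5)   # assumed ageing acceleration, figures from the comment above

low = implied_nominal_months(observed[0], factors[0])
high = implied_nominal_months(observed[1], factors[1])
print(f"Implied nominal lifetime under the same load: {low:.0f}-{high:.0f} months")
```

This prints a range of 12-30 months, i.e. under this model the 4-5x figure accounts for 3-6 month failures if the chips would otherwise have lasted roughly 1-2.5 years under that continuous load.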
We don't know exactly what Intel did, and until we have evidence, we shouldn't speculate.
What we should do instead is encourage those with contacts within Intel to press them to address this publicly and more precisely:
- Which product ranges were affected?
- For how long did this problem happen?
- Was this limited to certain production lines, or did it affect all of them?
As mentioned, the Xeon E-2400 series offers similar performance, probably ~95% of what the i9s deliver under sustained, heavily multithreaded loads.
As for the "blast radius" argument several people raised: that shouldn't be a major concern, because servers crashing regularly is normally almost unheard of, and the management overhead of running many small machines could easily justify having fewer servers with more cores. To me, this sounds like some companies cutting corners and being unprofessional. (Not that this in any way reduces the issues for Intel; it just exposes what practices some companies have.)