I've had a 13900KF since December 2022, and I've been experiencing issues with it since February 2023. My computer stays on 24/7. I didn't overclock my CPU; I just used the Asus defaults on the ROG STRIX Z790-F GAMING WIFI. Those defaults were probably higher than what Intel recommends, especially for power limits, but I wouldn't call that overclocking. I had to underclock my RAM from its advertised 6400 speed to 5600, because the 4 sticks were not stable at 6400. They did, however, become 100% stable at 5600.
First, since February 2023, I have been experiencing random crashes when fine-tuning AI models on the GPU, once or twice per day. Since I intended to run those fine-tuning sessions for days, even weeks, that was quite annoying. My first suspicion was that the software I was using for fine-tuning was buggy, since everything else on my computer seemed 100% stable. I had no reason to suspect the CPU: the fine-tuning barely stressed it, and the load was mostly on the GPU.
At one point I even switched to a Linux version of the software instead of Windows, but the issues persisted. The crashes were quite strange. I was fine-tuning a Stable Diffusion generative AI model for images, and about every 10 minutes it generated a few sample image files based on the current state of the model. It was crashing mostly when saving those images to disk: it read them fine from GPU memory, then crashed while writing them to disk. And it crashed in strange ways, with Python errors that should have been impossible according to the Python code. Python variables seemed to have been corrupted and contained random values.
Seeing this, I first suspected I had RAM issues again. I underclocked my RAM even further, but that didn't change anything. I even removed 2 sticks of RAM, leaving just 2, and the errors persisted. I changed the SSD, and the errors persisted. I deactivated E-cores, hyperthreading, and virtualization, and the errors still persisted.
What's more, by April 2023 the problems seemed to be getting worse. Now new browser tabs were crashing immediately with access violation errors. Even scrolling through a directory with a large number of files in a file manager (Total Commander) made it crash. All this happened while the computer was mostly idle. It indicated that this was more than an issue with that Gen-AI fine-tuning software, and that my computer had a serious problem. However, I was very busy with work at the time and didn't have time to investigate further for a couple of months.
In June 2023 I decided to get to the bottom of it. Suspecting the CPU was the cause, I started playing with the settings again and disabled any ASUS defaults that sounded like overclocking: SpeedStep, Velocity Boost, Speed Shift, Turbo Mode, etc. I also enabled the power saving mode in Windows. That fixed the problem, and the system became fully stable. But the CPU was running at only 2.7GHz peak in this configuration, which was clearly unacceptable, and it felt very sluggish.
So, a few days later, I re-enabled the options I had previously disabled. This made it unstable again. Then I reduced the P-core peak frequency from the default of 5800 MHz, first to 5700 MHz, which improved stability, and then to 5600 MHz, which seemed to make it 100% stable. Six months after buying the CPU, I finally had a stable computer with reasonable performance.
In August 2023 it started showing signs of instability again, with browser tabs and games crashing, so I had to reduce the P-core frequency further, to 5500 MHz max on two cores and 5400 MHz on the rest.
In October 2023 I bought two newly released games, and both had bugs and crashes. At first I assumed the games were buggy, and I even refunded them, since one of them crashed right at startup. After buying a third game and experiencing similar issues, I realized my CPU must be causing problems again. I reduced the frequency even further, to 5400 MHz max on two cores and 5300 MHz on the rest. That made it stable again.
November 2023: again stability issues and crashes. Reducing the P-core speeds to 5300 MHz on two cores and 5200 MHz on the rest made it stable again.
May 2024: stability issues again. Playing Cyberpunk 2077, I first got crashes every couple of hours, then every hour, then every 30 minutes, and finally every 10 minutes. I reduced the speed to 5200 MHz on two P-cores and 5100 MHz on the rest, and it became stable again.
July 2024: signs of stability issues again, with various programs crashing if I left them open overnight (the browser, the mail client, Visual Studio Code). I was planning to reduce the speed further, to 5100 MHz max on two P-cores and 5000 MHz on the rest, when last week I saw the Gamers Nexus video about the issues with 13th and 14th gen, and it seemed very similar to what I was experiencing. Until then I had just assumed I was very unlucky in the silicon lottery, but now it seems that Intel really messed up.
I hadn't updated the motherboard BIOS since spring 2024, so this month I finally updated to the latest version. It introduced the Intel settings, and I loaded those defaults, hoping they might fix my issues, with no XMP activated. The default profile was Intel Extreme, and the system was very unstable with it; most of the time Windows wouldn't even boot. I switched to the Intel Performance defaults and had the same issues. Basically, running any P-core at 5500 MHz or more guarantees issues on my CPU. Then I tried underclocking the CPU manually. But you can't do that with the Intel defaults, so I had to switch back to the ASUS defaults and start adjusting the settings again to find the CPU and RAM settings that keep my computer stable with the new BIOS.
So far, after days of testing, the new BIOS doesn't seem to bring any improvement. I'll probably get it completely stable by the end of the week by underclocking the P-cores to around 5000 MHz max. Since December 2022 I've probably spent weeks dealing with these issues, so in an ideal world Intel would not only give affected people a refund but also compensate us for all the lost time and aggravation. In the real world, we'll probably be lucky to get a refund and an acknowledgment from Intel of their responsibility.
You may wonder why I didn't try to get the CPU replaced all this time. Well, I use my computer for work too, so it would have been problematic to wait for a replacement CPU without a working computer. The alternative would have been to buy a cheap CPU to use until the 13900KF was replaced. In the end, I'm glad I didn't try to get a 13900KF replacement, or even a 14900K, in light of the recent news.
In conclusion, I can't wait for the new AMD CPUs to become available. I had previously used a 5800X and switched to Intel partly because the 5800X was freezing every week or two, and I hoped Intel would be more stable. Unfortunately, the Intel CPU has been much worse.