Sunday, December 8th 2024
Intel 0x114 Microcode Could be the Magic Gaming Performance Fix for "Arrow Lake"
The gaming performance of Intel's latest Core Ultra 200-series "Arrow Lake-S" desktop processors missed the mark by quite a bit, ending up slower than the 14th Gen Core "Raptor Lake" processors. Adding pressure on Intel is AMD's recent launch of the Ryzen 7 9800X3D, which extends that company's leadership in gaming performance, coming in up to 12% faster than the top Core Ultra 9 285K at gaming (1080p). Intel then announced that it has identified possible reasons why the gaming performance of "Arrow Lake" ended up below expectations, and that it's working on a microcode-level update to the processor.
A discussion in the ASUS ROG Forums sheds light on what this microcode update could be. Allegedly, it's called the Intel 0x114 Microcode Update, and you can expect it soon in a beta UEFI firmware update from ASUS and other motherboard vendors, which makes it possible that we see a public release of the microcode either by year-end or in Q1 2025. There's still no word on the extent of the gaming performance gain from this microcode, but if we were to speculate, Intel wouldn't bother with such an update if it didn't at least bring "Arrow Lake" to the same gaming performance level as "Raptor Lake," if not higher.
Source:
HotHardware
49 Comments on Intel 0x114 Microcode Could be the Magic Gaming Performance Fix for "Arrow Lake"
Isn't Zen 5 a step back in latency compared to Zen 4?
And even so the core layout and topology sucks big time on Arrow Lake.
Zen 3 through Zen 5 have good core-to-core latency within a single 8-core CCX within a single CCD. Once it leaves a CCD/CCX, yikes, it's bad.
Alder and Raptor Lake had so much better latency leaving the CPU cores on the monolithic die. Arrow Lake broke that, and while its packaged approach has a bit better latency than AMD, it still borked what Intel had going for it big time.
Ryzen has been widely plagued by this cross-CCD issue, especially in games, both in chiplet designs with multiple CCDs and in monolithic designs with different CCXes (which are still a thing in AMD's latest hybrid-core devices). This is a fact; otherwise core parking, different chipset drivers to work around scheduler issues, and whatnot wouldn't exist as the "solution" to keep tasks pinned to a single set of CPUs and avoid that latency hit, so that those tasks consistently see a communication latency of 25~50ns instead of 75~80ns.
Now for Intel's case, as you can see in the link you posted, ALL P-cores have a high latency when talking to one another, with the best case scenario being the 2 last P-cores (7&8) talking to one another (57ns penalty).
So applications that require high performance and get pinned to the 8 P-cores will all be seeing a latency of 57~87ns, whereas on an AMD system you'd be pinning such an application to the 8 cores of one CCD, keeping their latency down to 25~50ns.
Of course, for applications that are not sensitive to cross-core communication this is irrelevant, and those tasks are also often the ones that can scale to multiple cores without issues (so the multi-CCD latency or that ring bus limitation stops being an issue), but other applications (like games, as said before) are really sensitive to that, and it does lead to worse performance.
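To make the comparison above a bit more concrete, here is a minimal sketch of how one could read a core-to-core latency matrix: take the set of cores a task is pinned to and report the best and worst pairwise latency inside that set. The function name `latency_range` and the tiny 4-core matrix are hypothetical placeholders chosen for illustration, not measurements from any real CPU.

```python
from itertools import combinations


def latency_range(matrix, cores):
    """Return (min, max) pairwise latency among the given core ids.

    matrix[a][b] is the latency in ns between core a and core b.
    """
    pairs = [matrix[a][b] for a, b in combinations(sorted(cores), 2)]
    return min(pairs), max(pairs)


# Placeholder data in ns (NOT measured): cores 0-1 and cores 2-3
# form two clusters, with a higher cost when crossing between them.
demo = [
    [0, 30, 80, 80],
    [30, 0, 80, 80],
    [80, 80, 0, 30],
    [80, 80, 30, 0],
]

print(latency_range(demo, {0, 1}))        # task kept inside one cluster
print(latency_range(demo, {0, 1, 2, 3}))  # task spread over both clusters
```

Feeding in a real measured matrix for the pinned core set would give the kind of ranges quoted above for one CCD versus the 8 P-cores.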
Couple that with Windows' shitty scheduler, and it doesn't do any good.
One simple example of that was some applications performing better when pinned solely to the E-cores (which have lower latency when communicating among themselves, and also lower performance, because duh) instead of the P-cores. Afaik this has been fixed with a microcode update; latencies should be pretty similar between Zen 4 and Zen 5 products (as in, still bad across CCDs, and good inter-CCD).
Also, I noticed that I mixed up the intra/inter terms, sorry for that.
A microcode update isn't going to be some magical fix when the problem is the architecture itself, and it would have to be a significant improvement to catch up to Zen 5 X3D.
If it had the exact same gaming performance across the board as Raptor Lake (1% and 0.1% lows included), with reduced power consumption and the same topology of P- and E-cores rather than P-cores in the middle of E-core clusters, I would have wanted a 265K. But sadly it does not.
Raptor Lake would simply be a better product than Arrow Lake if it were reliable and did not have degradation issues. If(???) the microcode update truly fixed them for the long haul, it's easily Raptor Lake over Arrow Lake any day. Exactly true. Can the microcode update fix it and truly make it on par with or better than RPL across the board??
Intel could also drop unused instruction sets on P-cores, but since it's slices of L3 per P-core instead of one whole slice shared among P-cores, there is latency moving out of the slices of L3 cache. Changing the way the P-cores are arranged on the tile and centralizing a single large L3 cache slice would be a way to fix it.
At the same time they could just use less L3/L2 cache, as Arrow Lake seems to be bandwidth starved: just looking at the CUDIMM results at 9200 MHz, it gains a lot more than Zen 5 does from high RAM speed.
With CUDIMM.
Without.
What you linked is the core-to-core latency, which is the latency to synchronize data between cores, such as thread synchronization primitives and atomics (e.g. mutexes, semaphores, etc.) or any data really. If a workload, such as gaming, requires a good amount of those, then for Zen processors it would be ideal to put all the threads into a single cluster (what AMD calls a CCD).
For Zen 2, that was an issue because a CCD was a 4-core cluster, and games often wanted to use more than that, but with 8-core clusters for Zen 3 and later that isn't really an issue, except in the rare case of very heavily parallel CPU games like BeamNG.
This is also why a lot of 'optimization' for gaming on Zen can be turning off a CCD/CCX or pinning threads to one, and also why the -950X SKU often isn't better at gaming than the -700X SKU.
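As a rough illustration of the pinning idea, the sketch below restricts a process to the logical CPUs of one CCD. It rests on several assumptions: the helper names `ccd_cpu_set` and `pin_to_ccd` are made up for this example, the logical CPU ids of a CCD are assumed to be contiguous (which depends on the OS and on whether SMT is enabled, so check your own topology first), and `os.sched_setaffinity` only exists on Linux, so on other systems the function just returns `None`.

```python
import os


def ccd_cpu_set(ccd_index, cores_per_ccd=8, threads_per_core=1):
    """Logical CPU ids for one CCD, ASSUMING contiguous numbering.

    This numbering is an assumption for illustration; real systems may
    enumerate SMT siblings differently.
    """
    per_ccd = cores_per_ccd * threads_per_core
    start = ccd_index * per_ccd
    return set(range(start, start + per_ccd))


def pin_to_ccd(pid, ccd_index, **topology):
    """Try to restrict `pid` (0 = current process) to one CCD.

    Returns the CPU set that was applied, or None if nothing was changed.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity calls are Linux-only in the os module
    wanted = ccd_cpu_set(ccd_index, **topology)
    # Only keep CPUs the process is actually allowed to run on.
    target = wanted & os.sched_getaffinity(pid)
    if not target:
        return None
    os.sched_setaffinity(pid, target)
    return target
```

Calling `pin_to_ccd(0, 0)` would, under those assumptions, keep the current process on the first 8 logical CPUs; tools like Process Lasso or `taskset` do the same job without code.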
Better to buy a 13th/14th-series or an AMD CPU; it depends on what GPU people use.
A 14900K can be much faster than a 9800X3D, because the Intel user may be using an RTX 4090 and the AMD user a 7900 XTX GPU.
Also, not many high-end gamers use 1080p, so the difference is not so big in reality.
If we test at 480p we will see bigger differences, or with the upcoming 5090 at 240p we will see even bigger differences.
I just want to say: if a test is not close to reality, it's worthless.
It is not full support yet, but will probably be with BIOS updates in several months. 14900K much faster? I think you meant 9800X3D, right? Because every unbiased IT channel or website will tell you that the 9800X3D is currently the best gaming CPU on the planet!
And I'm pretty sure that once the 5090 is out, the 9800X3D will widen the gap even more!
I personally have a 4K 240Hz monitor with a 4090 so I definitely try to play at 4K as much as possible. If the framerate is not acceptable then I play with DLSS Quality (1440p rendered) and sometimes with Frame Generation like in Black Myth: Wukong or Cyberpunk 2077 w/ Path Tracing...
But even at 4K, and depending on the game, the 1% & 0.1% lows can be very different depending on which CPU you are using!
That is quite bad. On a 16-core AMD you can just lock a low-concurrency app such as a game into one CCD.
But Arrow Lake has these delays between all P-cores?
Of course Arrow Lake latency beyond 8 cores, across the whole 24-core die, will be better than any Zen 3 to 5, because all cores are on a single tile/die, whereas AMD beyond 8 cores has to cross CCX/CCD and go through IF, and ouch.
But how does Arrow Lake's latency to the IMC tile and other interconnects, like its ring bus, compare to Zen 3 to 5 latency from a CCD/CCX to the Infinity Fabric and such?
Of course, the monolithic 10nm dies of Raptor Lake and Alder Lake kick both their butts on latency to the IMC and ring, but we have no monolithic die that is less than 2 years old, and Raptor Lake has degradation issues, with no one knowing how much the microcode update really fixes them long term.