Sunday, December 8th 2024

Intel 0x114 Microcode Could be the Magic Gaming Performance Fix for "Arrow Lake"

Dec 8th, 2024 20:50 Discuss (52 Comments)

The gaming performance of Intel's latest Core Ultra 200-series "Arrow Lake-S" desktop processors missed the mark by quite a bit, ending up slower than the 14th Gen Core "Raptor Lake" processors. Adding pressure on Intel is AMD's recent launch of the Ryzen 7 9800X3D, which extends AMD's leadership in gaming performance by coming in up to 12% faster than the top Core Ultra 9 285K in gaming (1080p). Intel then announced that it has identified possible reasons why the gaming performance of "Arrow Lake" ended up below expectations, and that it's working on a microcode-level update to the processor.

A discussion in the ASUS ROG Forums sheds light on what this microcode update could be. Allegedly, it's called the Intel 0x114 Microcode Update, and you can expect it soon in a beta UEFI firmware update from ASUS and other motherboard vendors, which makes it possible that we see a public release of the microcode either by year-end or in Q1 2025. There's still no word on the extent of the gaming performance gain from this microcode, but if we were to speculate, Intel wouldn't bother with such an update if it didn't at least bring "Arrow Lake" to the same gaming performance level as "Raptor Lake," if not higher.

Source: HotHardware


52 Comments on Intel 0x114 Microcode Could be the Magic Gaming Performance Fix for "Arrow Lake"

#26

Wolverine2349

bug: Arrow Lake's latency may not be ideal. But it's better than Zen 5's: chipsandcheese.com/i/152587465/core-to-core-latency

How much better is it between the IMC and the CPU die? And how does it compare to Zen 4 latency?

Isn't Zen 5 a step back in latency compared to Zen 4?

And even so, the core layout and topology suck big time on Arrow Lake.

Zen 3 through Zen 5 have good core-to-core latency within a single 8-core CCX within a single CCD. Once it leaves a CCD/CCX, yikes, it's bad.

Alder and Raptor Lake had so much better latency leaving the CPU cores on the monolithic die. Arrow Lake broke that, and while its packed approach has a bit better latency than AMD, it still borked what Intel had going for it big time.

#27

igormp

bug: Arrow Lake's latency may not be ideal. But it's better than Zen 5's: chipsandcheese.com/i/152587465/core-to-core-latency

Copy pasting what I wrote in another thread since it seems fitting:

Ryzen has been widely plagued by this cross-CCD issue, especially in games, from chiplet designs with multiple CCDs to monolithic designs with different CCXes (which are still a thing in AMD's latest hybrid-core devices). This is a fact; otherwise stuff like core parking, different chipset drivers to work around scheduler issues and whatnot wouldn't be a thing. The "solution" is to keep tasks pinned to a single set of CPUs in order to avoid the latency hit, so that those tasks consistently see a communication latency of 25~50ns instead of 75~80ns.

Now for Intel's case, as you can see in the link you posted, ALL P-cores have a high latency when talking to one another, with the best case scenario being the last 2 P-cores (7&8) talking to one another (57ns penalty).
So applications that require high performance and get pinned to the 8 P-cores will all be seeing a latency of 57~87ns, whereas on an AMD system you'd be pinning such an application to the same 8 cores from a CCD, keeping their latency down to 25~50ns.

Of course, for applications that are not sensitive to cross-core communication this is irrelevant, and those tasks are also often the ones that can scale to multiple cores without issues (so the multi-CCD latency or that ring bus limitation stops being an issue), but other applications (like games, as said before) are really sensitive to that and it does lead to worse performance.
Couple that with Windows' shitty scheduler, and it doesn't do any good.

One simple example of that was some applications having better performance being pinned solely to the E-cores (which have lower latency when communicating among themselves, and also lower performance because duh) instead of the P-cores.
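
For anyone who wants to try the kind of pinning described above, here is a minimal sketch in Python, not a tuned recipe. It assumes Linux (`os.sched_setaffinity` does not exist on Windows or macOS) and assumes logical CPUs are numbered CCD by CCD with 8 cores each, which you should check against your own topology first (e.g. with `lscpu -e`). The helper names `ccd_cpus` and `pin_to` are made up for illustration.

```python
import os


def ccd_cpus(ccd_index, cores_per_ccd=8):
    """Return the logical CPU ids of one CCD.

    Assumption: CPUs are numbered CCD by CCD (0-7 on the first CCD,
    8-15 on the second, ...). Real numbering can differ, especially
    with SMT enabled, so verify it on your machine.
    """
    first = ccd_index * cores_per_ccd
    return set(range(first, first + cores_per_ccd))


def pin_to(cpus, pid=0):
    """Restrict a process (0 = the caller) to the given CPUs.

    Returns the set of CPUs actually applied, or an empty set if the
    platform has no affinity API or none of the CPUs are available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()  # not Linux: nothing is changed
    allowed = os.sched_getaffinity(pid) & set(cpus)
    if not allowed:
        return set()  # requested CPUs do not exist or are off limits
    os.sched_setaffinity(pid, allowed)
    return allowed


if __name__ == "__main__":
    applied = pin_to(ccd_cpus(0))
    print("pinned to CPUs:", sorted(applied))
```

Calling `pin_to(ccd_cpus(0))` early in a script keeps it on what is assumed to be the first CCD, and child processes started afterwards inherit the same mask. The same idea would apply to the 8 P-cores on Arrow Lake, except that, going by the numbers above, it would not buy the same latency benefit there.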

Wolverine2349: Isn't Zen 5 a step back in latency compared to Zen 4?

Afaik this has been fixed with a microcode update, latencies should be pretty similar between Zen 4 and Zen 5 products (as in, still bad across CCDs, and good inter-CCD)

#28

jaresk

AMD also promised an uplift from microcode updates for underperforming Zen 5 processors. It raised performance by 1-2%. I wouldn't expect much more from Arrow Lake.

#29

Wolverine2349

igormp: Afaik this has been fixed with a microcode update, latencies should be pretty similar between Zen 4 and Zen 5 products (as in, still bad across CCDs, and good inter-CCD)

Well, good intra-CCX. But also good intra-CCD, because all Zen archs from Zen 3 through Zen 5 (who the heck knows what Zen 6 is gonna be) have one 8-core CCX per CCD, so core-to-core latency within a CCD is good as well.

#30

Kapone33

Intel does not seem to understand how the world has changed.

#31

igormp

Wolverine2349: Well good inter CCX. But good inter CCD because all Zen archs from Zen 3 through Zen 5 (Who the heck knows what Zen 6 is gonna be) have 1 8 core CCX per CCD so good core to core latency within a CCD as such.

Yeah, good catch. Zen 6c is supposed to have 2x CCXes per CCD once again, so we shall see how it behaves if something like that lands on the consumer side.
Also, I noticed that I mixed up the intra/inter terms, sorry for that.

#32

Visible Noise

bug: Arrow Lake's latency may not be ideal. But it's better than Zen 5's: chipsandcheese.com/i/152587465/core-to-core-latency

Uh oh. You went and brought facts into the hate train.

#33

rv8000

Visible Noise: Uh oh. You went and brought facts into the hate train.

Too bad the source material (more like lack thereof) for the article is two ROG forum posts talking about the microcode naming and nothing else. There's zero substance; how is this a news article anyway?

#34

Why_Me

Does anyone here actually think people are purchasing this chipset solely for gaming? This is a decent route to go if you want to mix productivity with gaming while not breaking the bank, not to mention that the power usage on these CPUs is far better than on the previous 13th/14th gen CPUs, along with the fact that this socket runs CUDIMM straight out of the box.

#35

Hecate91

Intel advertised Arrow Lake as being "on par" with 13th/14th gen for gaming, but that turned out not to be the case. As for productivity and gaming, Zen 4 & Zen 5 are still better while using even less power.
A microcode update isn't going to be some magical fix when the problem is the architecture itself, and it would have to be a significant improvement to catch up to Zen 5 X3D.

#36

Wolverine2349

Why_Me: Does anyone here actually think people are purchasing this chipset solely for gaming? This is a decent route to go if you want to mix productivity with gaming while not breaking the bank, not to mention that the power usage on these CPUs is far better than on the previous 13th/14th gen CPUs, along with the fact that this socket runs CUDIMM straight out of the box.

Well, it has to at least be at Raptor Lake gaming level and have a reasonable topology layout rather than the weird screw-up it is.

If it had the exact same gaming performance across the board as Raptor Lake (1% and 0.1% lows included) with reduced power consumption and the same topology of P and E cores, rather than P-cores in the middle of E-core clusters, I would have wanted a 265K. But sadly it does not.

Raptor Lake would simply be a better product than Arrow Lake if it were reliable and did not have degradation issues. If (???) the microcode update truly fixed them for the long haul, it's easily Raptor Lake over Arrow Lake any day.

Hecate91: Intel advertised Arrow Lake as being "on par" with 13th/14th gen for gaming, but that turned out not to be the case. As for productivity and gaming, Zen 4 & Zen 5 are still better while using even less power.
A microcode update isn't going to be some magical fix when the problem is the architecture itself, and it would have to be a significant improvement to catch up to Zen 5 X3D.

Exactly true. Can the microcode update fix it and truly make it on par with or better than RPL across the board?

#37

DemonicRyzen666

igormp: One simple example of that was some applications having better performance being pinned solely to the E-cores (which have lower latency when communicating among themselves, and also lower performance because duh) instead of the P-cores.

That's because the E-core L3 cache is shared among the 4 clusters of E-cores as a victim cache. Data tends to stay in there a lot.

Intel could also drop unused instruction sets on the P-cores, but since there are slices of L3 per P-core instead of one whole slice shared among the P-cores, there is latency moving out of the L3 cache slices. Changing the way the P-cores are arranged on the tile, and centralizing a single large L3 cache slice, would be a way to fix it.

At the same time they could just use less L3/L2 cache, as Arrow Lake seems to be bandwidth starved: just looking at the CUDIMM results at 9200 MT/s, it gains a lot more than Zen 5 does from high RAM speed.

#38

x4it3n

Arrow Lake will definitely get better after patches, but catching up with AMD and beating them? Not even in a dream... The 14900K(S) are still better gaming CPUs, and even Hardware Unboxed posted a new video showing that the 9800X3D was 18% faster on average at 1080p in gaming vs the 14900K (and most CPU-intensive games got around 30%+ more performance on the 9800X3D), so I really doubt Intel could get a 20 to 40% performance uplift via a patch!

#39

Why_Me

Wolverine2349: Well it has to be at least Raptor Lake gaming level and have a reasonable topology layout rather than the weird screw up it is.

This is where Arrow Lake shines ... CUDIMM and power efficiency.

With CUDIMM.

Without.

#40

persondb

bugArrow Lake's latency may not be ideal. But it's better than Zen 5's: chipsandcheese.com/i/152587465/core-to-core-latency

You are confusing things; latency is a generic word. When people talk about Arrow Lake latency being bad, they are talking about system memory latency, i.e. the latency a core sees when accessing a completely random part of memory.

What you linked is the core-to-core latency, which is the latency to synchronize data between cores, such as thread synchronization primitives (e.g. mutexes, semaphores, etc.) or any data really. If a workload, such as gaming, requires a good amount of those, then for Zen processors it would be ideal to put all the threads into a single cluster (what AMD calls a CCX, which since Zen 3 is the whole CCD).

For Zen 2, that was an issue because a CCX was a 4-core cluster, and games often wanted to use more than that, but with 8-core clusters for Zen 3 and later that isn't really an issue except in the rare case of very heavily parallel CPU games like BeamNG.

This is also why a lot of 'optimization' for gaming on Zen can be turning off a CCD/CCX or pinning the game to one, and also why the -950X SKU often isn't better at gaming than the -700X SKU.

#41

x4it3n

Why_Me: This is where Arrow Lake shines ... CUDIMM and power efficiency.

With CUDIMM.

Without.

CUDIMM will be enabled on some AMD Motherboards with X870(E) chipsets so I'm curious to see how ZEN 5 and ZEN 5 3D will take advantage of it too!

#42

rv8000

x4it3n: CUDIMM will be enabled on some AMD Motherboards with X870(E) chipsets so I'm curious to see how ZEN 5 and ZEN 5 3D will take advantage of it too!

Source?

#43

Dawora

The Ultra series is a very overpriced CPU.
Better to buy 13th/14th gen or an AMD CPU.

x4it3n: Arrow Lake will definitely get better after patches, but catching up with AMD and beating them? Not even in a dream... The 14900K(S) are still better gaming CPUs, and even Hardware Unboxed posted a new video showing that the 9800X3D was 18% faster on average at 1080p in gaming vs the 14900K (and most CPU-intensive games got around 30%+ more performance on the 9800X3D), so I really doubt Intel could get a 20 to 40% performance uplift via a patch!

It depends on what GPU people use.
The 14900K can be much faster vs the 9800X3D because the Intel user may be using an RTX 4090 and the AMD user a 7900 XTX GPU.

Also, not many high-end gamers use 1080p, so the difference is not so big in reality.

If we test at 480p we will see bigger differences, or with the upcoming 5090 at 240p we will see even bigger differences.
I just want to say, if a test is not close to reality, it's worthless.
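
The resolution argument above can be put into a toy model. This is only a rough sketch: it assumes delivered frame rate is simply the lower of what the CPU and the GPU can each sustain, and every number in it is hypothetical, picked only so that the CPU-bound gap is 18%. The function names are made up for illustration.

```python
def delivered_fps(cpu_fps, gpu_fps):
    """Toy model: the slower of CPU and GPU sets the frame rate."""
    return min(cpu_fps, gpu_fps)


def cpu_lead_pct(cpu_a_fps, cpu_b_fps, gpu_fps):
    """Visible lead (in %) of CPU A over CPU B behind the same GPU."""
    a = delivered_fps(cpu_a_fps, gpu_fps)
    b = delivered_fps(cpu_b_fps, gpu_fps)
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical CPU-limited frame rates: A is 18% ahead of B.
    cpu_a, cpu_b = 236.0, 200.0
    # Hypothetical GPU-limited frame rates at different settings.
    gpu_caps = [("very low resolution", 400.0),
                ("mid resolution", 220.0),
                ("high resolution", 120.0)]
    for label, gpu_fps in gpu_caps:
        lead = cpu_lead_pct(cpu_a, cpu_b, gpu_fps)
        print(f"{label}: CPU A leads by {lead:.1f}%")
```

In this model the full CPU gap only shows when the GPU limit sits above both CPUs, shrinks once the GPU limit falls between them, and vanishes when the GPU is slower than both. Real games are messier (1% lows, for instance, can still differ at high resolution), so take it as an illustration of the argument and not as a benchmark.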

#44

x4it3n

rv8000: Source?

videocardz.com/newz/msi-confirms-cu-dimm-ddr5-memory-support-for-ryzen-9000-and-8000-series

It is not full support yet, but will probably be with BIOS updates in several months.

Dawora: 14900K can be much faster VS 9800X3D because Intel user maybe using Rtx4090 and Amd user 7900XtX GPU

14900K much faster? I think you meant the 9800X3D, right? Because every unbiased IT channel or website will tell you that the 9800X3D is currently the best gaming CPU on the planet!
And I'm pretty sure that once the 5090 is out, the 9800X3D will widen the gap even more!

I personally have a 4K 240Hz monitor with a 4090, so I definitely try to play at 4K as much as possible. If the framerate is not acceptable then I play with DLSS Quality (1440p rendered) and sometimes with Frame Generation, like in Black Myth: Wukong or Cyberpunk 2077 w/ Path Tracing...

But even at 4K, and depending on the game, the 1% & 0.1% lows can be very different depending on which CPU you are using!

#45

unwind-protect

bug: Arrow Lake's latency may not be ideal. But it's better than Zen 5's: chipsandcheese.com/i/152587465/core-to-core-latency

"Lion Cove P-Cores however don’t do so well. Worst case latency between P-Cores can approach cross-CCD latency on AMD’s chiplet designs."

That is quite bad. On a 16-core AMD CPU you can just lock a low-concurrency app such as a game to one CCD.

But Arrow Lake has these delays between all P-cores?

#46

Wolverine2349

persondb: You are confusing things; latency is a generic word. When people talk about Arrow Lake latency being bad, they are talking about system memory latency, i.e. the latency a core sees when accessing a completely random part of memory.

Yes, I wonder the same thing.

Of course, Arrow Lake latency beyond 8 cores, across the whole 24-core die, will be better than on any Zen 3 to Zen 5 part, because all cores are on a single tile/die, whereas AMD beyond 8 cores has to cross CCX/CCD and go through IF, and ouch.

But how is Arrow Lake latency to the IMC tile and other interconnects like its ring bus, compared to AMD Zen 3 to Zen 5 latency from a CCD/CCX to the Infinity Fabric and such?

Of course, the Raptor Lake and Alder Lake 10nm monolithic dies kick both their butts when it comes to the IMC and ring, but we have no monolithic die that is not 2 years old, and oh, it has degradation issues, and it's unknown how much the microcode update really fixes that long term???

#47

rv8000

x4it3n: videocardz.com/newz/msi-confirms-cu-dimm-ddr5-memory-support-for-ryzen-9000-and-8000-series

It is not full support yet, but will probably be with BIOS updates in several months.

Yeah, so no new information. The 9000 series can boot CUDIMMs in bypass mode and nothing else. There's mostly no chance we will see an actual implementation until Zen 6.

#48

Han44

In the past, people were burned at the stake for Magic (as it is written in the title), and today I use spells (microcode) on CPU, magic is a power in advanced technology :)

#49

x4it3n

rv8000: Yea, so no new information. 9000 series can boot cu-dimms in bypass and nothing else. Mostly no chance we will be seeing actual implementation until zen 6.

CUDIMM is still new so we're years away anyway! But they might make the X870 motherboards compatible and work as intended when paired with Zen 6 CPUs (which I will probably get, since it's still on AM5 and they might have 12c/24t per CCD + dual V-Cache for dual CCDs).

#50

notoperable

The last 12 months of Intel news left me with a bad aftertaste. I went back to Intel with Raptor Lake and that already was (is) a bumpy ride, since it came with its own can of worms, and it seems like Intel is doing its best at digging its own grave. WTF? Who in his right mind would buy the new CPUs which (cause Intel needs to keep its f****g tradition) sit in a new SOCKET, because, why not?

Ergo: INTEL GTFO
