Intel 0x114 Microcode Could be the Magic Gaming Performance Fix for "Arrow Lake"

bug · Monday at 5:03 PM

Wolverine2349 said:
And that design sucks. Why bork what wasn't broken and worked so much better on Alder Lake and Raptor Lake, where the P-cores were together and the E-cores sat on the side, with good (though not excellent) latency even when going to an E-core? That way the P-cores were prioritized and the E-cores were great secondary auxiliary threads that kicked in when needed, and it worked very well. And for stuff that scaled to infinite threads it did not matter.

Yes, I know the E-cores on Arrow Lake, being Skymont, are a lot stronger, but they still are not as strong as Intel claimed (LMAO Raptor Cove IPC lol, well maybe in cherry-picked IPC tests but not as an all-rounder), so the E-cores are still not as good, and cramming clusters into the middle of the P-cores was a bad design choice.

And I do not think this update will help gaming that much. The tile-based latency is real, and it's not gonna magically get better or even on par with Raptor Lake, which is a monolithic die with a faster ring clock and P-cores that are better in FP IPC and not far off in integer IPC, with as fast or faster clocks.

Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency

Wolverine2349 · Monday at 5:26 PM

bug said:
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency

How much better between the IMC and the CPU die? And how does it compare to Zen 4 latency?

Isn't Zen 5 a step back in latency compared to Zen 4?

And even so, the core layout and topology suck big time on Arrow Lake.

Zen 3 through Zen 5 have good core-to-core latency within a single 8-core CCX within a single CCD. Once it leaves a CCD/CCX, yikes, it's bad.

Alder and Raptor Lake had so much better latency leaving the CPU cores on the monolithic die. Arrow Lake broke that, and while its packed approach has a bit better latency than AMD, it still borked what Intel had going for it big time.

igormp · Monday at 5:40 PM

bug said:
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency

Copy pasting what I wrote in another thread since it seems fitting:

Ryzen has been widely plagued by this cross-CCD issue, especially in games, both in chiplet designs with multiple CCDs and in monolithic designs with different CCXes (which are still a thing in AMD's latest hybrid-core devices). This is a fact; otherwise stuff like core parking, different chipset drivers to work around scheduler issues and whatnot wouldn't be a thing. The "solution" is to keep tasks pinned to a single set of CPUs in order to avoid the latency hit, so that those tasks consistently see a communication latency of 25~50ns instead of 75~80ns.

Now for Intel's case, as you can see in the link you posted, ALL P-cores have a high latency when talking to one another, with the best-case scenario being the last 2 P-cores (7 & 8) talking to one another (57ns penalty).
So applications that require high performance and get pinned to the 8 P-cores will all be seeing a latency of 57~87ns, whereas on an AMD system you'd pin such an application to the 8 cores of a single CCD, keeping its latency down to 25~50ns.

Of course, for applications that are not sensitive to cross-core communication this is irrelevant, and those tasks are often the ones that can scale to multiple cores without issues (so the multi-CCD latency or that ring bus limitation stops being an issue), but other applications (like games, as said before) are really sensitive to it, and it does lead to worse performance.
Couple that with Windows' shitty scheduler, and it doesn't do any good.

One simple example of that was some applications performing better when pinned solely to the E-cores (which have lower latency when communicating among themselves, and also lower performance, because duh) instead of the P-cores.

Wolverine2349 said:
Isn't Zen 5 a step back in latency compared to Zen 4?

Afaik this has been fixed with a microcode update, latencies should be pretty similar between Zen 4 and Zen 5 products (as in, still bad across CCDs, and good inter-CCD)

jaresk · Monday at 6:01 PM

AMD also promised an uplift from microcode updates for underperforming Zen 5 processors. It raised performance by 1-2%. I wouldn't expect much more from Arrow Lake.

Wolverine2349 · Monday at 6:27 PM

igormp said:
Copy pasting what I wrote in another thread since it seems fitting:

Ryzen has been widely plagued by this cross-CCD issue, especially in games, both in chiplet designs with multiple CCDs and in monolithic designs with different CCXes (which are still a thing in AMD's latest hybrid-core devices). This is a fact; otherwise stuff like core parking, different chipset drivers to work around scheduler issues and whatnot wouldn't be a thing. The "solution" is to keep tasks pinned to a single set of CPUs in order to avoid the latency hit, so that those tasks consistently see a communication latency of 25~50ns instead of 75~80ns.

Now for Intel's case, as you can see in the link you posted, ALL P-cores have a high latency when talking to one another, with the best-case scenario being the last 2 P-cores (7 & 8) talking to one another (57ns penalty).
So applications that require high performance and get pinned to the 8 P-cores will all be seeing a latency of 57~87ns, whereas on an AMD system you'd pin such an application to the 8 cores of a single CCD, keeping its latency down to 25~50ns.

Of course, for applications that are not sensitive to cross-core communication this is irrelevant, and those tasks are often the ones that can scale to multiple cores without issues (so the multi-CCD latency or that ring bus limitation stops being an issue), but other applications (like games, as said before) are really sensitive to it, and it does lead to worse performance.
Couple that with Windows' shitty scheduler, and it doesn't do any good.

One simple example of that was some applications performing better when pinned solely to the E-cores (which have lower latency when communicating among themselves, and also lower performance, because duh) instead of the P-cores.

Afaik this has been fixed with a microcode update, latencies should be pretty similar between Zen 4 and Zen 5 products (as in, still bad across CCDs, and good inter-CCD)

Well, good intra-CCX. But also good intra-CCD, because all Zen archs from Zen 3 through Zen 5 (who the heck knows what Zen 6 is gonna be) have one 8-core CCX per CCD, so core-to-core latency within a CCD is good as well.

kapone32 · Monday at 6:35 PM

Intel does not seem to understand how the World has changed.

igormp · Monday at 7:02 PM

Wolverine2349 said:
Well good inter CCX. But good inter CCD because all Zen archs from Zen 3 through Zen 5 (who the heck knows what Zen 6 is gonna be) have one 8-core CCX per CCD, so good core-to-core latency within a CCD as such.

Yeah, good catch. Zen 6c is supposed to have 2x CCXes per CCD once again, so we shall see how it behaves if something like that lands on the consumer side.
Also, I noticed that I mixed up the intra/inter terms, sorry for that.

Visible Noise · Monday at 9:09 PM

bug said:
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency

Uh oh. You went and brought facts into the hate train.

rv8000 · Monday at 9:27 PM

Visible Noise said:
Uh oh. You went and brought facts into the hate train.

Too bad the source material (more like lack thereof) for the article is two ROG forum posts talking about the microcode naming and nothing else. There's zero substance; how is this a news article anyway?

Why_Me · Monday at 9:39 PM

Does anyone here actually think people are purchasing this chipset solely for gaming? This is a decent route to go if you want to mix productivity with gaming while not breaking the bank, not to mention that the power usage on these CPUs is far better than on the previous 13th/14th gen CPUs, along with the fact that this socket runs CUDIMM straight out of the box.

Hecate91 · Monday at 9:45 PM

Intel advertised Arrow Lake as being "on par" with 13th/14th gen for gaming, but that turned out not to be the case; as for productivity and gaming, Zen 4 and Zen 5 are still better while using even less power.
A microcode update isn't going to be some magical fix when the problem is the architecture itself, and it would have to be a significant improvement to catch up to Zen 5 X3D.

Wolverine2349 · Monday at 9:58 PM

Why_Me said:
Does anyone here actually think people are purchasing this chipset solely for gaming? This is a decent route to go if you want to mix productivity with gaming while not breaking the bank, not to mention that the power usage on these CPUs is far better than on the previous 13th/14th gen CPUs, along with the fact that this socket runs CUDIMM straight out of the box.

Well, it has to be at least at Raptor Lake's gaming level and have a reasonable topology layout rather than the weird screw-up it is.

If it had the exact same gaming performance across the board as Raptor Lake (1% and 0.1% lows included), with reduced power consumption and the same topology of P and E cores rather than P-cores in the middle of E-core clusters, I would have wanted a 265K. But sadly, it does not.

Raptor Lake would simply be a better product than Arrow Lake if it were reliable and did not have degradation issues. If (???) the microcode update truly fixed them for the long haul, it's easily Raptor Lake over Arrow Lake any day.

Hecate91 said:
Intel advertised Arrow Lake as being "on par" with 13th/14th gen for gaming, but that turned out not to be the case; as for productivity and gaming, Zen 4 and Zen 5 are still better while using even less power.
A microcode update isn't going to be some magical fix when the problem is the architecture itself, and it would have to be a significant improvement to catch up to Zen 5 X3D.

Exactly true. Can the microcode update fix it and truly across the board make it on par or better than RPL??

DemonicRyzen666 · Monday at 10:17 PM

igormp said:
Copy pasting what I wrote in another thread since it seems fitting:

Ryzen has been widely plagued by this cross-CCD issue, especially in games, both in chiplet designs with multiple CCDs and in monolithic designs with different CCXes (which are still a thing in AMD's latest hybrid-core devices). This is a fact; otherwise stuff like core parking, different chipset drivers to work around scheduler issues and whatnot wouldn't be a thing. The "solution" is to keep tasks pinned to a single set of CPUs in order to avoid the latency hit, so that those tasks consistently see a communication latency of 25~50ns instead of 75~80ns.

Now for Intel's case, as you can see in the link you posted, ALL P-cores have a high latency when talking to one another, with the best-case scenario being the last 2 P-cores (7 & 8) talking to one another (57ns penalty).
So applications that require high performance and get pinned to the 8 P-cores will all be seeing a latency of 57~87ns, whereas on an AMD system you'd pin such an application to the 8 cores of a single CCD, keeping its latency down to 25~50ns.

Of course, for applications that are not sensitive to cross-core communication this is irrelevant, and those tasks are often the ones that can scale to multiple cores without issues (so the multi-CCD latency or that ring bus limitation stops being an issue), but other applications (like games, as said before) are really sensitive to it, and it does lead to worse performance.
Couple that with Windows' shitty scheduler, and it doesn't do any good.

One simple example of that was some applications performing better when pinned solely to the E-cores (which have lower latency when communicating among themselves, and also lower performance, because duh) instead of the P-cores.

Afaik this has been fixed with a microcode update, latencies should be pretty similar between Zen 4 and Zen 5 products (as in, still bad across CCDs, and good inter-CCD)

That's because the E-core L3 cache is shared among the 4 clusters of E-cores as a victim cache. Data tends to stay in there a lot.

Intel could also drop unused instruction sets on the P-cores, but since it's slices of L3 per P-core instead of one whole slice shared among the P-cores, there is latency moving out of the slices of L3 cache. Changing the way the P-cores are sitting/arranged on the tile, and centralizing a large single L3 cache slice, would be a way to fix it.

At the same time, they could just use less L3/L2 cache, as Arrow Lake seems to be bandwidth-starved: just looking at the CU-DIMM results at 9200 MHz, it gains a lot more than Zen 5 does with high RAM speed.

x4it3n · Monday at 10:21 PM

Arrow Lake will definitely get better after patches, but catching up with AMD and beating them? Not even in a dream... The 14900K(S) are still better gaming CPUs, and even Hardware Unboxed posted a new video showing that the 9800X3D was 18% faster on average at 1080p in gaming vs the 14900K (and the most CPU-intensive games got around 30%+ more performance on the 9800X3D), so I really doubt Intel could get a 20 to 40% performance uplift via a patch!

Why_Me · Monday at 11:12 PM

Wolverine2349 said:
Well, it has to be at least at Raptor Lake's gaming level and have a reasonable topology layout rather than the weird screw-up it is.

If it had the exact same gaming performance across the board as Raptor Lake (1% and 0.1% lows included), with reduced power consumption and the same topology of P and E cores rather than P-cores in the middle of E-core clusters, I would have wanted a 265K. But sadly, it does not.

Raptor Lake would simply be a better product than Arrow Lake if it were reliable and did not have degradation issues. If (???) the microcode update truly fixed them for the long haul, it's easily Raptor Lake over Arrow Lake any day.

Exactly true. Can the microcode update fix it and truly across the board make it on par or better than RPL??

This is where Arrow Lake shines ... CUDIMM and power efficiency.

With CUDIMM.

Without.

persondb · Monday at 11:53 PM

bug said:
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency

You are confusing things; "latency" is a generic word. When people talk about Arrow Lake latency being bad, they are talking about system memory latency, i.e. the latency a core sees when accessing a completely random part of memory.

What you linked is core-to-core latency, which is the latency to synchronize data between cores, such as thread synchronization primitives (e.g. atomics, mutexes, semaphores) or any data really. If a workload, such as gaming, requires a good amount of that, then on Zen processors it would be ideal to put all the threads into a single cluster (what AMD calls a CCD).

For Zen 2, that was an issue because a cluster (CCX) was only 4 cores, with two of them per CCD, so games often wanted to use more than that. With 8-core clusters on Zen 3 and later that isn't really an issue, except in the rare case of very heavily parallel CPU-bound games like BeamNG.

This is also why a lot of 'optimization' for gaming on Zen can be turning off a CCD/CCX or pinning threads to one, and also why the -950X SKU often isn't better at gaming than the -700X SKU.
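
For anyone who wants to try the pinning idea on Linux, here is a minimal sketch in Python. It assumes logical CPU ids are numbered cluster by cluster (0-7 is the first 8-core cluster, and so on), which is not guaranteed and changes with SMT enabled, so check the real topology first. The helper names `pick_cluster_cpus` and `pin_current_process` are made up for illustration; only `os.sched_getaffinity` and `os.sched_setaffinity` are real calls, and they exist on Linux only.

```python
import os


def pick_cluster_cpus(available, cluster_size=8, cluster_index=0):
    """Return the logical CPUs from `available` that fall in one cluster.

    Assumption: CPU ids are laid out cluster by cluster, so cluster 0 is
    ids 0..cluster_size-1, cluster 1 is the next block, and so on.
    """
    start = cluster_index * cluster_size
    wanted = set(range(start, start + cluster_size))
    return sorted(wanted & set(available))


def pin_current_process(cpus):
    """Pin the calling process to `cpus`.

    Returns False when the list is empty or the platform has no
    sched_setaffinity (anything that is not Linux).
    """
    if not cpus or not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cpus)  # pid 0 means "this process"
    return True


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        available = os.sched_getaffinity(0)
    else:
        available = set(range(os.cpu_count() or 1))
    cpus = pick_cluster_cpus(available)
    print("pinning to", cpus, "->", pin_current_process(cpus))
```

Child processes inherit the affinity mask, so a launcher script that pins itself and then starts the game would keep the game's threads inside the chosen cluster. Whether that actually helps depends on the game and the scheduler.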

x4it3n · Tuesday at 12:02 AM

Why_Me said:
This is where Arrow Lake shines ... CUDIMM and power efficiency.

With CUDIMM.

Without.

CUDIMM will be enabled on some AMD Motherboards with X870(E) chipsets so I'm curious to see how ZEN 5 and ZEN 5 3D will take advantage of it too!

rv8000 · Tuesday at 12:34 AM

x4it3n said:
CUDIMM will be enabled on some AMD Motherboards with X870(E) chipsets so I'm curious to see how ZEN 5 and ZEN 5 3D will take advantage of it too!

Source?

Dawora · Tuesday at 12:53 AM

The Ultra series is a very overpriced CPU.
Better to buy a 13th/14th gen or an AMD CPU.

x4it3n said:
Arrow Lake will definitely get better after patches, but catching up with AMD and beating them? Not even in a dream... The 14900K(S) are still better gaming CPUs, and even Hardware Unboxed posted a new video showing that the 9800X3D was 18% faster on average at 1080p in gaming vs the 14900K (and the most CPU-intensive games got around 30%+ more performance on the 9800X3D), so I really doubt Intel could get a 20 to 40% performance uplift via a patch!

It depends on what GPU people use.
The 14900K can be much faster than the 9800X3D, because the Intel user may be using an RTX 4090 and the AMD user a 7900 XTX.

Also, not many high-end gamers use 1080p, so the difference is not so big in reality.

If we test at 480p we will see bigger differences, and with the upcoming 5090 at 240p we will see even bigger ones.
I just want to say: if a test is not close to reality, it's worthless.

x4it3n · Tuesday at 1:28 AM

rv8000 said:
Source?

MSI confirms CU-DIMM DDR5 memory support for Ryzen 9000 and 8000 series - VideoCardz.com

CUDIMM on Ryzen 8000/9000 confirmed. Fast memory for Ryzen. CUDIMM is a new memory type featuring a built-in CKD clock driver decoupled from the CPU. This enhances signal integrity and allows memory manufacturers to achieve higher frequencies independently of the memory controller. Users can...

videocardz.com

It is not full support yet, but will probably be with BIOS updates in several months.

Dawora said:
The 14900K can be much faster than the 9800X3D, because the Intel user may be using an RTX 4090 and the AMD user a 7900 XTX.

14900K much faster? I think you meant the 9800X3D, right? Because every unbiased IT channel or website will tell you that the 9800X3D is currently the best gaming CPU on the planet!
And I'm pretty sure that once the 5090 is out, the 9800X3D will widen the gap even more!

I personally have a 4K 240Hz monitor with a 4090 so I definitely try to play at 4K as much as possible. If the framerate is not acceptable then I play with DLSS Quality (1440p rendered) and sometimes with Frame Generation like in Black Myth: Wukong or Cyberpunk 2077 w/ Path Tracing...

But even at 4K, and depending on the game, the 1% and 0.1% lows can be very different depending on which CPU you are using!

unwind-protect · Tuesday at 1:54 AM

bug said:
Arrow Lake's latency may not be ideal. But it's better than Zen 5's: https://chipsandcheese.com/i/152587465/core-to-core-latency

"Lion Cove P-Cores however don’t do so well. Worst case latency between P-Cores can approach cross-CCD latency on AMD’s chiplet designs."

That is quite bad. On a 16-core AMD CPU you can just lock a low-concurrency app such as a game into one CCD.

But Arrow Lake has these delays between all P-cores?

Wolverine2349 · Tuesday at 2:21 AM

persondb said:
You are confusing things; "latency" is a generic word. When people talk about Arrow Lake latency being bad, they are talking about system memory latency, i.e. the latency a core sees when accessing a completely random part of memory.

What you linked is core-to-core latency, which is the latency to synchronize data between cores, such as thread synchronization primitives (e.g. atomics, mutexes, semaphores) or any data really. If a workload, such as gaming, requires a good amount of that, then on Zen processors it would be ideal to put all the threads into a single cluster (what AMD calls a CCD).

For Zen 2, that was an issue because a cluster (CCX) was only 4 cores, with two of them per CCD, so games often wanted to use more than that. With 8-core clusters on Zen 3 and later that isn't really an issue, except in the rare case of very heavily parallel CPU-bound games like BeamNG.

This is also why a lot of 'optimization' for gaming on Zen can be turning off a CCD/CCX or pinning threads to one, and also why the -950X SKU often isn't better at gaming than the -700X SKU.

Yes, I wonder the same thing.

Of course Arrow Lake's latency beyond 8 cores, across the whole 24-core die, will be better than any Zen 3 to Zen 5, because all cores are on a single tile/die, whereas AMD beyond 8 cores has to cross CCX/CCD and go through IF, and ouch.

But how does Arrow Lake's latency to the IMC tile and other interconnects, like its ring bus, compare to AMD Zen 3 to Zen 5 latency from a CCD/CCX to the Infinity Fabric and such?

Of course the Raptor Lake and Alder Lake 10nm monolithic dies kick both their butts on latency to the IMC and ring, but we have no monolithic die that is not 2 years old, and, oh, it has degradation issues, and it's unknown how much the microcode update really fixes them long term???

rv8000 · Tuesday at 5:44 AM

x4it3n said:
MSI confirms CU-DIMM DDR5 memory support for Ryzen 9000 and 8000 series - VideoCardz.com

CUDIMM on Ryzen 8000/9000 confirmed. Fast memory for Ryzen. CUDIMM is a new memory type featuring a built-in CKD clock driver decoupled from the CPU. This enhances signal integrity and allows memory manufacturers to achieve higher frequencies independently of the memory controller. Users can...

videocardz.com

It is not full support yet, but will probably be with BIOS updates in several months.

Yeah, so no new information. The 9000 series can boot CUDIMMs in bypass mode and nothing else. Most likely we won't see an actual implementation until Zen 6.

Han44 · Tuesday at 2:06 PM

In the past, people were burned at the stake for magic (as the title puts it), and today I use spells (microcode) on a CPU. Magic is a power in advanced technology.

x4it3n · 2024-12-11T01:18:00+0000

rv8000 said:
Yeah, so no new information. The 9000 series can boot CUDIMMs in bypass mode and nothing else. Most likely we won't see an actual implementation until Zen 6.

CUDIMM is still new, so we're years away anyway! But they might make the X870 motherboards compatible and working as intended when paired with Zen 6 CPUs (which I will probably get, since it's still on AM5 and they might have 12c/24t per CCD plus dual V-Cache for dual CCDs).
