Contrary to expectations (though some of us suspected it), ARC's inability to pull off performance matching its x70-class die size, along with its compatibility issues in some games, comes down to Alchemist's hardware, not its software.
Battlemage revelations by Tom "TAP" Petersen at Computex:
The Xe-cores in Xe2 have been improved for higher performance, better utilisation, and greater compatibility with games. That last point is particularly important, going off Intel's previous form.
These changes take various forms, though I'm told it's not only improvements to the software stack, but changes to the silicon itself to make it gel more easily with modern games.
There's hardware support for commonly used commands, such as execute indirect, which causes headaches and slows performance on Alchemist. Another command, Fast Clear, is now supported in the Xe2 hardware, rather than having to be emulated in software as it was on Alchemist.
Another is execute indirect support baked into the hardware, via the Command Front End. Execute indirect is a command used commonly in game engines, including Unreal Engine 5. It was previously emulated in software on Alchemist, which led to slowdown.
Execute Indirect not being in hardware is why games like Hellblade 2, Remnant 2 and Nightingale all underperform on ARC: UE5 uses the command, and Alchemist emulates it in software. Battlemage makes Execute Indirect 12.5x faster!
The Xe2 architecture's Render Slice includes improvements to deliver 3x mesh shading performance, 3x vertex fetch throughput, and 2x throughput for sampling without filtering. Bandwidth requirements should be lower, and commands are more in line with what games often use.
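For anyone unsure what Execute Indirect actually does: in D3D12, the draw parameters are read from a buffer in GPU memory rather than being passed in by the CPU for each draw call. Here's a toy Python model of that idea. It is only a sketch and says nothing about how Intel's hardware or driver really handles it. The function names are made up for illustration; only the four-UINT record layout follows D3D12_DRAW_ARGUMENTS.

```python
import struct

# One indirect draw record: four unsigned 32-bit integers, mirroring
# D3D12_DRAW_ARGUMENTS (VertexCountPerInstance, InstanceCount,
# StartVertexLocation, StartInstanceLocation).
DRAW_ARGS = struct.Struct("<4I")


def write_argument_buffer(draws):
    """Pack draw records into bytes (hypothetical stand-in for a GPU buffer)."""
    return b"".join(DRAW_ARGS.pack(*d) for d in draws)


def execute_indirect(arg_buffer, max_commands):
    """Toy command front end: decode up to max_commands records and 'issue' them."""
    issued = []
    count = min(max_commands, len(arg_buffer) // DRAW_ARGS.size)
    for i in range(count):
        record = DRAW_ARGS.unpack_from(arg_buffer, i * DRAW_ARGS.size)
        vertex_count, instance_count = record[0], record[1]
        if vertex_count == 0 or instance_count == 0:
            continue  # empty draw, skipped without any CPU involvement
        issued.append(record)
    return issued


# Three records, the middle one zeroed out as if nothing needed drawing.
buf = write_argument_buffer([(36, 1, 0, 0), (0, 0, 0, 0), (6, 128, 36, 0)])
for draw in execute_indirect(buf, max_commands=3):
    print(draw)
```

The point is that something has to walk that buffer and turn each record into real work. If the command front end can't do it natively, that decoding has to be handled some other way, which is the emulation cost being discussed here.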
It's also pointed out that the compatibility issues are partly due to using SIMD8 as the base width, and that moving to SIMD16 will improve compatibility with games without needing hand-tuning by the driver team.
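To put rough numbers on what "base width" means, here is a small Python sketch. It only does the arithmetic of how many SIMD-wide issues it takes to cover a group of shader invocations, and the 64-thread group size is an assumption picked for illustration, not something from the presentation. It does not model why a given game behaves better at one width.

```python
import math


def issues_needed(threads, simd_width):
    """Number of SIMD-wide issues needed to cover `threads` invocations."""
    return math.ceil(threads / simd_width)


def lane_utilisation(threads, simd_width):
    """Fraction of SIMD lanes doing useful work across those issues."""
    return threads / (issues_needed(threads, simd_width) * simd_width)


GROUP_SIZE = 64  # hypothetical thread group size, for illustration only

for width in (8, 16):
    print(f"SIMD{width}: {issues_needed(GROUP_SIZE, width)} issues, "
          f"utilisation {lane_utilisation(GROUP_SIZE, width):.2f}")
```

With a 64-thread group, SIMD8 splits the work into 8 pieces and SIMD16 into 4. Shaders are generally written with the 32- and 64-wide waves of Nvidia and AMD in mind, so a wider base width sits closer to what games already expect.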
Hardware has been bottlenecking ARC
NOT
Driver has been bottlenecking ARC
The driver team has had to hand-tune each game to work around weaknesses in the architecture, such as games that rely on Execute Indirect. ARC will no longer need a Day 1 driver for each and every game to fix compatibility issues in games that want SIMD16. This means not only that driver development will be FASTER and more effective, but that in many cases no intervention will be needed at all, which will be a boon especially for older titles.
Things such as Fast Clear are also funny, as it has been in AMD/Nvidia architectures for more than a decade now. Welcome to the modern world, ARC. Chips and Cheese talked about how Alchemist requires a high workload to take full advantage of the shaders and even the memory subsystem. 512GB/s on Alchemist is not really 512GB/s because of architectural design choices.
Bonus:
A more capable RT unit means less performance loss when RT is on: Ray Tracing Unit width increases from 2 traversal pipelines to 3.