Tuesday, November 23rd 2021
PlayStation 3 Emulator Delivers Modest Speed-Ups with Disabled E-Cores on Intel Alder Lake Processors
According to testing performed by the team behind RPCS3, a free and open-source emulator for Sony's PlayStation 3, Intel's Alder Lake processors enjoy a notable performance boost when the E-cores are disabled. Alder Lake processors feature a hybrid configuration with high-performance P-cores and low-power E-cores. The P-cores are based on the Golden Cove architecture and can execute AVX-512 instructions with ease. However, the AVX-512 boost only applies when the E-cores are disabled, as software detects features at the package level. Officially, Alder Lake processors don't support AVX-512, because the little E-cores cannot execute AVX-512 instructions.
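The package-level feature detection the article describes typically works by reading the CPUID leaf-7 feature bits; per Intel's documentation, AVX-512 Foundation is bit 16 of EBX. A minimal Python sketch of that bit test follows, using hypothetical register values rather than real hardware reads:

```python
# Software detects AVX-512 by testing bits in CPUID.(EAX=7,ECX=0):EBX.
# AVX2 is EBX bit 5; AVX-512 Foundation (AVX512F) is EBX bit 16.
AVX2_BIT    = 1 << 5
AVX512F_BIT = 1 << 16

def supports_avx512f(ebx: int) -> bool:
    """True if the AVX-512 Foundation bit is set in leaf-7 EBX."""
    return bool(ebx & AVX512F_BIT)

# Hypothetical EBX contents: a P-core-only package advertising AVX512F,
# and a hybrid package where the bit is masked because E-cores lack it.
p_cores_only = AVX2_BIT | AVX512F_BIT
hybrid       = AVX2_BIT

print(supports_avx512f(p_cores_only))  # True
print(supports_avx512f(hybrid))        # False
```

Because the feature flag reflects the whole package, a single E-core lacking AVX-512 is enough for the bit to be cleared, which is why disabling the E-cores exposes the instructions.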
Thanks to the team behind the RPCS3 emulator, we have information and tests suggesting that turning the E-cores off boosts both emulation speed and in-game FPS. With the E-cores disabled and only the P-cores left, the processor can execute AVX-512 instructions and run a higher ring ratio, which presumably lowers ring-bus latency. The team benchmarked an Intel Core i9-12900K and a Core i9-11900K, both clocked at 5.2 GHz, with the Alder Lake chip's E-cores disabled. In God of War: Ascension, the Rocket Lake processor produced 68 FPS, while Alder Lake produced 78 FPS, an improvement of around 15%. This suggests that more applications could benefit from disabling the E-cores, especially applications that support AVX-512 instructions, which only the P-cores can execute. It remains to be seen, through trial and error, whether more cases like this appear.
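The quoted figure follows directly from the two frame rates in the test; a quick check of the arithmetic:

```python
# Relative FPS improvement from the God of War: Ascension numbers above.
rocket_lake_fps = 68  # Core i9-11900K at 5.2 GHz
alder_lake_fps = 78   # Core i9-12900K at 5.2 GHz, E-cores disabled

improvement = (alder_lake_fps - rocket_lake_fps) / rocket_lake_fps * 100
print(f"{improvement:.1f}%")  # → 14.7%
```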
Source:
RPCS3
39 Comments on PlayStation 3 Emulator Delivers Modest Speed-Ups with Disabled E-Cores on Intel Alder Lake Processors
Sample test of RPCS3 running Red Dead Redemption on several CPUs
Say what you want about unoptimized software allegedly being the issue, but the bottom line is that we have a 16-core CPU with 8 low-performance cores rather than the full complement of 16 performance cores, as it should be. I really don't like this hybrid design and feel that we consumers are getting cheated out of a lot of performance.
AMD really needs to come back with Alder Lake-beating performance with all cores being performance cores, or this situation will continue.
If software weren't an issue, you'd want a CPU made up entirely of E-cores, as they're more efficient per unit of die space, so you'd always get more performance. Unfortunately, a lot of software just doesn't scale well, so there's a benefit to using big cores, even though they're not a cost-effective use of your silicon. That's also why all known upcoming Intel architectures keep 8 P-cores and scale up the E-cores. The software that P-cores are designed for doesn't really use more than 8 cores anyway at the moment.
This discussion is just a repeat of the whole single-core vs. multi-core CPU debate. Back then we also sacrificed single-core performance for the sake of having more cores. Conroe just had the benefit of being a massive increase in performance.
Also, to address your first sentence: in games and software with limited scaling, disabling HT usually also leads to a performance increase. Or if you disable a CCD on a 5950X, you generally also gain performance in software that doesn't scale, like games.
I bet the problem is they couldn't do it without spending too many transistors or crippling performance too much. A downside of this approach is that these emulators aren't your typical AVX-512 load. Those loads are generally fully multithreaded across all cores.
I sympathize in general with qubit's line of thought (and it's why I plan on buying the 3D upgrade for my processor), but I also see the appeal in Alder Lake's hybrid design and where Intel wants to go with it... I just think the Windows ecosystem isn't exactly ready for such advanced technology yet. A few friends of mine have upgraded to the i7-12700K and seem very pleased with the result; the chips do pull their own weight for gaming. Something that struck me as odd, though, is that some games respond better to running on the big cores and others run better on the little cores, meaning the big cores don't always invariably win the race. That leads to some inconsistency, and to be really frank, my 5950X is plenty fast as it is. I'm only upgrading because of some accounting magic: I give this to my brother, upgrade, and sell the 3900XT I left with him last year, so 2 people upgrade for less than 1 CPU's full price... everyone wins, that's the idea at least. Hopefully the pricing will be sensible, or even that won't be worth it.
While the future of CPUs is looking good, it's clear this new architecture has to mature first.
Probably won't take longer than a year or two, knowing MS and its strategic outlook :D Sure, but you can't emulate on a PS3 itself: it's slow as shit for some games (Heavenly Sword could easily run sub-20 FPS on a PS3, and it's no exception), hot and loud for others, you have storage media limitations or a failing Blu-ray lens, PS Network adds no real value anymore... should I go on?
And let's not even start on the content itself. It's not like these games get released any longer.
Even emulating PS2 on a PC is 10x better than the OG console, if only for save states.
Also, RPCS3 is alpha-level software. You don't want to benchmark with that. There's a limited number of PS3 consoles in the world. And they're dwindling every day as they fail or break. Never mind that their current owners might not be willing to let go of them for some time yet.
RPCS3 was always particularly Intel-friendly, but that's because the emulator is sensitive to a few things that Intel chips currently do better, and because Intel's CPUs have historically had a bit of an advantage as far as instruction sets go. This emulator in particular has pioneered the use of all of these instructions, like TSX and 256-bit and 512-bit AVX, so I wouldn't fault any of its developers for preferring Intel for their development machines. I applaud their use of pioneering instruction sets, even if many modern CPU microarchitectures don't support them.
The lukewarm reaction in this thread is probably expected, RPCS3 is hardly representative of any real-world or meaningful advantage of the Intel architecture vs. the AMD one, as it has always traditionally been Intel-biased. It's not a bad thing, there are other places where Ryzen will shine particularly bright, as well. :)
On Zen 3, all cores within a CCD can communicate directly with each other, but cross-CCD traffic still has to go through the I/O die via the Infinity Fabric, and this has a latency impact. There are applications that are faster on the 5800X than on the 5900X, for example, because they are affected by that latency.
But those are rare, and generally the higher frequency compensates for the latency problem. It's true that the OS should just use the 5950X as a single CCD, but that's harder to implement in real life than in theory. It's more up to the application to handle that.
CCD-to-CCD communication isn't much faster than memory access, so it won't really help there. What AMD could do with a larger interposer is add Infinity Fabric links between CCDs. That should cut the CCD-to-CCD latency by at least half.
As for the GPU, again it will depend on the kind of code it runs. If it has to do a lot of synchronization, having 2 CPUs on the same die instead of 1 big one will not be beneficial. If the data is well contained and highly parallel, it won't matter much (Zen 2 still does very well on video encoding, 3D rendering, etc.).
Save states
Massively less cable clutter, and room required to house console.
Ability to memory hack games.
Definite advantages.
Where else would you need to mimic the instruction sets of a super-complex Cell CPU? Also, a lot is contributed by the raw over-5 GHz single-core boost, not only AVX-512. The added performance correlates more with the frequency gap.
Actually, the emulator is usable; I have played Metal Gear Solid 4 on it. Occasional freezing is more of an issue than the lack of CPU power. The game is limited to 30 FPS in-game either way, so what's the fuss?
The LLVM recompiler needs a lot of work... and it still has poor multithreading. They experiment a lot in certain ways, but it often lacks the desired result.
I'll add to this that there are reports of the 5800X splitting its cores over two dies instead of a single one. Not sure if those are true (so much can go wrong if the testing isn't meticulous), but it's a possibility. Ah, never mind, the second chiplet is always disabled.