
PlayStation 3 Emulator Delivers Modest Speed-Ups with Disabled E-Cores on Intel Alder Lake Processors

There are applications that are faster on the 5800X than on the 5900X because they are affected by that latency.
I'll add to this that there are reports of the 5800X splitting its cores over two dies instead of a single one. Not sure if those are true (so much can go wrong if the testing isn't meticulous), but it's a possibility.

Ah, never mind, the second chiplet is always disabled.
 
Post-processing AA
Save states
Massively less cable clutter, and less room required to house the console.
Ability to memory-hack games.

Definite advantages.

I personally would not buy a processor with RPCS3 in mind, as the emulator will definitely mature further, and the team does do targeted optimizations for Ryzen, so it's not like you're missing out here.

But I also have to confess to being the lucky owner of a CECHA-model console with full hardware backwards compatibility (physical EE/GS, 4 USB ports, card readers). So until RPCS3 reaches about the same level of maturity as PCSX2, I can't say I'm too eager to run the few PS3 games I play through it (not to mention most have received PC ports since then), mostly because, barring save states, my console can do everything else and more. This console, with Rebug firmware and a dev kernel installed, is nothing short of a treat.

I'll add to this that there are reports of the 5800X splitting its cores over two dies instead of a single one. Not sure if those are true (so much can go wrong if the testing isn't meticulous), but it's a possibility.

Ah, never mind, the second chiplet is always disabled.

Yeah, they're disabled. Ryzen 7 and below parts have one of the chiplet slots on the package completely vacant.
 
Something is badly wrong with a CPU architecture if disabling half the cores results in a performance improvement.
Software, not hardware. The result is not from disabling the cores but from software running on the correct cores.
Edit: I might be wrong about that due to AVX512.
Because P-core-only mode enables AVX-512, which isn't very useful outside of a few cases and may cause unexpected throttling.
AFAIK it does not enable AVX-512 automatically. Either way, I'm actually impressed that RPCS3 does support AVX-512 :)
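(For the curious: here's a minimal sketch of how a program can probe for AVX-512 at runtime before picking a code path. It assumes a GCC/Clang x86 toolchain and is only an illustration, not RPCS3's actual detection code.)

```cpp
#include <cstdio>

int main() {
    // __builtin_cpu_supports() (GCC/Clang, x86) checks the CPUID
    // feature flags at runtime; "avx512f" is the AVX-512 Foundation
    // subset that all other AVX-512 extensions build on.
    if (__builtin_cpu_supports("avx512f"))
        std::puts("AVX-512F present: wide-vector code paths can be enabled");
    else
        std::puts("No AVX-512F: fall back to AVX2/SSE code paths");
}
```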
Not to mention you can play every PS3 game for free using a cloud service. They are all locked to 720p/30 FPS natively anyway. I don't believe someone has a reason to spend their time playing 20 PS3 titles. Maybe the occasional gem here and there, like Red Dead. You complete it and move on to other games.
Cloud service is a whole different ballgame, both in terms of visual quality, due to compression artifacts, plus huge input lag. On the other hand, with an emulator like RPCS3 you are not locked to 720p/30 FPS, far from it.
 
I gave up on it. Yeah, you can run it at 4K and 60 FPS, but with what sort of CPU? A $500 one?

I now have two PS3 Slims, one on normal FW and one modded.
I can still play games via the modded console if they require a connection to a PS server.
 
Pretty pointless application? You can get a PS3 off eBay for about £50 and run anything on the native hardware, saving yourself the huge cost of upgrading to Alder Lake for this purpose!
Not to mention you can play every PS3 game for free using a cloud service. They are all locked to 720p/30 FPS natively anyway. I don't believe someone has a reason to spend their time playing 20 PS3 titles. Maybe the occasional gem here and there, like Red Dead. You complete it and move on to other games.
As others have pointed out, you two are grossly mistaken. Besides the main purpose of preservation once PS3 hardware is no longer around, emulators also serve the purpose of letting people play these games with various improvements: higher frame rates (including unlocked FPS caps), higher resolutions, alternate control schemes, and quality-of-life features such as save states and memory patches ("cheats", though many are more "mods" than "cheats").
I actually bought a 500 GB PS3 Super Slim this summer. Never had a PS3 before. Some games clearly look amazing, like Resistance 3 and Killzone 3. However, the low resolution prevents them from shining. I played The Legend of Zelda: BOTW on CEMU at 4K/60 FPS and it's a game changer.
Try out Heavenly Sword on RPCS3! It's a blast when it's not running at 12 FPS!

The thing is, on Zen 2, communication between CCXes had to go through the I/O die. The Infinity Fabric could become saturated by all those accesses, and it had to compete with memory and I/O accesses too. And this round trip to the I/O die was costly in latency and power usage.

On Zen 3, all cores within the CCD can communicate directly with each other, but they still have to go through the I/O die via Infinity Fabric, and this has a latency impact. There are applications that are faster on the 5800X than on the 5900X because they are affected by that latency. For example:

(image removed for brevity)

But those are rare, and generally the higher frequency compensates for the latency problem. It's true that the OS should just use the 5950X as a single CCD, but that's harder to implement in real life than in theory. It's more up to the application to establish that.
This is almost, but not quite, correct. Zen 2 has two separate 4-core CCXes, each with 16 MB of L3 cache, per "Core Complex Die" or CCD. Ergo, a Ryzen 9 3950X has two CCDs, each with two CCXes.

Like in Zen 1 (which did not have a separate I/O die), the CCXes on a single die communicate with each other across the Infinity Fabric interface on the die itself; the signal never goes to the cIOD (the I/O die).

Otherwise you are correct, though.
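To make the latency discussion concrete, here's a rough core-to-core ping-pong sketch (my own illustration, assuming Linux/glibc and g++ -O2 -pthread; it has nothing to do with RPCS3's code). Two pinned threads bounce an atomic flag; on a 5950X, a pair on the same CCD (e.g. cores 0 and 1) typically completes round trips much faster than a cross-CCD pair (e.g. cores 0 and 8):

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <thread>

static std::atomic<int> flag{0};
constexpr int kIters = 1000000;

// Pin the calling thread to a single logical core.
static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(int argc, char** argv) {
    int a = argc > 1 ? std::atoi(argv[1]) : 0;  // first core
    int b = argc > 2 ? std::atoi(argv[2]) : 1;  // second core

    std::thread partner([b] {
        pin_to_core(b);
        for (int i = 0; i < kIters; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) {}
            flag.store(0, std::memory_order_release);  // pong
        }
    });

    pin_to_core(a);
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        flag.store(1, std::memory_order_release);      // ping
        while (flag.load(std::memory_order_acquire) != 0) {}
    }
    auto t1 = std::chrono::steady_clock::now();
    partner.join();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("cores %d<->%d: ~%.0f ns per round trip\n", a, b, ns / kIters);
}
```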
More cases like this will not appear.

Where else would you need to mimic the instruction sets of a super-complex CELL CPU? The raw 5+ GHz single-core boost also contributes a lot, not only AVX-512. The added performance correlates more with the added frequency gap.
The testing was done at iso clocks, meaning the two processors were locked to the same clock rate. Also, both processors tested support AVX-512. The difference between the two is simply down to the changes between the Cypress Cove cores used in the 11900K and the Golden Cove cores in the 12900K.
Actually, the emulator is usable; I have played Metal Gear Solid 4 on it. Occasional freezing is more of an issue than the lack of CPU power. It's limited to 30 FPS in-game either way, so what's the fuss?
RPCS3 has the ability to bypass 30 FPS locks in many titles.
 
RPCS3 has the ability to bypass 30 FPS locks in many titles.

I was actually talking about the graph below in the comments. Some over-philosophizing about an AMD-specific deficiency, etc.

It is clearly seen that the gain from 6 to 8 cores is minimal on both Intel and AMD; you get more just from the boost clock within the same arch. The 5950X bench shows that the app totally doesn't know what to do with 16 threads while in-game. It may during the initial code-translation phase.

As with any shit code, it likes one fast single thread, and then it escalates even further. Praising just one extension for speeding up that one clearly unoptimized thread is kind of like licking your own balls. I understand that the Intel Software Development Emulator is nice to use. But it's still wacky code at the core, with very poor multithreading.

It will still need years of development. This news will die down just as when they added the dreaded TSX support, which was disabled afterwards in CPU firmware due to a few HW bugs. I wonder why that option even lingers in the emulator.
 
It is clearly seen that the gain from 6 to 8 cores is minimal on both Intel and AMD; you get more just from the boost clock within the same arch. The 5950X bench shows that the app totally doesn't know what to do with 16 threads while in-game. It may during the initial code-translation phase.

As with any shit code, it likes one fast single thread, and then it escalates even further. Praising just one extension for speeding up that one clearly unoptimized thread is kind of like licking your own balls. I understand that the Intel Software Development Emulator is nice to use. But it's still wacky code at the core, with very poor multithreading.
Multithreading isn't magic. You can't make something that can't be parallelized faster by throwing more threads at it. Learn about Amdahl's law.
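For reference, Amdahl's law in one line: if a fraction $p$ of the work can be parallelized across $n$ threads, the best-case speedup is

$$S(n) = \frac{1}{(1-p) + p/n}$$

So even with $p = 0.9$, 16 threads give at most $1/(0.1 + 0.9/16) \approx 6.4\times$, and the ceiling as $n \to \infty$ is $1/(1-p) = 10\times$. (The numbers are illustrative, not RPCS3 measurements.)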
It will still need years of development. This news will die down just as when they added the dreaded TSX support, which was disabled afterwards in CPU firmware due to a few HW bugs. I wonder why that option even lingers in the emulator.
And the option is there because it works fine on processors with functional TSX, and it gives a big speed-up.
 
It is clearly seen that the gain from 6 to 8 cores is minimal on both Intel and AMD; you get more just from the boost clock within the same arch. The 5950X bench shows that the app totally doesn't know what to do with 16 threads while in-game. It may during the initial code-translation phase.

As with any shit code, it likes one fast single thread, and then it escalates even further. Praising just one extension for speeding up that one clearly unoptimized thread is kind of like licking your own balls.
This is a PS3 emulator. The PS3's CPU has 1 PPE thread and 6 SPE threads (plus one for security and internal stuff): 7 threads, if the game developer has done their job well.
That's basically a SIMD test. The 2500, as a desktop processor, has only 1/3 of the 7700HQ's performance because it lacks AVX2 support.
The 2500 has 1/3 of the performance because it only has 4 threads; the 7700HQ has 8. When a game uses more threads, the slowdown is going to be huge. And RDR is one of these games.
 
as it's the best way to scale up multicore CPU performance.

This is unequivocally wrong; it's the worst way to scale multicore performance.

You can see this, for example, in CPU-Z, where the 12900K achieves about 13X scaling and the 5950X achieves about 18X.

The 12900K is a "16 core" CPU just like the 5950X, yet it can't match its multicore scaling. It's not even close, and all of this while the 5950X is also a lot more power efficient. Are you sure it isn't you who is ignorant here?
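For clarity on what that figure means: the scaling factor is just the multi-thread score divided by the single-thread score,

$$\text{scaling} = \frac{\text{MT score}}{\text{ST score}},$$

and it's bounded above by the hardware thread count: 24 on the 12900K (8P with HT plus 8E) versus 32 on the 5950X. That's just the definition of the metric, not extra benchmark data.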
 
Mhm! So advanced that one needs to press Scroll Lock to disable half of their CPU. I humbly bow down! :roll:

I mean, that's precisely why. However, it's not Intel that's at fault here. Alder Lake actually has some amazing state-of-the-art technology; that hardware scheduler they call the "Intel Thread Director" is, imo, hands down the best improvement an x86 processor has seen in quite some time. The truth is that Windows hopelessly relies on decades-old legacy code that nobody working at Microsoft currently understands or can do anything about, either because the OS is a Jenga tower that directly relies on it by its very design, or because of legal/patent issues...

Here, right-click your desktop, create a new text document, and try naming it "COM1" or "LPT1", and you'll see what I mean. I could even go a step further: it's not only remnants from the DOS days, four-plus decades past; Windows still contains the dialer application from the NT 3 days and all of the surrounding cruft that makes it work. Why on Earth does Windows 11 need this?
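The same quirk in code, for anyone who wants to try it outside Explorer (a minimal sketch assuming a Windows toolchain; illustrative only):

```cpp
#include <fstream>
#include <iostream>

int main() {
    // Win32 reserves the DOS device names (CON, PRN, AUX, NUL,
    // COM1-9, LPT1-9) in every directory; even "COM1.txt" gets
    // routed to the legacy device namespace instead of being
    // created as an ordinary file on disk.
    std::ofstream f("COM1.txt");
    if (!f)
        std::cout << "Open failed: COM1 is a reserved DOS device name\n";
    else
        std::cout << "Opened something (quite possibly the COM1 device itself!)\n";
}
```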

(screenshot: dialer.png)


My point is that Windows is long since past its prime, and no amount of makeover Microsoft ever does to it is gonna change that. Since Windows 8's release, Microsoft's primary focus seems to have been keeping Windows' rotting corpse as neatly embalmed and dressed as possible, but major hardware design changes like this bring the nastiness outside. Eventually they'll have rewritten enough of the kernel and the OS's low-level functions that such a design will work, but who knows? If you have to disable half of your cores for your operating system to simply behave, something's wrong with it, and we all know what it is; we've just been telling ourselves otherwise out of sheer convenience, to be frank.
 
This is a PS3 emulator. The PS3's CPU has 1 PPE thread and 6 SPE threads (plus one for security and internal stuff): 7 threads, if the game developer has done their job well.
The 2500 has 1/3 of the performance because it only has 4 threads; the 7700HQ has 8. When a game uses more threads, the slowdown is going to be huge. And RDR is one of these games.
No, hyper-threading doesn't bring in real cores; it just makes shared resources be utilized more efficiently. Even in the best-case scenario (a similar compute workload that scales perfectly with threads and isn't memory/cache-bandwidth bound), the performance gain is usually about 30% vs. hyper-threading disabled. In most games hyper-threading actually has a negative impact because of the introduced context switching...
AVX2 is clearly the deciding factor here.
 
I mean, that's precisely why. However, it's not Intel that's at fault here. Alder Lake actually has some amazing state-of-the-art technology; that hardware scheduler they call the "Intel Thread Director" is, imo, hands down the best improvement an x86 processor has seen in quite some time. The truth is that Windows hopelessly relies on decades-old legacy code that nobody working at Microsoft currently understands or can do anything about, either because the OS is a Jenga tower that directly relies on it by its very design, or because of legal/patent issues...

Here, right-click your desktop, create a new text document, and try naming it "COM1" or "LPT1", and you'll see what I mean. I could even go a step further: it's not only remnants from the DOS days, four-plus decades past; Windows still contains the dialer application from the NT 3 days and all of the surrounding cruft that makes it work. Why on Earth does Windows 11 need this?

(screenshot: dialer.png)

My point is that Windows is long since past its prime, and no amount of makeover Microsoft ever does to it is gonna change that. Since Windows 8's release, Microsoft's primary focus seems to have been keeping Windows' rotting corpse as neatly embalmed and dressed as possible, but major hardware design changes like this bring the nastiness outside. Eventually they'll have rewritten enough of the kernel and the OS's low-level functions that such a design will work, but who knows? If you have to disable half of your cores for your operating system to simply behave, something's wrong with it, and we all know what it is; we've just been telling ourselves otherwise out of sheer convenience, to be frank.
Yes, we are on the same page, it seems.

It was my take on Intel, as they have their fair share of BS to this day.

M$ should build a modern OS from the ground up instead of re-skinning the same old PoS each year and asking ever-higher prices.
 
Multithreading isn't magic. You can't make something that can't be parallelized faster by throwing more threads at it. Learn about Amdahl's law.

And the option is there because it works fine on processors with functional TSX, and it gives a big speed-up.

Show me those functional CPUs...

Everything from Haswell to Kaby Lake has it disabled in microcode. Later ones don't have the instruction set at all.

Call me up when the emulator won't choke on one single thread while in-game... LLVM limitations are the key factor behind the shit multithreading here, not Amdahl's law, no matter how you try to defend it. Instead of relying on brute-force AVX-512, it should lean on GPGPU/OpenCL to help with the complex instruction sets.

It's totally nuts to read about people needing a rare AVX-512 that nobody uses in home/gaming scenarios. For professional use, you get a different caliber of gear, ECC RAM included, if you want serious calculations and aren't just fooling around.

Learn.

In August 2014, Intel announced that a bug exists in the TSX/TSX-NI implementation on Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX/TSX-NI feature on affected CPUs via a microcode update.[9][10][23] The bug was fixed in F-0 steppings of the vPro-enabled Core M-5Y70 Broadwell CPU in November 2014.[24]

The bug was found and then reported during a diploma thesis in the School of Electrical and Computer Engineering of the National Technical University of Athens.[25]

In October 2018, Intel disclosed a TSX/TSX-NI memory ordering issue found in Skylake processors.[26] As a result of a microcode update, HLE support was disabled in the affected CPUs, and RTM transactions would always abort in SGX and SMM modes of operation. System software would have to implement a workaround for the RTM memory ordering issue. In June 2021, Intel published a microcode update that further disables TSX/TSX-NI on various Xeon and Core processor models from Skylake through Coffee Lake and Whiskey Lake as a mitigation for unreliable behavior of a performance counter in the Performance Monitoring Unit (PMU).[27] By default, with the updated microcode, the processor would still indicate support for RTM but would always abort the transaction. System software is able to detect this mode of operation and mask support for TSX/TSX-NI from the CPUID instruction, preventing detection of TSX/TSX-NI by applications. System software may also enable the "Unsupported Software Development Mode", where RTM is fully active, but in this case RTM usage may be subject to the issues described earlier, and therefore this mode should not be enabled on production systems.

According to Intel 64 and IA-32 Architectures Optimization Reference Manual from May 2020, Volume 1, Chapter 2.5 Intel Instruction Set Architecture And Features Removed,[18] HLE has been removed from Intel products released in 2019 and later. RTM is not documented as removed. However, Intel 10th generation Comet Lake and Ice Lake CPUs, which were released in 2020, do not support TSX/TSX-NI,[28][29][30][31][32] including both HLE and RTM.

In Intel Architecture Instruction Set Extensions Programming Reference revision 41 from October 2020,[33] a new TSXLDTRK instruction set extension was documented and slated for inclusion in the upcoming Sapphire Rapids processors.
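For context on why the option can linger harmlessly: the standard RTM pattern always carries a non-transactional fallback, roughly like this sketch (generic use of Intel's RTM intrinsics, not RPCS3's actual code):

```cpp
#include <immintrin.h>  // RTM intrinsics (_xbegin/_xend); build with -mrtm

// Try a hardware transaction; report failure so the caller can take a
// conventional lock instead. On CPUs whose microcode forces every
// transaction to abort, control always reaches the fallback path, so
// the program stays correct, just without the TSX speed-up.
bool try_transactional_increment(long& counter) {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        counter += 1;   // speculative; committed atomically by _xend()
        _xend();
        return true;    // transaction committed
    }
    return false;       // aborted: caller must use the lock-based path
}
```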
 
This is what I've been waiting to see; I hope others will use RPCS3 as a CPU benchmark like they did before with Dolphin.


RPCS3 is heavily AVX-accelerated, which is cool for this use case, but AVX has little relevance in most use cases, and AVX-512 even more so.
 