Thursday, August 10th 2023
Atlas Fallen Optimization Fail: Gain 50% Additional Performance by Turning off the E-cores
Action RPG "Atlas Fallen" joins a long line of RPGs this Summer for you to grind into—Baldur's Gate 3, Diablo 4, and Starfield. We've been testing the game for our GPU performance article, and found something interesting—the game isn't optimized for Intel Hybrid processors, such as the Core i9-13900K "Raptor Lake" in our bench. The game scales across all CPU cores—which is normally a good thing—until we realize that not only does it saturate all of the 8 P-cores, but also the 16 E-cores. It ends up with under 80 FPS in busy gameplay at 1080p with a GeForce RTX 4090. Performance is "restored" only when the E-cores are disabled.
Normally, when a game saturates all of the E-cores, we don't interpret it as the game being "aware" of E-cores, but rather "unaware" of them. An ideal Hybrid-aware game should saturate the P-cores with its main workload, and use the E-cores for errands such as processing the audio stack (DSPs from the game), the network stack (the game's unique multiplayer network component), physics, in-flight decompression of assets from the disk, etc., which show up in Task Manager as intermittent, irregular load. "Atlas Fallen" appears to be using the E-cores for its main worker threads, and this imposes a performance penalty, as we found out by disabling the E-cores. The penalty arises because the E-cores run at lower clock speeds than the P-cores, have much lower IPC, and are cache-starved. Frame data processed on the P-cores ends up waiting for data from the E-cores, which drags the overall framerate down.

In the Task Manager screenshot above, the game is running in the foreground; we set Task Manager to "always on top" so that Thread Director won't interfere with the game. Thread Director prefers to allocate the P-cores to foreground tasks, but that doesn't happen here, because the developers chose to specifically put work on the E-cores.
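To illustrate the idea, here is a minimal sketch (our illustration, not code from Atlas Fallen or the Fledge engine) of how a Hybrid-aware engine could keep a main worker thread on the P-cores using the Windows CPU Sets API; on Alder Lake and Raptor Lake, the P-cores report the higher EfficiencyClass value:

```cpp
// Minimal sketch: keep the calling worker thread on P-cores only, using the
// Windows CPU Sets API (Windows 10+). Illustration only, not engine code.
#include <windows.h>
#include <cstdio>
#include <vector>

int main()
{
    // First call just returns the buffer size needed for the CPU set list.
    ULONG length = 0;
    GetSystemCpuSetInformation(nullptr, 0, &length, GetCurrentProcess(), 0);
    std::vector<BYTE> buffer(length);
    auto* info = reinterpret_cast<SYSTEM_CPU_SET_INFORMATION*>(buffer.data());
    if (!GetSystemCpuSetInformation(info, length, &length, GetCurrentProcess(), 0))
        return 1;

    // On hybrid CPUs, the P-cores report the highest EfficiencyClass value.
    BYTE maxClass = 0;
    for (ULONG offset = 0; offset < length; )
    {
        auto* e = reinterpret_cast<SYSTEM_CPU_SET_INFORMATION*>(buffer.data() + offset);
        if (e->Type == CpuSetInformation && e->CpuSet.EfficiencyClass > maxClass)
            maxClass = e->CpuSet.EfficiencyClass;
        offset += e->Size;
    }

    // Collect the CPU set IDs that belong to that class (the P-cores).
    std::vector<ULONG> pCores;
    for (ULONG offset = 0; offset < length; )
    {
        auto* e = reinterpret_cast<SYSTEM_CPU_SET_INFORMATION*>(buffer.data() + offset);
        if (e->Type == CpuSetInformation && e->CpuSet.EfficiencyClass == maxClass)
            pCores.push_back(e->CpuSet.Id);
        offset += e->Size;
    }

    // Ask the scheduler to keep this thread on the P-cores. A background job
    // (audio, networking, asset decompression) would skip this call, or be
    // handed the E-core IDs instead.
    SetThreadSelectedCpuSets(GetCurrentThread(), pCores.data(), static_cast<ULONG>(pCores.size()));
    printf("Worker thread limited to %zu P-core logical processors\n", pCores.size());
    return 0;
}
```

Background jobs would then naturally land on the E-cores, which is exactly the intermittent, irregular load pattern described above.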
For comparison we took four screenshots, with E-cores enabled and disabled (through the BIOS). We picked a "typical average" scene instead of a worst case, which is why the FPS are a bit higher. As you can see, the framerates with the E-cores enabled are pretty low (136 / 152 FPS), whereas turning off the E-cores instantly lifts performance close to the engine's internal FPS cap (187 / 197 FPS).
With the E-cores disabled, the game is confined to what is essentially an 8-core/16-thread processor with just P-cores, which boost well above the 5.00 GHz mark, and have the full 36 MB slab of L3 cache to themselves. The framerate now shoots up to 200 FPS, which is a hard framerate limit set by the developer. Our RTX 4090 should be capable of higher framerates, and developers Deck13 Interactive should consider raising it, given that monitor refresh-rates are on the rise, and it's fairly easy to find a 240 Hz or 360 Hz monitor in the high-end segment. The game is based on the Fledge engine, and supports both DirectX 12 and Vulkan APIs. We used GeForce 536.99 WHQL in our testing. Be sure to check out our full performance review of Atlas Fallen later today.
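For readers who would rather not toggle E-cores in the BIOS, a similar effect can often be approximated per-game by restricting the game's process affinity to the P-cores (Task Manager's "Set affinity" dialog does this without any code). Below is a rough sketch of the programmatic route; the tool name is hypothetical, the PID is whatever Task Manager reports for the game, and it assumes the usual Alder Lake/Raptor Lake enumeration where the 16 P-core threads of a 13900K appear as logical processors 0-15:

```cpp
// Rough sketch: pin an already-running game process to the first 16 logical
// processors, assumed here to be the P-core threads of a Core i9-13900K.
// Usage: pin_pcores.exe <pid>
#include <windows.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv)
{
    if (argc < 2) { printf("usage: pin_pcores <pid>\n"); return 1; }
    DWORD pid = (DWORD)strtoul(argv[1], nullptr, 10);

    HANDLE proc = OpenProcess(PROCESS_SET_INFORMATION | PROCESS_QUERY_LIMITED_INFORMATION,
                              FALSE, pid);
    if (!proc) { printf("OpenProcess failed: %lu\n", GetLastError()); return 1; }

    // Bits 0-15 set: logical processors 0-15 only (assumed to be the P-cores).
    DWORD_PTR pCoreMask = 0xFFFF;
    if (!SetProcessAffinityMask(proc, pCoreMask))
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
    else
        printf("Process %lu restricted to logical processors 0-15\n", pid);

    CloseHandle(proc);
    return 0;
}
```

Unlike disabling the E-cores in the BIOS, this leaves the E-cores available to everything else running on the system.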
120 Comments on Atlas Fallen Optimization Fail: Gain 50% Additional Performance by Turning off the E-cores
But I guess you are so blindfolded when it comes to fanboyism.
So it's swings and roundabouts, really.
Since it is whole-system power, let's take the idle consumption away (~50 W): then it is 129 W vs 247 W (+91.4%).
In the previously mentioned 7950X3D review, the power draw is already measured as CPU-only, so there it is 140 W vs 276 W (+97.1%).
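(For reference, those percentages follow straight from the figures quoted here: (247 − 129) / 129 ≈ 0.91, so roughly +91%, and (276 − 140) / 140 ≈ 0.97, so roughly +97%.)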
Apparently you lack reading skills, in addition to knowledge. Keep using the 7950X3D, which is slower than the 13900K in multithreaded workloads…
The 7950X3D is a tuned-down 7950X, so if you “tune” a 7950X or a 13900K you can reduce the gap. Not saying Intel CPUs are power efficient (they are not!), but presenting the numbers the way you are doing is misleading.
And TPU didn't do a lot of efficiency tests, but here you can see that 12th-gen processors have efficiency close to the 5000 series. The 12900K actually has a poor showing and the 5950X a great one, probably because the 12900K was designed to be thermally unconstrained, whereas the 5950X can't handle the heat its cores produce at full load, so it clocks down, and desktop processors are way more efficient at lower clock speeds.
And about your 7950X3D vs 13900K argument:
Yes, the 7950X3D is slower than the 13900K in multithreaded workloads, by 2% according to TechPowerUp's application performance summary; that's a fact.
But it is also 140 W vs 276 W, which is also a fact.
While your claim was "The 7950X3D is a tuned down 7950X, so if you “tune” a 7950X or a 13900K you can reduce the gap",
I must tell you that Intel did tune down their 13900K and released the 13900T.
And the 13900T is a very rare CPU, and almost no reviews of it exist.
You might not like it.
But we all know it did not look good when Intel tried to do the same. Yes, it outperforms the 5950X by 7.6%, while consuming 91% more power.
The 12900K is faster, but also consumes a lot more.
And about your "12x series processors have efficiency close to 5x series" arguement.
I don't see where is that came from.
Your quoted picture clearly shows:
The best 12th-gen sample was 9.6 kJ.
The best 5000-series sample was 6.4 kJ.
Which means the 5000 series had a 50% efficiency advantage.
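(Again, just arithmetic on the numbers above: 9.6 kJ / 6.4 kJ = 1.5, i.e. the best 12th-gen sample used 50% more energy for the same work, which is where the 50% comes from.)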
And the 12900K isn't the most efficient 12th-gen part either...
And about the"thermally unconstrained" arguement.
"Hey my CPU can eat more electricity" is a good point in your mindset?
Okay I get that as a personal preference thing.
If you are judging CPUs on how much energy they can take before they overheat, I won't argue with you on that.
Intel previously had the very famous 7980XE, which could take >1000 W even.
I guess that's your favourite CPU.
I don't judge on these personal preference choices.
Just note that many of us do take the power bill very seriously these days,
and the energy concern is very real.
The 5800X was 10.0 kJ.
The 12900K was 10.2 kJ.
Was the 5800X unpopular on account of its Intel-like inefficiency? Why did the 12900K perform less efficiently with the E-cores disabled?
It is like the worst student in the red class scored 50 and everyone else scored better,
but in the blue class, everyone scored 50.
The 5800X doesn't have good efficiency, mainly because of the unnecessarily high power target and bad voltage tuning.
The OEM 5800 non-X proved that by retaining most of the performance with insane efficiency.
As for the 12900K with the E-cores disabled,
it is the same as the 5800X: an unnecessarily high power target and bad voltage tuning.
Basically, the full power target was given to the 8 P-cores and they went too far out of the efficiency window.
Actually the E-cores do hurt efficiency in a way; because their maximum power consumption is fairly low, they don't eat into the 297 W power limit very much, so they don't force the P-cores to clock down very much under full load. More E-cores at the same power target, or a lower power target, would help with this.
TPU improved the measurement to CPU-only power draw later, in the 13900K / 7950X3D tests.
So when counting the power draw in the 12900K test, the idle power should be subtracted.
So it was 129 W vs 247 W, a much larger difference.
The 12th-gen E-cores aren't particularly power efficient, as already shown in TPU's E-cores-only test.
12th-gen E-cores are more space-efficient, fitting 8 physical cores into an area roughly equal to 3 P-cores.
13th-gen E-cores are the other way around: they are much more power efficient, but having more E-cores in 13th-gen CPUs just cancels out that benefit.
As for a 12900K with a reduced power target:
some reviewers turned MCE off to test the 12900K at the "default" power, then ran some tests that exceed the PL2 time limit.
Let's say the 12900K scales very well with power and actually needs that power.
Without some undervolting, a reduced-power-target 12900K would perform a bit worse than expected.
I'm not arguing that the 12th-gen Intel processors are exactly as efficient as the 5000 series, nor that Intel tuned the K-series models with efficiency in mind. I'm arguing that the 12th-gen Intel processors have similar efficiency to the 5000-series AMD processors, and that the E-cores are not inherently bad for efficiency.
So it's not DDR4 price vs DDR5 price, it's zero vs DDR5 price.
Now if one didn't already have DDR4, then it's just the cheaper boards, but most people choosing to go DDR4 on 12th/13th gen very likely already have DDR4, as that's the only time it makes sense.