We found the Missing Performance: Zen 5 Tested with SMT Disabled

Thanks @W1zzard the amount of work you put into this is amazing.

Hats off to you!

Awesome content as always.
 
Disabling SMT to increase gaming performance has been around for a long time, and it's something I've personally done for years; however, the gap was never this big. Disabling SMT will almost always reduce microstuttering/stuttering at the cost of a bit of average FPS. The most important distinction here, however, is W11. The thread scheduler was the biggest change from W10 to W11, and while Intel said you had to switch to W11 to utilize their CPUs properly, they work under W10 too.

While in a lot of cases loading up the 'best' cores the most helps overall throughput, it's not the best-case scenario for latency-sensitive applications. Games are very much latency sensitive as well as throughput sensitive.

Taking this further, you'll actually find that having CPPC on also reduces performance. While it might help with one specific application that is utilizing one core, Windows will overload the 'best' cores, further exacerbating this situation. While you can't turn CPPC off directly anymore on modern Ryzens specifically, you can do it by disabling other power management options, which has its pros and cons. CPPC sits under a power management feature called PSS Support (which covers a few features, but CPPC is the one we want).

After disabling CPPC you'll see thread loading become more uniform across all cores. This is important for multithreaded and even single-threaded applications, so that a bunch of extra threads aren't being thrown onto a core that might be all of 5% faster while other cores have plenty of headroom available.

CPPC/SMT/W11 thread scheduler are all things at work here and need to be tweaked.

I'll also point out that using Process Lasso or similar applications to 'band-aid' a game onto every other logical core is NOT the same thing as natively disabling SMT. Windows will still try to use the other logical half of each core, creating thread contention. There are a lot of Windows processes whose affinity can't be touched, and that's assuming people using Lasso even bother to configure everything outside the game that can be.
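
For anyone wondering what "every other core" means in affinity terms, here is a minimal sketch. It assumes SMT siblings are numbered adjacently (core 0 is logical 0 and 1, core 1 is logical 2 and 3, and so on); the function names are made up for illustration and are not Process Lasso's API.

```python
def one_thread_per_core(physical_cores, threads_per_core=2):
    """Return the index of the first hardware thread of each physical core.

    Assumes SMT siblings are numbered adjacently (core 0 -> logical 0 and 1).
    """
    return [core * threads_per_core for core in range(physical_cores)]


def affinity_mask(logical_cpus):
    """Pack a list of logical CPU indices into an affinity bitmask."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


cpus = one_thread_per_core(8)        # e.g. an 8-core, 16-thread CPU
print(cpus)                          # [0, 2, 4, 6, 8, 10, 12, 14]
print(hex(affinity_mask(cpus)))      # 0x5555
```

A list like this only restricts the one process it is applied to (for example through psutil's `Process.cpu_affinity()`); everything else on the system can still land on the sibling threads, which is exactly the contention described above.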
 
I use a 5800X3D and I have SMT disabled, because Elden Ring stutters and disabling it reduces the stutter by a lot.
 
These processors are clearly still a work in progress.
 
I didn't know those *X3D CPUs are so good for gaming. Where is the 9800X3D?
 

Actually the problem here is that if you are inclined to disable SMT on a 9700X to get it up to 14700K levels of gaming performance and thus cripple its anemic productivity performance even more, then you didn't care about productivity performance in the first place and prize gaming performance above all.

And that just means you should have bought a 7800X3D or 7950X3D in the first place. They are crippled in productivity as well, but nothing matches them for gaming.

None of this helps the case for the 9700X/9600X. They just don't have a place in the market IMO.
 
I've never let Windows schedule my CPUs, it's dumb as shite....

Been using Process Lasso for a while.
On my 7950X3D these are my profiles:

1 with cache only CCD for games
2 without SMT on the cache CCD for games
3 for background processes on the secondary CCD
4 for things that benefit from all-core workloads.

Once set up, it takes all of a minute to assign a profile to a particular program.

Should anyone have to do that? No, but if you want the most out of your hardware it's just the reality.
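
Roughly what those four profiles look like as CPU sets, as a sketch only. The numbering is an assumption (cache CCD as logical CPUs 0-15, second CCD as 16-31, SMT siblings adjacent), so check your own system's layout before copying anything.

```python
# Hypothetical layout for a 7950X3D: 16 cores / 32 logical CPUs.
# ASSUMPTION: cache CCD = logical 0-15, second CCD = logical 16-31,
# with SMT siblings numbered adjacently (0/1, 2/3, ...).
CACHE_CCD = list(range(0, 16))
SECOND_CCD = list(range(16, 32))

profiles = {
    "1 cache CCD (games)": CACHE_CCD,
    "2 cache CCD, no SMT (games)": CACHE_CCD[::2],  # first thread of each core
    "3 background (second CCD)": SECOND_CCD,
    "4 all cores": CACHE_CCD + SECOND_CCD,
}


def to_mask(cpus):
    """Convert a list of logical CPU indices into an affinity bitmask."""
    return sum(1 << cpu for cpu in cpus)


for name, cpus in profiles.items():
    print(f"{name}: {len(cpus)} logical CPUs, mask {to_mask(cpus):#010x}")
```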
 
You shouldn't, but let's face it, now that we're in the heterogeneous CPU world, more effort probably needs to go into this part of the tweaking process. With the way CPUs boost now and essentially overclock themselves, what you're doing sounds more like the thing enthusiasts should be considering to get the best performance. Maybe someday devs will work this "preferred CPU setup" into their products, but there's a lot of CPU variation to plan for, and it's probably not worth the trouble for the small gains most will get out of it.
 
AMD needs the chipset driver magic once again for scheduling games properly..lol..
 
Games run better with SMT disabled? What!? This does not compute! :eek:

SMT uses power, more power budget with 65/88w, higher clocks, better performance?

I disable HTT on my main gaming power plan (it's actually off on most of my plans now), as games usually run better and I save power/heat. W1zzard's findings don't surprise me. But I do have one gaming plan that allows HTT cores to be scheduled, so I don't need to reboot to toggle it; I can do it on the fly.

The heyday for this type of tech in gaming was on quad-core CPUs running 8-threaded games.

I have observed it can still help, though: it helps in the CPU parts of 3DMark tests, but I haven't yet seen a game that performs better with it on.
 
SMT uses power, more power budget with 65/88w, higher clocks, better performance?
It does not explain the results with CPUs like the 7800X3D which runs way below its power target even in multi-threaded work.
 
I guess what I'm asking is... Why aren't we seeing the double-digit performance increases that AMD purported in many of their leaked slides? What's going on here? Is it power? Thermals? Or something else?
It's not power or thermals, these are eliminated by the "PBO max" run, also by the PBO max run in the regular review, which is SMT on. I didn't want to add yet another colored bar. Disabling SMT shows surprisingly good gains, bigger than on Zen 4, which is the basis for this article. I'm not claiming I have a fix, and blindly turning SMT off is definitely not the fix. I'm saying "here's some unexpected results, AMD and the community, please look into it"
 
I do something similar to the Process Lasso profiles, but with the free Process Hacker (now System Informer).

Here are my notes on what I have discovered about CPU scheduling on Windows (10).

  1. Power plan settings control how Windows schedules threads. However, this only applies when no affinity is configured. Affinity doesn't just come from the admin of the machine; software devs can code affinity into their software themselves.
  2. If I add an affinity in Process Hacker that allows use of all cores (in theory the same as the default), it actually changes the scheduling behaviour compared to having no affinity configured at all, which is interesting.
  3. Any configured affinity overrides the Windows scheduling behaviour. For example, if you set the power plan to park half of the cores and then run a Cinebench all-core benchmark, it will ignore the plan and use all cores.
  4. Windows core parking is only a soft park. It doesn't actually hardware (C6) park a core; that's handled independently. A soft park just prevents threads from being scheduled to the core, but of course if affinity forces use of the core, the software parking is overridden.
  5. When a process launches child processes, they inherit the parent's affinity by default, but they can be given their own affinity using something like Process Hacker or Process Lasso.
  6. Back to the power plan: it isn't possible to park specific cores of your choosing; instead Windows has a kind of priority system. On my 13700K, the 2 highest-priority logical cores are the first logical cores of the 2 highest-clocking cores (this only applies if the C6 state is enabled; without it these cores don't clock higher). This is very likely the same for AMD, as that also has favoured cores that clock higher. Next in priority are the first logical cores of the rest of the cores in the same performance class. After that comes the 2nd logical core (HTT/SMT) of the highest-clocking cores, and last the 2nd logical core (HTT/SMT) of the rest of the cores in the same performance class.
  7. So you could tell Windows via the power plan to keep only 50% of cores unparked, and that soft-disables HTT/SMT. It will prevent most software from using the 2nd logical core, but it will be overridden by affinity overrides, software requesting use of all cores, that type of stuff, so it won't block something like an all-core Cinebench run.
  8. Interestingly, if you have all cores unparked, the priority for assigning thread workloads is in a different order: Windows will prefer to use the 2nd logical core on the highest-clocking cores before loading the first logical core on the other cores. This is the flaw I think might be causing the loss of performance in games and some other software with HTT/SMT enabled. Soft parking is usually enough to fix this, with no need for a BIOS disable. Affinity configuration can of course also fix it.
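
The difference between points 6 and 8 can be shown with a toy model. This is only a sketch of the two orderings as described in the notes, not how the Windows scheduler is actually implemented; the core counts and function names are hypothetical.

```python
from collections import Counter


def parking_priority(favoured, others):
    """Point 6: first threads (favoured, then others), then the SMT siblings."""
    return ([(c, 0) for c in favoured] + [(c, 0) for c in others]
            + [(c, 1) for c in favoured] + [(c, 1) for c in others])


def unparked_fill_order(favoured, others):
    """Point 8, as modelled here: favoured cores' SMT siblings are used
    before the first thread of the remaining cores."""
    return ([(c, 0) for c in favoured] + [(c, 1) for c in favoured]
            + [(c, 0) for c in others] + [(c, 1) for c in others])


def shared_cores(order, n_threads):
    """Count physical cores running more than one of the first n_threads."""
    used = Counter(core for core, _thread in order[:n_threads])
    return sum(1 for count in used.values() if count > 1)


# Hypothetical 8-core CPU where cores 0 and 1 are the favoured ones.
favoured, others = [0, 1], [2, 3, 4, 5, 6, 7]
print(shared_cores(parking_priority(favoured, others), 4))     # 0
print(shared_cores(unparked_fill_order(favoured, others), 4))  # 2
```

With four busy threads, the first ordering gives every thread its own physical core, while the second doubles up both favoured cores and leaves six cores idle.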
 
So, lazy game developers strike again?

Because if the issue was on AMD's side, ALL programs would be affected, correct?

Hell, maybe we could check whether the culprit is Windows' proven-to-suck thread scheduler.
 
It's not power or thermals, these are eliminated by the "PBO max" run, also by the PBO max run in the regular review, which is SMT on. I didn't want to add yet another colored bar. Disabling SMT shows surprisingly good gains, bigger than on Zen 4, which is the basis for this article. I'm not claiming I have a fix, and blindly turning SMT off is definitely not the fix. I'm saying "here's some unexpected results, AMD and the community, please look into it"

It might be down to a flaw I discovered in the Windows scheduler, which I posted above (the relevant part is point 8 in my list). In short, if all the logical cores are unparked, Windows prefers to load up the 2nd logical core on the favoured physical cores before loading up the first logical core on the other physical cores. For a workload that doesn't need loads of heavy threads, this can slow things down, as a thread on the 2nd logical core of an already busy core, even a faster core, will run slower than a thread on a logical core of an idle physical core.
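
As back-of-the-envelope arithmetic for that last claim: the 5% clock advantage is the figure mentioned earlier in the thread, and the 0.6 SMT share is purely an assumed illustration, not a measurement.

```python
def per_thread_speed(core_speed, threads_on_core, smt_share=0.6):
    """Relative speed of one thread on a core.

    smt_share is an ASSUMED fraction of the core each thread gets when two
    threads share it; it is illustrative only, not a measured value.
    """
    return core_speed * (smt_share if threads_on_core > 1 else 1.0)


# Thread placed on the SMT sibling of a busy favoured core (assumed 5% faster).
favoured_sibling = per_thread_speed(1.05, threads_on_core=2)
# Thread placed alone on an idle, slightly slower core.
idle_other_core = per_thread_speed(1.00, threads_on_core=1)

print(f"favoured core, shared: {favoured_sibling:.2f}")
print(f"other core, alone:     {idle_other_core:.2f}")
```

Under these assumptions the thread on the idle core comes out well ahead, and the thread already running on the favoured core slows down as well once it has to share.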
 
So Intel was on to something for Arrow Lake, ditching HT...
Well, they'll also have vastly more powerful e-cores too this time around. If Intel don't fcuk up Arrow Lake, they do actually deliver on reduced power consumption, then I would take an i5, let alone i7 over a 9700X any day of the week. i5 will have 14 real cores so 14 threads vs 9700X's 8 cores and 16 threads. It will be rare those extra 2 threads will net them a win. I know most people are hyped for X3D models, but for me gaming is 25% max what I do, so unless they have degimped productivity with Zen v-cache models hard to see why one wouldn't choose Intel unless gaming is all you care about.

Still it's Intel, and who knows if Arrow lake will now even ship this year.

Missing Zen 5 performance wasn't found, there's a mere 2.5% between SMT off vs on in games and it's slower in applications.

That's only a slightly larger gain than the 7700X obtained in the charts and not nearly enough to make up for the difference between AMD's marketing claims and reality.
Yeah, I doubt TPU will have convinced Hardware Unboxed to call off the hounds LOL.

I agree, I am disappointed with Hardware Unboxed, they were saying a lot of crap about AMD products.
They didn't even wait for the 9900 or 9950 before concluding that Zen 5 is shit. They never read the Tom's Hardware or Phoronix reviews...
Steve actually alluded to other reviews and basically dismissed their results. He likes to jab the needle into AMD whenever he gets the chance. Maybe Lisa touched him inappropriately when he was a child.

So, maybe AMD should come out with a chip that's like the 9700X, with one eight-core chiplet that has no SMT support and another chiplet that has a whole bunch of Zen 5c cores to make up for the loss of SMT.
Well, we can try that on Strix, as it has two CCDs, one with 4 Zen 5 cores and one with up to 8 Zen 5c cores, and I presume you can turn off SMT in the BIOS. It's a shame we don't have a Strix model with a 6 + 6 config to compare to the 9600X at the same power.
 
As many have already stated... is it AMD's fault here? Look at the Intel patch fix, and at why Hitman takes a toll: shit code, single-threaded, so what use are multicore and SMT/HT, etc.?

Michael did benchmarks with 128-core EPYC CPUs, where it is much easier to see whether these things do good or harm than in this child's play with some toy benchmarks. It had mixed results, but in tasks that take advantage of it, i.e. that are coded right, it is useful, and on the 128-core EPYC 9754 it ate 15 W (~3%) more when enabled.

Okay, the recent review of the Ryzen AI 9 HX 370 shows no penalty at all, only gains, so reading this is a head-scratcher.

To be fair, Linux got a great number of patches for the upcoming Zen 5 AMD CPUs, and the AMD-PSTATE driver, which is enabled by default, works much better on Linux. It is even a landmark that the code is almost ready for Zen 5 and even RDNA4 before the products are out. W1z could pluck up the courage and include some Linux Proton gaming benchmarks. Now that even NVIDIA has more proper drivers, I can't see any pain in doing that.

The point is, that it would clarify better where to point the finger when something really does not work as it should. I point my finger at Redmond here.
 
An additional note to my scheduling notes above that I forgot to add. It's possibly relevant to gaming, so I'll post it.

I mentioned that Cinebench effectively forces all cores to be used by applying some kind of all-core affinity when the test is started. However, the CPU tests in 3DMark, like the physics test and the CPU test in one of the other benchmarks, just spawn a load of threads and don't apply any kind of affinity. So if you run a 3DMark bench with all the HTT logical cores soft parked, it affects how the bench behaves and the final score (it will not utilise those logical cores). This is representative of how most games run: they seem to rely on Windows itself to schedule the threads (at least the games I play and have tested).
 
It does not explain the results with CPUs like the 7800X3D which runs way below its power target even in multi-threaded work.

I feel like the low power usage of the X3D parts was explained some time ago, they use power differently to protect their stacked cache giving them lower power use.
 
This basically just shows you how stupid the Windows scheduler actually is. It makes no sense to assign a heavy workload to a fully occupied physical core's virtual core. Microsoft should know and detect the difference between a physical core and a virtual one. They should at least make this an option in the power settings or something.

I can't help but wonder if all this anti-SMT stuff is a result of Intel's push to remove SMT from their CPU's, and Microsoft is deliberately nerfing performance to help make a case in the minds of consumers to get rid of it.

But another thing wouldn't surprise me, AMD knows their architecture is cache starved, and that enabling SMT also puts more pressure on the tiny L2 cache. 1MB is a joke.
 
I've searched for Process Lasso AMD gaming benchmarks and I couldn't find any. If it's so good, AMD should give out a free voucher for a lifetime licence. And where is a Windows Thread Director equivalent for AMD? They should also encourage retailers to sell a Linux USB stick combo because of the superior performance.
What I can confirm works is setting the power plan to Ultimate Performance, setting priority to maximum and, for old games like AC II, disabling core zero and then re-enabling it.
Bottom line: with the same settings, IPC isn't great for the average Joe on Windows 10.
 
I wonder if Microsoft will use this opportunity to release Windows 12. After all, they pretty much did the same thing with Windows 11 when Intel's Alder Lake architecture couldn't run properly on Windows 10. Naaaaaah....
 
The 7800X3D only uses 88 W on all-core workloads according to GN. Same as the non-X 7000 chips and the 9000X chips we have so far.

You didn't find the missing performance, you found meaningless 2% which will probably be an issue with the Windows scheduler and Agesa.
Windows 11 scheduler was tweaked for Intel thread director. It's a piece of SHIT...
 
I really don't understand some of you. This is the same as with Zen 1 and Zen 2: when you disable SMT you gain a few %.
[Chart: relative-performance-games-1280-720.png]

It's just reaching at this point. Instead of admitting it's a shite product you keep on trying to prove it's not.
 
I wonder if Microsoft will use this opportunity to release Windows 12. After all, they pretty much did the same thing with Windows 11 when Intel's Alder Lake architecture couldn't run properly on Windows 10. Naaaaaah....
Because Intel cooperated with Microsoft on both Windows 10 and 11. This is why Win10 is great. That's no longer the case at Intel, because they fired the people who did this, even after getting lots of money from the US CHIPS Act. And we haven't seen any news of MS-AMD cooperation.
 