
We found the Missing Performance: Zen 5 Tested with SMT Disabled

Joined
Apr 30, 2020
Messages
963 (0.59/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 32Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
I just want to know why nobody did a locked-clock test of Zen 4 at 4 GHz vs. Zen 5 at 4 GHz.
That's the easiest way to see the actual IPC increase in the design.
PBO and other features just get in the way of actual analysis.
 
Do you mean like this?
Right, somewhat; it's about what I expected.

AMD already stated that Zen 5 would not beat Zen 4 with 3D V-Cache in gaming, so why is everyone putting the 7800X3D in their comparisons and reviews?

Zen 4 with 3D V-Cache is around 20% faster than Zen 4 without it, a bigger gain than Zen 3 got from the added cache.

AMD claimed a 16% IPC uplift for Zen 5, and that diagram/slide looks like it was against the 5800X3D with a 7900 XTX (or two 7900 XTXs, which would be really odd, since most of the games they listed don't support multi-GPU/mGPU).

You can look at TechPowerUp's Zen 4 review from its September 2022 launch.

The Zen 4 Ryzen 7 7700X did not beat the 5800X3D all the time in gaming. It won some and lost some; it was game dependent. The 5800X3D sits within 95% to 99% of the 7700X in W1zzard's relative-performance results across resolutions. Overclocking is where it started losing, because of the clock speed limits placed on the 5800X3D.

On average, Zen 4 was 8 to 10% faster per clock than Zen 3 without 3D cache. The rest came from higher clock speeds going beyond 5.0 GHz; single-core boost went from a 4.6-5.0 GHz limit up to 5.5-5.7 GHz, which is a lot of extra clock speed.

A bit of math, (8 to 10% per clock over Zen 3) + (20% from 3D V-Cache on Zen 4), shows Zen 5 would need to be more than 19 to 22% faster per clock to even get close to the Zen 4 3D-cache CPU in gaming.

At this point it's like reviewing a game with DLSS on, comparing an NVIDIA GPU to an AMD GPU in a game that never got FSR.

IMO, 3D-cache CPUs should only be compared with other 3D-cache CPUs, or with their own base core designs.

In all honesty, I expect 3D cache to give a minimum 25% gaming uplift for Zen 5. My reasoning: AMD already went from a 15% gain with Zen 3's 3D V-Cache to a 20% gain with Zen 4's, so another 5% on top of that is plausible.
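A quick sanity check on those figures. Taking the post's assumptions (a ~20% X3D gaming uplift over plain Zen 4, and AMD's claimed 16% per-clock gain for Zen 5) and assuming similar clocks, a 16% gain still leaves Zen 5 short of Zen 4 X3D:

```python
# Post's assumptions: Zen 4 X3D is ~20% faster than plain Zen 4 in games,
# and AMD claims ~16% IPC for Zen 5 over Zen 4. At equal clocks:
x3d_gain = 0.20          # assumed X3D gaming uplift over plain Zen 4
zen5_ipc_claim = 0.16    # AMD's claimed per-clock gain for Zen 5

shortfall = (1 + x3d_gain) / (1 + zen5_ipc_claim) - 1
print(f"Zen 5 would still trail Zen 4 X3D by ~{shortfall:.1%}")  # ~3.4%
```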
 
Joined
Dec 25, 2020
Messages
6,403 (4.58/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS Special Edition
Motherboard ASUS ROG MAXIMUS Z790 APEX ENCORE
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) ASUS ROG Strix GeForce RTX™ 4080 16GB GDDR6X White OC Edition
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic Intellimouse
Keyboard Generic PS/2
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores I pulled a Qiqi~
In all honesty, I expect 3D cache to give a minimum 25% gaming uplift for Zen 5. My reasoning: AMD already went from a 15% gain with Zen 3's 3D V-Cache to a 20% gain with Zen 4's, so another 5% on top of that is plausible.

25% is far too optimistic unless AMD has fully resolved the cycle penalty drawback of X3D with this third generation. Standard cache chips have had a lower L3 access latency, and tend to outperform X3D at iso clocks if the extra capacity is not necessary. Otherwise the same gains and losses seen in Zen 4 are to be expected.

It's painfully obvious this architecture was not designed with client segment systems in mind. Zen 5 seems to fully lean on and focus on the demands of EPYC buyers. Its AVX capabilities are remarkable, but otherwise it's a very underwhelming iteration upon its predecessor.

If Intel's claims regarding Lion Cove and Skymont materialize, Arrow Lake will smoke these chips.
 
Joined
Nov 18, 2009
Messages
8 (0.00/day)
System Name Gaming rig
Processor i7 6950K
Motherboard Asus X99-Deluxe
Cooling Thermaltake Water 3.0 Riing RGB 240
Memory 32GB DDR4-3000
Video Card(s) Titan X (Pascal)
Storage 500GB 950 Pro, 500GB 850 Evo, 2x5GB HDD RAID1
Display(s) Dell U3011
Case Jonsbo UMX4 Windowed (Silver)
Audio Device(s) Creative Soundblaster Z
Power Supply Thermaltake 1050W RGB
Software Windows 10
Benchmark Scores 23407 - Firestrike (better than 99% of all results!) https://www.3dmark.com/fs/10511898
I get that SMT off gives each thread 2x the branch predictors and decoders, while SMT on gives each thread access to one of each. But even with just one each, SMT on is the same as the previous gen, yet the average uplift seems to be around 5%, whereas single-threaded floating-point IPC increased by a good 18% even without AVX-512 workloads, and games should definitely benefit from that.
I think the answer there is also in the video, yes they can use one BP and decoder each thread, but they're not exclusive. I think Mike mentions that they reserve some resources for each thread so that if one is idle then wakes up it can get to work without waiting, but otherwise it seems like they can be dynamically used by either thread, with a higher performing thread able to use more of the front end than the other. It's not a hard partition of resources.
 

Mr_Engineer

New Member
Joined
Aug 8, 2024
Messages
15 (0.20/day)
Think of what it would do to SMT and 1% lows. A lot more than +-5% with Zen5.

AMD designed themselves into a corner with AM5 by keeping cooler compatibility. The chip package is physically too small, and the same goes for their EPYC lineup, so the CCD cannot grow beyond a certain size. AMD could have given each core the 2 MB L2 cache it obviously needs, but sticking with 4 nm meant hitting that size limit, so it should have been a 3 nm design. Instead AMD took the easy route with 4 nm, then made the small 1 MB L2 as fast as possible to mitigate the problem, which it didn't; hence the poor SMT performance and the below-promised 16% IPC gain in most non-math-intensive applications. And the L3 cache is half the size it should be, due to AMD's slow Infinity Fabric and memory controller.

The wish list for Zen6 is long, and I don't think they will do much beyond giving it a 2MB L2 cache, (1.5MB would not surprise me in the least... drip...drip...) and fix whatever low hanging fruit they deliberately didn't fix. The biggest problem they have is the IO die, and that is holding AM5 back with awful memory support, as well as its physical size, but I very much doubt we will see a new IO die in Zen 6, unless they are planning a Zen 7 on AM5.
I agree and disagree.

While increasing the L2 cache size would have improved performance, the gain wouldn't have been that huge and would have caused other problems for the Zen 5 engineers (at 4nm). The doubling of the cache bandwidth was a better way to go at 4nm to increase performance. Usually, the different levels of cache should increase together, especially the L2 and L3.

That's why, for Zen 6, if we assume each core gets 1.5 MB of L2 (a 50% increase), the L3 would be 48 MB (also a 50% increase) for an 8-core CCD. If each Zen 6 CCD is 16 cores, the L3 would be 96 MB. This should be doable on a highly optimized 3 nm process. Zen 6 would most likely get a new IOD on 4 nm, with faster memory support, RDNA3/4 graphics (with 4 CUs?), faster Infinity Fabric, and a faster IMC, maybe supporting DDR5-8000 1:1. ;)
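That scaling tabulates as follows. All figures here are speculation from the post, not confirmed specs:

```python
# Hypothetical Zen 6 cache scaling: +50% over Zen 5's 1 MB L2 per core
# and 32 MB L3 per 8-core CCD. Speculative numbers, not confirmed specs.
zen5_l2_mb, zen5_l3_mb = 1.0, 32.0

zen6_l2_mb = zen5_l2_mb * 1.5          # 1.5 MB L2 per core
zen6_l3_8core = zen5_l3_mb * 1.5       # 48 MB L3 for an 8-core CCD
zen6_l3_16core = zen6_l3_8core * 2     # 96 MB if a CCD grows to 16 cores
print(zen6_l2_mb, zen6_l3_8core, zen6_l3_16core)  # 1.5 48.0 96.0
```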
 
Joined
Apr 19, 2018
Messages
1,227 (0.52/day)
Processor AMD Ryzen 9 5950X
Motherboard Asus ROG Crosshair VIII Hero WiFi
Cooling Arctic Liquid Freezer II 420
Memory 32Gb G-Skill Trident Z Neo @3806MHz C14
Video Card(s) MSI GeForce RTX2070
Storage Seagate FireCuda 530 1TB
Display(s) Samsung G9 49" Curved Ultrawide
Case Cooler Master Cosmos
Audio Device(s) O2 USB Headphone AMP
Power Supply Corsair HX850i
Mouse Logitech G502
Keyboard Cherry MX
Software Windows 11
AMD already stated that Zen 5 would not beat Zen 4 with 3D V-Cache in gaming, so why is everyone putting the 7800X3D in their comparisons and reviews?
Hardly anybody is talking about that. We are all talking about the 7700X vs the 9700X. The 9700X is only showing to be 3-5% faster, in some cases it's actually slower than the older model, it's much more expensive, and it doesn't even come with a cooler, unlike the model it replaced.

But for the few comparing it to the 7800X3D: the 9700X is supposed to offer a 16%+ performance improvement, so why not compare it? It should only be 5% slower, right? Actually, the 9700X has a 100 MHz higher boost clock and runs cooler, so on average it should clock even higher, meaning it should only be about 2-4% slower than the 7800X3D, should it not?

But let's face it, AMD have starved the cores of cache in order to make the x3D cache versions look even better, and probably charge even more money for them this time round. AMD have dug themselves into a greedy little hole with their 3D cache, and it's starting to bite.

We really need more technical reviews which go after memory bandwidth and latency to see what effect that has on performance. We also need to see tests done between this and the previous gen CPU done at the same clocks.

But I predict the 3D cache version of this will see an even higher performance uplift when compared to the difference between the 7700x and 7800x3D, not that it will address the tiny L2 cache, which IMO is part of the problem, especially with the SMT performance. AMD should have waited 6 months and gone for a 3nm design with a 2MB L2 cache, and an improved memory controller, as this stinks of greed and rushing the release.
 
Joined
May 22, 2010
Messages
386 (0.07/day)
Processor R7-7700X
Motherboard Gigabyte X670 Aorus Elite AX
Cooling Scythe Fuma 2 rev B
Memory no name DDR5-5200
Video Card(s) Some 3080 10GB
Storage dual Intel DC P4610 1.6TB
Display(s) Gigabyte G34MQ + Dell 2708WFP
Case Lian-Li Lancool III black no rgb
Power Supply CM UCP 750W
Software Win 10 Pro x64
The choice of staying with the same cIOD as the "early" 7000-series DDR5 memory controller is biting them in the ass; Intel sadly has shown a vastly superior DDR5 controller that can stably handle 8000+ MT/s.

Especially for a uArch as starved of memory bandwidth as Zen.

It's also a joke that AMD won't use dual Infinity Fabric links with a single CCD, as it does on specialty high-bandwidth EPYC models that have essentially two IF links to one CCD for double the bandwidth.

I had high hopes for Zen 5; now I'm more than happy to stay on my "golden sample" 7700X forever. Maybe it will be great for EPYC, but not for client...
 

Mr_Engineer

AMD has themselves to blame for the negativity desktop Zen 5 is receiving.

If Zen 5 had been released in 2023, it would have been great. But two years after Zen 4, everyone expected more (3 nm, larger caches, faster Infinity Fabric and memory controller, RDNA3).
 
Joined
Oct 30, 2020
Messages
225 (0.15/day)
I think the answer there is also in the video, yes they can use one BP and decoder each thread, but they're not exclusive. I think Mike mentions that they reserve some resources for each thread so that if one is idle then wakes up it can get to work without waiting, but otherwise it seems like they can be dynamically used by either thread, with a higher performing thread able to use more of the front end than the other. It's not a hard partition of resources.

Thanks, I did hear it. My takeaway was that the primary thread, if scheduled correctly, should have access to most of the front end anyway, since it should (ideally) be the thread using the most resources. I get that games and the Windows scheduler are wonky half the time. But from what I can deduce, in previous generations SMT didn't have as much impact in games because the primary thread had access to most of the front-end resources; there was no 'giving each thread a decoder and predictor'. In Zen 5, to make SMT more efficient and reduce front-end stalls, they've 'doubled' the front end, given both the primary and secondary threads access to half of the decoders and front end, and the core dynamically adjusts the resource allocation. This is great for applications, but it isn't playing nicely with games, where threads aren't as resource-hungry or as parallel anyway, and are better off with a thread forced onto a core that has access to the whole front end.

If that is the case, even X3D will show improvements in game performance albeit less than their vanilla counterparts with SMT off.

In Ryzen Master, enabling 'Game Mode' disables SMT so I think AMD has been aware of this even in previous generations even though the average uplift in games was very little by disabling SMT. At this point, they might as well figure out a way to automatically disable SMT when playing games through the game bar or something.
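Short of a BIOS toggle or Game Mode, one way to approximate "SMT off" for a single process today is to pin it to one hardware thread per physical core. A rough Linux-only sketch (the sysfs topology paths are standard; the actual pinning call is left commented out since it changes process state):

```python
import os

def parse_cpu_list(s):
    """Parse a Linux cpu-list string such as '0-3,8,10-11' into sorted ints."""
    cpus = set()
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def one_thread_per_core():
    """Pick the first SMT sibling of every physical core, via sysfs topology."""
    chosen, seen = set(), set()
    for cpu in sorted(os.sched_getaffinity(0)):
        path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
        with open(path) as f:
            siblings = tuple(parse_cpu_list(f.read()))
        if siblings not in seen:       # first time we see this physical core
            seen.add(siblings)
            chosen.add(siblings[0])    # keep only its first hardware thread
    return chosen

# Pin the current process (e.g. a launcher, before exec'ing the game):
# os.sched_setaffinity(0, one_thread_per_core())
```

On Windows the equivalent is setting process affinity (Task Manager or `SetProcessAffinityMask`), which some users already do by hand for games.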

AMD has themselves to blame for the negativity desktop Zen 5 is receiving.

If Zen 5 had been released in 2023, it would have been great. But two years after Zen 4, everyone expected more (3 nm, larger caches, faster Infinity Fabric and memory controller, RDNA3).

I agree, they basically stuck to a very similar node and focused on accelerating server performance within the same die area. I also think they didn't move to 3nm yet because for high frequency designs, N4P is still superior. They did move Zen5c to 3nm.

They also had to design an arch that can scale up in future iterations. There's a good chunk of low-hanging fruit here, and Mike Clark mentioned some of it: there are a lot of gains to be had on the back end, and there's also reduced utilization in the front end now because there are simply more resources available (the omission of NOP fusion, etc.).
 
Joined
Sep 20, 2021
Messages
408 (0.36/day)
Processor Ryzen 7 9700x
Motherboard Asrock B650E PG Riptide WiFi
Cooling Underfloor CPU cooling
Memory 2x32GB 6200MT/s
Video Card(s) RX 7900 XT OC Edition
Storage Kingston Fury Renegade 1TB, Seagate Exos 12TB
Display(s) MSI Optix MAG301RF 2560x1080@200Hz
Case Phanteks Enthoo Pro
Power Supply NZXT C850 850W Gold
Mouse Bloody W95 Max Naraka
We really need more technical reviews which go after memory bandwidth and latency to see what effect that has on performance. We also need to see tests done between this and the previous gen CPU done at the same clocks.
At the moment we have enough data to understand what is happening.

A simple example: AMD said that when the data is in the X3D cache, access is faster than fetching it from RAM. What they didn't explain is what happens when the data is NOT in the cache: the processor waits on the cache lookup and then still has to fetch from RAM, which is slower than on non-X3D processors (and that lookup is a fixed cost that can't be shortened in any way).
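That trade-off can be written as a simple expected-latency model. The cycle counts below are illustrative placeholders, not measured figures:

```python
# Expected memory latency as a function of L3 hit rate.
# X3D: larger L3 (higher hit rate) but a few extra cycles per access;
# a miss still pays the (slower) failed lookup before going to DRAM.
def avg_latency(hit_rate, l3_cycles, ram_cycles):
    # miss cost = failed L3 lookup + DRAM access
    return hit_rate * l3_cycles + (1 - hit_rate) * (l3_cycles + ram_cycles)

# Illustrative numbers only:
plain = avg_latency(hit_rate=0.70, l3_cycles=47, ram_cycles=350)
x3d   = avg_latency(hit_rate=0.85, l3_cycles=50, ram_cycles=350)
print(plain, x3d)  # plain ~152 cycles vs X3D ~102.5 cycles here
```

The bigger cache wins only when the hit-rate gain outweighs the extra lookup cycles; with a working set that already fits in the normal L3, the X3D part pays the penalty for nothing, which matches the iso-clock behavior described above.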
 
Joined
Jan 29, 2012
Messages
6,854 (1.47/day)
Location
Florida
System Name natr0n-PC
Processor Ryzen 5950x-5600x | 9600k
Motherboard B450 AORUS M | Z390 UD
Cooling EK AIO 360 - 6 fan action | AIO
Memory Patriot - Viper Steel DDR4 (B-Die)(4x8GB) | Samsung DDR4 (4x8GB)
Video Card(s) EVGA 3070ti FTW
Storage Various
Display(s) Pixio PX279 Prime
Case Thermaltake Level 20 VT | Black bench
Audio Device(s) LOXJIE D10 + Kinter Amp + 6 Bookshelf Speakers Sony+JVC+Sony
Power Supply Super Flower Leadex III ARGB 80+ Gold 650W | EVGA 700 Gold
Software XP/7/8.1/10
Benchmark Scores http://valid.x86.fr/79kuh6
Been using Ryzens with SMT off since I got them. Everyone always asked why... Now, years later, you know why.
 
Joined
May 3, 2018
Messages
2,881 (1.22/day)
Right, somewhat; it's about what I expected.

AMD already stated that Zen 5 would not beat Zen 4 with 3D V-Cache in gaming, so why is everyone putting the 7800X3D in their comparisons and reviews?

Zen 4 with 3D V-Cache is around 20% faster than Zen 4 without it, a bigger gain than Zen 3 got from the added cache.

AMD claimed a 16% IPC uplift for Zen 5, and that diagram/slide looks like it was against the 5800X3D with a 7900 XTX (or two 7900 XTXs, which would be really odd, since most of the games they listed don't support multi-GPU/mGPU).

You can look at TechPowerUp's Zen 4 review from its September 2022 launch.

The Zen 4 Ryzen 7 7700X did not beat the 5800X3D all the time in gaming. It won some and lost some; it was game dependent. The 5800X3D sits within 95% to 99% of the 7700X in W1zzard's relative-performance results across resolutions. Overclocking is where it started losing, because of the clock speed limits placed on the 5800X3D.

On average, Zen 4 was 8 to 10% faster per clock than Zen 3 without 3D cache. The rest came from higher clock speeds going beyond 5.0 GHz; single-core boost went from a 4.6-5.0 GHz limit up to 5.5-5.7 GHz, which is a lot of extra clock speed.

A bit of math, (8 to 10% per clock over Zen 3) + (20% from 3D V-Cache on Zen 4), shows Zen 5 would need to be more than 19 to 22% faster per clock to even get close to the Zen 4 3D-cache CPU in gaming.

At this point it's like reviewing a game with DLSS on, comparing an NVIDIA GPU to an AMD GPU in a game that never got FSR.

IMO, 3D-cache CPUs should only be compared with other 3D-cache CPUs, or with their own base core designs.

In all honesty, I expect 3D cache to give a minimum 25% gaming uplift for Zen 5. My reasoning: AMD already went from a 15% gain with Zen 3's 3D V-Cache to a 20% gain with Zen 4's, so another 5% on top of that is plausible.
So you missed this slide from AMD. Also direct quote from AMD: "The new "Zen 5" chips, such as the Ryzen 7 9700X and Ryzen 9 9950X, will come close to the gaming performance of the 7800X3D and 7950X3D".

So total BS like we had about RDNA3 7900 series performance and efficiency.

(attached: AMD performance slide, 1723600058121.png)
 
Joined
Jan 3, 2021
Messages
3,383 (2.44/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
X3D chips always follow later, probably because the validation and production process is a bit more complex and not really utilised outside the desktop PC space, so it will always lag behind plain CCD manufacturing. Why hold back one product and build up masses of inventory just to release a halo product, when the standard parts make up 90+% of your sales? Who do you think they are... Apple/Intel?
Epyc and TR processors with 3D V-Cache may be more specialised, less general-purpose chips, but they are a fantastic choice for certain types of work; benchmarks at Phoronix proved that. They are also widely available at retail, which makes me believe they are fairly popular.
I also remember that 3D Epycs came to market much later than the 5800X3D. Obviously the technology was too new, not field-tested and not trusted, so the Ryzen also served as a test vehicle for the serious stuff.
As a non-developer I can't answer that question. I don't think there were great mechanisms before Windows 11 for the CPU to push the OS scheduler into making informed decisions about which core to use for which task; to what extent that can be done is unknown (i.e., is it just 'performance/economy', or can the scheduler be told to use certain cores/resources for lower latency, etc.?). I don't follow Linux kernel updates, so I have no idea what the capabilities are there, but there would need to be some interface or metric provided by the CPU to tell the OS scheduler how to run something efficiently, and I'm not sure such a thing exists.
You've exactly described the Intel Thread Director. It's the hardware component that gathers statistics related to code execution that would otherwise be unavailable to the OS. Clearly, something like that is necessary due to P+E+HT complexity (P behaves like a completely different type of core when burdened by HT, so we can talk about 3 types of cores here). I don't understand how AMD gets around that with their hybrid CPUs, which are even more complex (because lil' cores have HT too).
It would be great if someone could actually provide some insight into that. It seems HT/SMT and the OS schedulers are still basically 'hoping for the best' in terms of managing the processes and threads generated.
Yeah, I agree. The scheduler must make decisions with very limited data and in a very short time in order to not put additional load on the CPU.

Thanks - I did hear it, my takeaway was the primary thread, if scheduled correctly, should have access to most of the front end anyway since it should (ideally) be the thread utilizing the most resources. I get that games and windows scheduler is wonky half of the time. But from what I can deduce, in previous generations SMT didn't have as much of an impact in games because the primary thread had access to most of the front end resources and there was no 'giving each thread a decoder and predictor'. In Zen 5, to make SMT more efficient and reduce front end stalls they've 'doubled' the front end and given both the primary and secondary thread access to half of the decoders and front end and then dynamically adjusts the resource allocation. This is great for applications but isn't playing nice with games where each thread isn't as resource hungry or parallelly threaded anyway and are better off with forcing a thread to a core and getting access to the whole front end.
Neither of the two hardware threads running on the same core has a higher priority than the other. The x86/x64 architecture simply has no provision for that. If Zen 5 does, that's a first (and the feature would require a few new instructions).

To best handle the main time-critical thread of the game, the OS should have the ability to "clear the way" for it and let it run alone on a core.
 
You've exactly described the Intel Thread Director. It's the hardware component that gathers statistics related to code execution that would otherwise be unavailable to the OS. Clearly, something like that is necessary due to P+E+HT complexity (P behaves like a completely different type of core when burdened by HT, so we can talk about 3 types of cores here). I don't understand how AMD gets around that with their hybrid CPUs, which are even more complex (because lil' cores have HT too).
I think Intel had to create the hardware director because they use cores with different capabilities, and due to that complexity Microsoft are not up to the task of making a scheduler capable of such a feat.

But from all accounts, this hardware director is not very good or smart due to continuing issues with games, and most people finding better, more consistent performance when totally disabling the E cores.

And the fact that AMD has identically featured cores is how they get around not needing a thread director in hardware. They use a software driver and Microsoft's game bar to identify games, and configure the CPU accordingly... When it works.
 
So you missed this slide from AMD. Also direct quote from AMD: "The new "Zen 5" chips, such as the Ryzen 7 9700X and Ryzen 9 9950X, will come close to the gaming performance of the 7800X3D and 7950X3D".

So total BS like we had about RDNA3 7900 series performance and efficiency.

(attachment 358860: AMD performance slide)
Why care so much about that?
Nobody matches those programs or games exactly to reproduce these tests, so why does it matter?

Secondly, that's not the slide I was talking about. I said 5800X3D vs. 9700X, and I can't find the GNR-07 slide to confirm any settings or the setup itself.

Completely useless.
 

AsRock

TPU addict
Joined
Jun 23, 2007
Messages
19,035 (3.01/day)
Location
UK\USA
Turning SMT off playing Soulmask makes it crash for me, with the dreaded 124 error.
 
Joined
Jan 31, 2022
Messages
60 (0.06/day)
Reminds me a lot of the early days of SMT, with Pentium 4 and 1st gen Core i, where disabling it often helped with gaming performance.
Then the schedulers got better and started loading the first thread on each core before loading the second thread.

The chiplet design in Ryzen (and to a certain extent also the old Pentium D and Core 2 Quad chips) benefits from fully loading one die before going to the second. Like how the 5800X kept beating the 5900X because it had a full 8-core die, while the 5900X had its cores spread across two dies.

And it's not even a big deal to turn it on and off. It takes barely longer than a reboot. I could totally see turning SMT off for gaming, and turning it back on for rendering.
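On Linux the switch doesn't even need a reboot: recent kernels expose a runtime SMT control in sysfs (requires root, and not every platform supports it):

```shell
# Show current SMT state: "on", "off", "forceoff", or "notsupported"
cat /sys/devices/system/cpu/smt/control

# Take all SMT sibling threads offline at runtime
echo off | sudo tee /sys/devices/system/cpu/smt/control

# Bring them back for heavily threaded work like rendering
echo on | sudo tee /sys/devices/system/cpu/smt/control
```

Windows has no equivalent runtime switch, so there it's still a trip through the BIOS.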
 

rodrigorras

New Member
Joined
Aug 10, 2024
Messages
13 (0.18/day)
I don't think 9700x is a flop...
 
And it's not even a big deal to turn it on and off. It takes barely longer than a reboot. I could totally see turning SMT off for gaming, and turning it back on for rendering.
There really, really, really should be a better solution. I'm confident AMD will bring a good software fix. If MS is cooperative, it will be very good; if MS and game developers are cooperative, it will be perfect.
 
People's reactions to this SMT-off thing are so funny. Do some research, it's embarrassing.
 
Joined
May 10, 2023
Messages
198 (0.37/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Skimmed through the article and got to the conclusion where the author seems at a loss as to why SMT behavior is like this with no word from AMD about SMT changes to explain why.
Haven't read this whole forum thread, maybe someone has already pointed this out, but AMD in its press releases did hint at SMT improvements, if you looked hard enough and thought about it.
The key is the dual branch predictors and decoders, new to Zen5.
While not much admittedly is said of it in the official releases, it is mentioned and shown in diagrams.
A video VERY much worth watching is from Chips and Cheese, he goes into the depths of the new architecture changes with an AMD engineer about Zen5.
Specifically, he asks at one point, if 1T loads can make full use of all the core front-end resources (predictors, decoders etc) and the answer is YES.
So by disabling SMT you force 1T mode per core, and each thread gains two branch predictors and decoders instead of one.
I would say that the benchmarks with the biggest performance gains with SMT disabled are scenarios where the extra branch prediction and/or decoder muscle is kicking in to save the CPU from stalls of failed predictions or is simply keeping the core more fully fed.
In SMT mode, in those scenarios, they're actually a little predictor or decoder-starved!
Interesting results, keep up the good work TPU!

Moment in the video here:
Chips and cheese also did an amazing article on Zen 5!

One important thing to notice in there about those SMT changes is that software should be aware to not keep changing between 1T and 2T that frequently:
It is expensive to transition between single-threaded (1T) mode and dual-threaded (2T) mode and vice versa, so software should restrict the number of transitions

Software Optimization Guide for AMD Family 17h Processors (Zen 1)
However, they found out that enabling or disabling SMT didn't make much of a difference when it comes to instruction bandwidth:
Frontend bandwidth is identical regardless of whether the core is running with SMT off, or with SMT on and one thread active. Once code spills out of the micro-op cache, fetch bandwidth drops to four 4-byte NOPs per cycle. As with mobile Zen 5, using both SMT threads together brings total fetch bandwidth to 8x 4-byte NOPs per cycle.
1724080075195.png


Seems like the µarch was really designed to have SMT enabled and make use of it to maximize instruction throughput. Using both threads per core is what gives you the highest overall IPC (even though the per-thread IPC is reduced, which should be obvious):
1724080233369.png


2x IPC uplift in 7-zip, that's like adding a new core by itself, that's nothing to scoff at.


With all that said, I'm wondering why TPU got such uplifts with SMT off in games. Maybe it's just a scheduler issue on Windows that's making the CPUs transition between 1T and 2T mode when it shouldn't?
 
Cherry picked games and apps will confirm anything for the clicks.
 
Joined
Aug 10, 2023
Messages
341 (0.78/day)
So enable SMT for applications and disable for games. Got it!
Did we read the same article? A few percent aren't worth talking about, hence SMT on is the way to go; in other words, don't touch it. What was great was the OC; sadly it was only applied with SMT off, not with SMT on. I guess I can look it up in the review though.
 
Did we read the same article? A few percent aren't worth talking about, hence SMT on is the way to go; in other words, don't touch it. What was great was the OC; sadly it was only applied with SMT off, not with SMT on. I guess I can look it up in the review though.
Only a small percentage of games benefit from SMT off. This is why I'm loving all the hysteria over this. I tried my own benchmarks after disabling SMT a year ago, it's not good, most things lost between 15% and 30% performance. But I love it when people say "I turned it off and it's amazeballezz! - best thing I ever did, and my temps are lower!!!" Yeah, temps are lower because your chip is doing less!
 