
AMD "Strix Point" Company's First Hybrid Processor, 4P+8E ES Surfaces

Joined
Nov 13, 2007
Messages
10,826 (1.73/day)
Location
Austin Texas
System Name stress-less
Processor 9800X3D @ 5.42GHZ
Motherboard MSI PRO B650M-A Wifi
Cooling Thermalright Phantom Spirit EVO
Memory 64GB DDR5 6400 1:1 CL30-36-36-76 FCLK 2200
Video Card(s) RTX 4090 FE
Storage 2TB WD SN850, 4TB WD SN850X
Display(s) Alienware 32" 4k 240hz OLED
Case Jonsbo Z20
Audio Device(s) Yes
Power Supply Corsair SF750
Mouse DeathadderV2 X Hyperspeed
Keyboard 65% HE Keyboard
Software Windows 11
Benchmark Scores They're pretty good, nothing crazy.
Zen 1 sure wasn't fast at anything but multithreading.
Any dual-CCD or mesh-based system (Intel Skylake-X, etc.) is still slow on core-to-core latency to this day, and that includes Raptor Lake.

Core-to-Core Latency - AMD Zen 4 Ryzen 9 7950X and Ryzen 5 7600X Review: Retaking The High-End (anandtech.com)

Combine that with NVIDIA drivers, power-saving plans, scaling core frequencies, DDR5 latency, more feature-packed motherboards (AM5/1700) and Windows 11 security features, and you can get some astronomical DPC latency. You won't notice it in normal use, but it's there.

It seems like only X3D chips, disabling E-cores, or running VMs can get scores like the older chips in terms of latency. A lot of it is software and motherboard BIOS rather than chip latency, but I think it all adds up.
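For anyone wondering how core-to-core charts like AnandTech's are generally produced: the common approach is a "ping-pong" test. Two threads, pinned to different logical CPUs, take turns bumping a shared counter, so every hop forces the cache line to move between the two cores. Below is a minimal C++ sketch of that idea. It assumes Linux with glibc (for pthread_setaffinity_np), and the function names are made up for this example. It is not AnandTech's actual tool. It also leaves the calling thread pinned, and the result includes loop and timer overhead, so treat it as a rough estimate.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <pthread.h>
#include <sched.h>

// Best-effort pin of the calling thread to one logical CPU (Linux/glibc only).
// Failure (e.g. the CPU doesn't exist) is ignored, so the test still runs.
static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Estimated one-way latency in ns between logical CPUs cpu_a and cpu_b.
// Note: the calling thread stays pinned to cpu_a afterwards.
double pingpong_ns(int cpu_a, int cpu_b, int round_trips) {
    if (round_trips <= 0) return 0.0;
    alignas(64) std::atomic<int> flag{0};  // the cache line that bounces
    std::atomic<bool> ready{false};
    std::thread responder([&] {
        pin_to_cpu(cpu_b);
        ready.store(true, std::memory_order_release);
        for (int i = 0; i < round_trips; ++i) {
            while (flag.load(std::memory_order_acquire) != 2 * i + 1) {}  // wait for ping
            flag.store(2 * i + 2, std::memory_order_release);             // send pong
        }
    });
    pin_to_cpu(cpu_a);
    while (!ready.load(std::memory_order_acquire)) {}  // don't time thread startup
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < round_trips; ++i) {
        flag.store(2 * i + 1, std::memory_order_release);                 // send ping
        while (flag.load(std::memory_order_acquire) != 2 * i + 2) {}      // wait for pong
    }
    const auto stop = std::chrono::steady_clock::now();
    responder.join();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / (2.0 * round_trips);  // one round trip = two one-way hops
}
```

Looping pingpong_ns(a, b, n) over every (a, b) pair of logical CPUs gives a matrix of the same kind as the one in the linked review.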
 
Joined
Apr 12, 2013
Messages
7,563 (1.77/day)
Latency is relative: with a massive number of cores there will always be tradeoffs, and you simply can't get around physics in that regard.

 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.92/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700MHz 0.750v (375W down to 250W)
Storage 2TB WD SN850 NVME + 1TB Samsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Philips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Philips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.

Side by side is where things really show up.
Compare the top rows, keeping core counts in mind.

You can clearly see that while AMD suffers an inter-CCX penalty, cores within each CCX can communicate with each other much more easily and quickly than anything on the Intel side: most inter-core communication has half the latency.
Intel has a narrow window where cores sitting side by side get things done faster, but outside that the latency is massively higher.
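One way to put a number on the intra-CCX vs inter-CCX comparison is to average the matrix separately over pairs that share a CCX and pairs that don't. Below is a minimal C++ sketch. It assumes you already have the latency matrix (lat[i][j] in ns) and supply your own CPU-to-CCX mapping. intra_vs_inter is a hypothetical helper, not part of any tool.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {average latency for pairs in the same group, average for pairs
// in different groups}. group[i] is a caller-supplied CCX label for CPU i.
std::pair<double, double> intra_vs_inter(const std::vector<std::vector<double>>& lat,
                                         const std::vector<int>& group) {
    double intra_sum = 0.0, inter_sum = 0.0;
    std::size_t intra_n = 0, inter_n = 0;
    for (std::size_t i = 0; i < lat.size(); ++i) {
        for (std::size_t j = i + 1; j < lat.size(); ++j) {  // each pair once, no diagonal
            if (group[i] == group[j]) { intra_sum += lat[i][j]; ++intra_n; }
            else                      { inter_sum += lat[i][j]; ++inter_n; }
        }
    }
    return { intra_n ? intra_sum / intra_n : 0.0,
             inter_n ? inter_sum / inter_n : 0.0 };
}
```

Labelling P-cores and E-core clusters instead of CCXs gives the equivalent split for Intel's hybrid parts.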

12900K for comparison, too

HECK, LET'S GET 'EM ALL IN ONE POST


Now that's a generational leap in improvement: the values within the same CCX are halved.

What stands out is that the latency penalty hits as soon as E-cores are introduced: the 10600K manages lower values all around up until CPU8.

Can I interpret this stuff?
Not yet. Some of it's pretty obvious, like Ryzen tending to be faster within a CCX and slower across CCXs (we already knew single-CCX designs were better for latency-sensitive stuff like gaming).
 
Joined
Nov 13, 2007
Messages
10,826 (1.73/day)
Location
Austin Texas

It really wasn't until the 5800X and 5600X that Zen came into the picture on the latency side; before that, it was all Intel. Strix Point is similar to OG Zen in that there's a mandatory second CCD full of even slower cores.

Current gen has 4 SKU types with good latency (7600X, 7700X and 7800X3D, and distantly Rocket Lake with a ring OC and E-cores off), and all of those are basically going away unless AMD pushes an 8800X3D (hopefully with a 12-core CCD, but even 8 cores will be good). Intel is only going to make tile chips from now on, and AMD is moving in the same direction for most of the product stack.

The 10900K and 10600K with their ring bus were MUCH lower (their worst core-to-core latency is 24 ns). Plus, older motherboards had simpler BIOSes, older OSes were much lighter and weren't trying to nanny the kernel inside a virtual machine while playing hide-and-seek with high-entropy memory, and NVIDIA's drivers weren't gigantic bloated packages.
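The "worst core-to-core latency" figure is simply the largest off-diagonal entry in the latency matrix. Below is a tiny C++ sketch. The helper name is hypothetical, and it assumes the same layout as the charts, with lat[i][j] in ns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Largest latency between any two distinct logical CPUs.
double worst_core_to_core(const std::vector<std::vector<double>>& lat) {
    double worst = 0.0;
    for (std::size_t i = 0; i < lat.size(); ++i)
        for (std::size_t j = 0; j < lat[i].size(); ++j)
            if (i != j) worst = std::max(worst, lat[i][j]);  // skip the diagonal
    return worst;
}
```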

TL;DR: Zen 5 will probably have 3 SKUs, with X3D being the best for latency, but literally every other facet of the system is adding overall latency (and my guess is Strix Point will have terrible latency because of the extra CCD of hybrid cores plus the software scheduler).
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.92/day)
Location
Oystralia
The 3950X is quite consistent within a CCX. Compare it to the 13900K and you'll see the massive regression in the newer Intels.

The 3950X has 8 cores (+SMT), then another CCX of the same.
The latency matches that, with those 8 cores having consistent latency to each other.

There's zero argument that the 13900K is better overall, but the latency creeps up once you pass two threads and just keeps climbing.

Something hits them hard when multiple cores are active. It's a very large regression, and if people don't make a fuss about it, it won't be something they fix.

Compare either one to the 10600K and things look really weird.

The 10900K made it weirder again, because just two of the threads have crazy latency and the rest are lower; performance would be erratic as heck if the OS scheduled things onto those.
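Spotting a couple of outlier threads like that can be automated. One possible rule is to compute each logical CPU's mean latency to all the others and flag any CPU whose mean exceeds some multiple of the median. That rule is purely an illustrative assumption, not a standard metric. A C++ sketch with a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Flags logical CPUs whose mean latency to every other CPU is more than
// `factor` times the median of those means.
std::vector<std::size_t> slow_cpus(const std::vector<std::vector<double>>& lat, double factor) {
    const std::size_t n = lat.size();
    if (n < 2) return {};
    std::vector<double> mean(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j)
            if (i != j) mean[i] += lat[i][j];
        mean[i] /= static_cast<double>(n - 1);  // average over the other n-1 CPUs
    }
    std::vector<double> sorted = mean;
    std::nth_element(sorted.begin(), sorted.begin() + n / 2, sorted.end());
    const double median = sorted[n / 2];  // upper median when n is even
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < n; ++i)
        if (mean[i] > factor * median) out.push_back(i);
    return out;
}
```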

Different architectures are not apples-to-apples comparisons for sure, but the 10600K and 10900K are the same architecture, and it's from then on that things got weird and started going backwards; some tasks definitely show it.

It's like they have low latency for only 1-2 threads, then the latency gets trashed in exchange for bandwidth. It's a hard shift, not a smooth curve or consistent result.
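One way to tell a "hard shift" from a smooth curve is to walk a single row of the matrix (latency from CPU 0 to CPU k) and look for the first big jump between neighbouring entries. Below is a hedged C++ sketch. The jump factor is an arbitrary choice, the helper name is hypothetical, and the row is assumed to start with CPU 0 itself at index 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// row[k] = latency from CPU 0 to CPU k; row[0] is the meaningless diagonal,
// so the scan compares from k = 2 onwards. Returns the first k whose latency
// exceeds `factor` times row[k-1], or -1 if the row only rises gradually.
long first_hard_step(const std::vector<double>& row, double factor) {
    for (std::size_t k = 2; k < row.size(); ++k) {
        if (row[k - 1] > 0.0 && row[k] > factor * row[k - 1])
            return static_cast<long>(k);
    }
    return -1;
}
```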
 
Joined
Nov 13, 2007
Messages
10,826 (1.73/day)
Location
Austin Texas
The P-cores are stacked in two rows around the cache, so it might be down to the way the cores are activated in this test and where they sit relative to the System Agent (SA)/uncore.
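To picture why position matters on a ring, a simplified model counts the stops a message passes on a bidirectional ring and lets traffic take the shorter direction. This C++ sketch assumes a plain ring of n stops with uniform cost per hop. That glosses over the real layout, which also has stops for things like the System Agent and the E-core clusters. ring_hops is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>

// Hops between ring stops a and b on a bidirectional ring of n stops.
int ring_hops(int a, int b, int n) {
    int d = std::abs(a - b) % n;   // distance going one way round
    return d < n - d ? d : n - d;  // take the shorter direction
}
```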

Either way, the extra E-core clusters and the ring of rings add latency, but not nearly as much as a CCD cluster over Infinity Fabric.
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.92/day)
Location
Oystralia
That's the difference in how you interpret it:

Intel: All the P-cores have higher latency
AMD: Cores between CCX have higher latency


An AMD setup with 8 cores per CCX still has 8 cores that can talk to each other faster than on the Intel setups.


I'm trying to crop these down so nothing relevant is missing, but still enough to explain what I mean.

These values cover the main cores and their SMT threads, so 0-1 is core 1 and its SMT thread, then it carries on in pairs until you hit another CCX or an E-core.
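The pairing rule described above can be written down directly: logical CPUs 2k and 2k+1 are the two threads of physical core k, counting from 0. This is a sketch of that assumption only. Operating systems don't all enumerate SMT siblings this way; on Linux it is common for all first threads to be numbered before their siblings. Check the real topology before relying on it. The function names are hypothetical.

```cpp
#include <cassert>

// Physical core (0-based) for a logical CPU, assuming SMT siblings are adjacent.
constexpr int physical_core(int logical_cpu) { return logical_cpu / 2; }

// Distinct physical cores touched by logical CPUs 0..threads-1 under the same assumption.
constexpr int cores_used(int threads) { return (threads + 1) / 2; }
```

cores_used(3) == 2 is the "3 thread task using 2 physical cores" case, and cores_used(16) == 8 matches the 16 threads / 8 cores figure.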

P-cores only vs. two 8-core CCXs (the story doesn't change with three or four CCXs).

Intel takes a latency hit above 2 threads (not 2 cores, but two threads), and it's far worse than what AMD has there.

On AMD, 2 threads (0+1) is 6.6 ns vs 4.0 ns on Intel: a win for Intel for sure.
But a 3-thread task using 2 physical cores (thread 0 to thread 2) breaks massively the other way, with AMD at 16.2 ns and Intel at 26.6 ns, and Intel stays worse on latency until another CCX is involved.
When it comes to low-latency tasks like gaming, you don't buy a CPU with E-cores or extra CCXs; we've known that for a while, and this is the reason why.
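Here are the two pairs of quoted figures as ratios. This is plain arithmetic on the numbers above, nothing more:

```latex
\frac{6.6\,\text{ns (AMD)}}{4.0\,\text{ns (Intel)}} = 1.65
\qquad
\frac{26.6\,\text{ns (Intel)}}{16.2\,\text{ns (AMD)}} \approx 1.64
```

In each case the losing side has roughly 1.6 times the latency. What decides it is which case a workload actually hits.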

TL;DR: Any task under 16 threads / 8 cores has lower latency on AMD now. Intel's advantage is now only for 1-2 cores.



The values we've found in these fancy charts match up with gaming performance fairly well: Intel would get a large boost when games stick to the 1-2 threads that were typical for a long time, then fall behind when they need to sync with other cores.

AMD Ryzen 9 7950X3D Review - Best of Both Worlds - Minimum FPS / RTX 4090 | TechPowerUp

Heavily single-threaded engine:

Cache-sensitive engine:

Heavily multi-threaded engine known for 100% CPU usage on 4c/8t CPUs:

Am I saying these latency values are the most important metric of all? Nope!
Just that they are important, and we need to know about it and care if things slip backwards.


As an example of why it matters: you need high clock speeds and brute-force power to compensate for things like this, so we end up with insane differences like this:

5800X3D to 13900K:
275% more power for 1.7% higher FPS minimums?
Why? Because it needs to clock insanely high to brute-force things; it's like the Pentium 4 all over again.
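Here is a rough efficiency comparison on the minimum-FPS figures. It reads "275% more power" literally as 3.75 times the power. That reading is an assumption; if 2.75 times was meant, the second ratio applies.

```latex
\frac{\left(\mathrm{FPS}_{\min}/P\right)_{\text{13900K}}}{\left(\mathrm{FPS}_{\min}/P\right)_{\text{5800X3D}}}
  = \frac{1.017}{3.75} \approx 0.27
  \qquad\text{or}\qquad
  \frac{1.017}{2.75} \approx 0.37
```

Either way, that works out to roughly a quarter to a bit over a third of the minimum FPS per watt.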


Oh, I suppose this chart explains it: it's dominated by single-CCX hardware (and P-core-only Intels), with the exception of the 7000-series 3D parts, where you can lock a game to the 3D V-Cache cores exclusively and it becomes single-CCX as far as that game is concerned.
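On Linux, the equivalent of locking a game to one CCX is setting its CPU affinity before it starts. The taskset -c command does this from a shell. On Windows, Task Manager's "Set affinity" or the Win32 SetProcessAffinityMask call does the same job. Below is a minimal C++ sketch using glibc's sched_setaffinity. The helper names are hypothetical. The CPU range in the usage line assumes the cache CCD enumerates as logical CPUs 0-15, which you'd need to verify on your own system.

```cpp
#include <cassert>
#include <sched.h>

// Build a CPU mask covering logical CPUs first..last (inclusive).
cpu_set_t cpu_range(int first, int last) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = first; cpu <= last; ++cpu) CPU_SET(cpu, &set);
    return set;
}

// Restrict the calling thread (pid 0) to that range. Threads and child
// processes created afterwards inherit the mask, and it survives exec,
// so a launcher can call this and then start the game. Linux/glibc only.
bool restrict_self_to(int first, int last) {
    cpu_set_t set = cpu_range(first, last);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

For example, a launcher could call restrict_self_to(0, 15) before starting the game, using the hypothetical CPU range described above.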

I don't want EITHER company to go down the road of throwing us the scraps from enterprise customers' designs to help recover R&D costs, in ways that are actually regressions.
Gamers don't need more performance in Cinebench R23; they just need 6+ cores in the same CPU without latency issues.
 
Joined
Sep 8, 2009
Messages
1,077 (0.19/day)
Location
Porto
Processor Ryzen 9 5900X
Motherboard Gigabyte X570 Aorus Pro
Cooling AiO 240mm
Memory 2x 32GB Kingston Fury Beast 3600MHz CL18
Video Card(s) Radeon RX 6900XT Reference (amd.com)
Storage O.S.: 256GB SATA | 2x 1TB SanDisk SSD SATA Data | Games: 1TB Samsung 970 Evo
Display(s) LG 34" UWQHD
Audio Device(s) X-Fi XtremeMusic + Gigaworks SB750 7.1 THX
Power Supply XFX 850W
Mouse Logitech G502 Wireless
VR HMD Lenovo Explorer
Software Windows 10 64bit
Joined
Apr 12, 2013
Messages
7,563 (1.77/day)
Which chips are these? I don't follow codenames all the time. What are the model numbers here, or are they not released yet?
 
Joined
Sep 8, 2009
Messages
1,077 (0.19/day)
Location
Porto
Now with performance numbers:




This chip is a smaller, lower-power version of the Phoenix APU, with 2x Zen 4 and 4x Zen 4c cores and only 2 RDNA3 WGPs.
Phoenix (7840U, Z1 Extreme) was aimed at 20-45 W, and it runs pretty poorly below 15 W. Phoenix2 (7440U, Z1) probably does very well below 15 W thanks to its tiny GPU and lower-clocked, denser efficiency cores.
Gaming performance on the Z1 might be very similar to the Steam Deck's Van Gogh, perhaps at lower power (especially if the vanilla Zen 4 cores can somehow be disabled).

For those looking at the ROG Ally and Legion Go for emulation, the Z1 versions should be ideal, as they still get AVX-512.
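To check whether a given machine actually exposes AVX-512, the GCC/Clang builtin __builtin_cpu_supports can be used. Below is a minimal C++ sketch. It checks only the AVX-512 Foundation subset (avx512f), and individual emulators may want further AVX-512 subsets. has_avx512f is a hypothetical name, and non-x86 builds simply report false.

```cpp
#include <cassert>

// True if the CPU reports the AVX-512 Foundation subset (avx512f).
bool has_avx512f() {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  // harmless here; strictly needed only in very early init code
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;  // not an x86 build
#endif
}
```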
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.92/day)
Location
Oystralia
Joined
Sep 8, 2009
Messages
1,077 (0.19/day)
Location
Porto
This article was a month newer than those posts?

The news about Phoenix2 using a hybrid core arrangement had been around since March, and I pointed it out at the time.

Regardless, that's water under the bridge and I'm happy this has been corrected in the news post of today.
 