Thursday, August 10th 2023

AMD "Strix Point" Company's First Hybrid Processor, 4P+8E ES Surfaces

Building on previous reports that AMD is increasing the CPU core count of its mobile monolithic processors from the present 8-core/16-thread to 12-core/24-thread, we are learning that the company's next-gen processor, codenamed "Strix Point," will in fact be its first hybrid processor. The chip is expected to feature two kinds of CPU cores, with "Zen 5" being the microarchitecture behind the performance cores, and "Zen 5c" behind the efficiency cores. An engineering sample featuring 4 P-cores and 8 E-cores has surfaced on the web, thanks to Performancedatabases. A HWiNFO screenshot reveals the engineering sample's core configuration of 4x P-cores and 8x E-cores, with identical L1 cache sizes. Things get a little fuzzy with the detection of the L2 and L3 cache sizes.

We know from the current "Zen 4c" core design that it is essentially a compacted version of "Zen 4" designed for higher-density chiplets that have 16 cores, and that it has both the same ISA and IPC as "Zen 4." The only differences are that "Zen 4c" cores have smaller amounts of shared L3 cache at their disposal, are generally configured with lower clock speeds, and have higher energy efficiency than "Zen 4." "Zen 4c" cores are also 35% smaller in die-area than "Zen 4." The company could develop "Zen 5c" CPU cores with similar design goals.
The "Strix Point" silicon could hence have two CCXs (CPU core complexes): one with the larger "Zen 5" P-cores and a certain amount of L3 cache, and another with the smaller "Zen 5c" cores and their own L3 cache. This would essentially be similar to "Renoir," which has two 4-core CCXs of "Zen 2" cores. The L1 cache sizes for both kinds of cores are identical, 48 KB L1D and 32 KB L1I, and it's likely that both core types have 1 MB of dedicated L2 cache per core. The L3 cache sizes could vary between the two CCXs, with the P-core CCX having 16 MB (4 MB per core), and the E-core CCX 8 MB (1 MB per core).
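As a quick check on the per-core L3 shares, here is a minimal Python sketch that divides each CCX's L3 by its core count. The core counts and cache sizes are the speculated figures from the paragraph above, not confirmed specifications, and the names `CCX_LAYOUT` and `l3_per_core_mb` are made up for illustration.

```python
# Speculated "Strix Point" CCX layout taken from the article text.
# These figures are unconfirmed; the dictionary keys are illustrative names.
CCX_LAYOUT = {
    "P-core CCX (Zen 5)": {"cores": 4, "l3_mb": 16},
    "E-core CCX (Zen 5c)": {"cores": 8, "l3_mb": 8},
}


def l3_per_core_mb(cores: int, l3_mb: float) -> float:
    """Return the share of a CCX's L3 cache per core, in MB."""
    return l3_mb / cores


if __name__ == "__main__":
    for name, cfg in CCX_LAYOUT.items():
        share = l3_per_core_mb(cfg["cores"], cfg["l3_mb"])
        print(f"{name}: {cfg['l3_mb']} MB / {cfg['cores']} cores = {share:g} MB per core")
```

With these numbers, the P-core CCX works out to 4 MB of L3 per core and the E-core CCX to 1 MB per core.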

It would be interesting to imagine how AMD handles the hybrid architecture from a software standpoint. Intel uses Thread Director, a hardware-based solution that's designed to send the right kind of compute workload to the right kind of CPU core. AMD could either try to develop its own version of Thread Director, or use a less sophisticated OS-based solution such as what it's doing with its multi-CCD client processors.
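To make the "OS-based" option more concrete, the following Python sketch shows the crudest software-side approach: an application restricting itself to one group of logical CPUs. This is not how Thread Director or any AMD or Windows scheduler works; it only illustrates affinity-based steering. The logical-CPU numbering (`P_CORE_CPUS`, `E_CORE_CPUS`) is a hypothetical layout for a 12-core/24-thread part, and `os.sched_setaffinity` is only available on Linux.

```python
import os

# Hypothetical logical-CPU numbering for a 4P+8E part with SMT on both core
# types (12 cores / 24 threads). Real numbering is assigned by firmware and
# the OS and may differ; treat these ranges as placeholders.
P_CORE_CPUS = set(range(0, 8))    # 4 P-cores x 2 threads
E_CORE_CPUS = set(range(8, 24))   # 8 E-cores x 2 threads


def preferred_cpus(latency_sensitive: bool) -> set:
    """Pick a CPU set by workload type: P-cores for latency-sensitive work."""
    return P_CORE_CPUS if latency_sensitive else E_CORE_CPUS


def pin_current_process(cpus: set) -> set:
    """Restrict the current process to `cpus` where the OS supports it.

    Only CPUs the process is already allowed to use are kept; if none of the
    requested CPUs exist on this machine, the affinity is left unchanged.
    Returns the CPU set that was actually applied (empty if nothing changed).
    """
    if not hasattr(os, "sched_setaffinity"):  # Linux-only API
        return set()
    usable = cpus & os.sched_getaffinity(0)
    if usable:
        os.sched_setaffinity(0, usable)
    return usable
```

A background job could call `pin_current_process(preferred_cpus(latency_sensitive=False))` at startup. A real scheduler would instead decide per thread and adapt over time, which is exactly the part a hardware-assisted solution is meant to help with.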
Sources: Performancedatabases, IThome, VideoCardz

86 Comments on AMD "Strix Point" Company's First Hybrid Processor, 4P+8E ES Surfaces

#51
HD64G
mahirzukic2: Sure, that would be:
  • firstly, cheaper to produce since they have smaller surface area, but since it will have more cores, the die area will remain the same, so not really cheaper to begin with
  • secondly, more expensive since you need to stack cache
  • so such design will be more expensive than one with regular Zen 5 cores with no cache stacking
Which means that it could turn out to be faster than pure Zen 5 cores (due to more cores) in some workloads, while in others obviously not (due to lower clocks due to stacked cache), all the while being more expensive.
Well if your use case is such that such configuration benefits it, you could still go for this kind of design even being more expensive, it could turn out to be cheaper per core.
But that's a BIG IF. And exactly a reason to have hybrids in the first place.
Denver: Optimization for high density has lower clocks as a weakness. Then you would be losing performance.

Besides being more expensive, another point is that the cache is stacked over the L3, effectively doubling it. But APUs only have half of the L3, so using 3D cache would only reach the same amount as desktop processors. Add it all up and you will see that such a product would make no sense.
Cost could be higher, but with 2 Zen 5c chiplets and the 3D cache stacked on top, it would be the perfect workstation mobile CPU imho. Multithreaded raw performance with 16C/32T and low power draw, and in any memory-sensitive app the 3D cache would pave over it. Let's see if they will make that one. Or, instead of 2 CPU chiplets, they could even replace one of those with a GPU die that would also benefit greatly from the 3D cache.
#52
Denver
AnotherReader: SemiAnalysis' analysis of Zen 4c indicates that it's likely to clock lower than Zen 4.
"The silicon results show that the slow corner wafer was able to achieve 4.24GHz at 1.0V and 100 degree Celsius in 5nm FinFET technology"

It remains to be seen how this will change in 4nm. 4-4.2 GHz is good enough for laptops anyway.
#53
Darmok N Jalad
Denver: "The silicon results show that the slow corner wafer was able to achieve 4.24GHz at 1.0V and 100 degree Celsius in 5nm FinFET technology"

It remains to be seen how this will change in 4nm. 4-4.2 GHz is good enough for laptops anyway.
And with P cores likely to ramp to 5 GHz, it’s all good. I’d suspect they’d just not push the C cores that hard if they want efficiency.
#54
zlobby
Nooooo! AMD are on the road to another Exynos...
#55
InVasMani
Frick: Because some stuff still benefits from bigger and faster cores, and beyond a certain number more cores might not increase performance. Depending on application 6 fast cores can be faster than 32 slow cores, but the opposite can also be true. For general purpose machines a hybrid approach makes sense, if nothing else because it'll help with power consumption/cooling.
I'd say a sort of pyramid-type scaling approach is most sensible: slower and denser at the bottom, faster and less dense at the top. The goal is better linear scaling of frequency and multi-thread background task handling with regard to power and heat, along with die area utilization.
#56
Denver
Darmok N Jalad: And with P cores likely to ramp to 5 GHz, it’s all good. I’d suspect they’d just not push the C cores that hard if they want efficiency.
Exactly, if everything is as good as it looks, AMD will beat Intel at its own game for the second time. The first time was with a more efficient implementation of AVX-512.

#57
phanbuey
Camm: Considering how awful Intel's Thread Director has been, it might actually be better to rely on the OS scheduler?

I'd also hope these are monolithic rather than chiplet based. With the die size saving of 4c touted, 8 cores shouldn't be much larger than 4 P-cores.
You mean like how awful the 7950X3D's solution is and that's just for the same cores?
#58
Camm
phanbuey: You mean like how awful the 7950X3D's solution is and that's just for the same cores?
Would have preferred cache on both CCDs, but it's a specialty SKU aimed at gamers rather than pervasive across the entire stack, unlike Intel, and at least the cores are feature comparable, unlike with E-cores. Using a 7950X3D myself, it took a few months after launch for it to bed down, and now it just works on any games that aren't doing their own scheduling (on Windows, admittedly, due to the soft dependency on Xbox Games detection).

Going back to Intel for a sec, hardware scheduling could be best, but Thread Director just doesn't seem to work, especially on Linux, which is problematic when almost every part is P+E.
#59
Minus Infinity
konga: Calling the compact cores "efficiency" cores seems to be off the mark. I don't believe these are designed to be much more power efficient. Perhaps their more compact nature may make them slightly more power efficient, but more than anything, they seem to be designed to be space-efficient instead. They should offer a very similar level of performance to normal cores in most applications while taking up around half as much die area only.
Agreed, their reason for existence was cloud computing, where more cores were more important than huge cache. AMD published some power figures and the 192 Zen 4c Bergamo seemed to have the same TDP as the 128 core Epyc. So efficiency seems better even if that's not what was the driving force. If TSMC had gotten N3 working earlier we would see 256 core Bergamo at least for Turin.
#60
phanbuey
Camm: Would have preferred cache on both CCDs, but it's a specialty SKU aimed at gamers rather than pervasive across the entire stack, unlike Intel, and at least the cores are feature comparable, unlike with E-cores. Using a 7950X3D myself, it took a few months after launch for it to bed down, and now it just works on any games that aren't doing their own scheduling (on Windows, admittedly, due to the soft dependency on Xbox Games detection).

Going back to Intel for a sec, hardware scheduling could be best, but Thread Director just doesn't seem to work, especially on Linux, which is problematic when almost every part is P+E.
Right, so the 7950X3D doesn't really work in Linux either -- point is, no matter how bad Intel's v1 and v2 hardware schedulers are, they at least exist and work most of the time. AMD's don't exist -- its split-cache design suffers because of it, and to this day it performs worse than the 7800X3D. Their E-cores HAVE to have feature parity... And even then, when they come out with their own weaker E-cores, you will see some pretty wild performance issues.
#61
Camm
phanbuey: Right, so the 7950X3D doesn't really work in Linux either -- point is, no matter how bad Intel's v1 and v2 hardware schedulers are, they at least exist and work most of the time. AMD's don't exist -- its split-cache design suffers because of it, and to this day it performs worse than the 7800X3D. Their E-cores HAVE to have feature parity... And even then, when they come out with their own weaker E-cores, you will see some pretty wild performance issues.
Huh? You just get a slightly downclocked 7950X at worst.... totally overblown, lol.
#62
phanbuey
Camm: Huh? You just get a slightly downclocked 7950X at worst.... totally overblown, lol.
What part of that was overblown? that the 7950X3D performs worse than the 7800X3D... or that AMD doesn't have a hardware scheduler and is going to release CPUs with weaker c cores than even 7950X3D X3D vs non-X3D?
#63
Camm
phanbuey: What part of that was overblown? that the 7950X3D performs worse than the 7800X3D... or that AMD doesn't have a hardware scheduler and is going to release CPUs with weaker c cores than even 7950X3D X3D vs non-X3D?
In gaming, when the scheduler doesn't work, and at worst you get a slightly downclocked full fat part. As opposed to 'if scheduler doesn't work here's your Atom cores'
#64
Mussels
Freshwater Moderator
AMD may not have these issues since the cores are almost identical other than cache size - there should be much less of a problem or delay when everything's the same. If something gets sent to the wrong cores for the task, it can just shift to where it's needed at the same speed as on a regular dual-CCX chip, rather than what Intel is suffering from, where the CPU needs to process tasks to know where they belong before sending them to the right cores. It's like a car towing itself.


Intel's Thread Director was moved to software in 12th gen

This led to some large latency issues on 12th and 13th gen - it's not present in long-running tasks, but the initial choosing of which cores to throw a task on is much, much slower than on previous Intel hardware


His part two has benchmarks, especially real-world tasks like searching files in Windows, where things slow to a crawl as they're shoved onto the E-cores




That whole Nvidia DPC latency issue?


Windows 11 and its optimisations for Intel's designs speed that up fairly well (HAGS, I'm guessing), but it's still inferior to previous Intel hardware by a large margin


However the tweaks they did to help the latency affect some tasks, dragging and dropping multiple music files takes a massive penalty in 11 vs 10 on these systems.
Likely because they're shoved to the slower E-cores, rather than the P cores.



Synthetic benchmarks are programmed into the software scheduler to help performance, but doing something like having a 3D rendering program open - not rendering anything but open, means the scheduler prioritises the P cores for that .exe and tasks like moving files around gets shoved to the E-cores, despite the P-cores having performance to spare and being free to the task.


This ends up in a weird situation where you lose the ability to multitask without large performance issues, and the biggest issues are the 'low priority' stuff that users actually deal with in real life, like copying files or importing video/mp3 files.


Didn't notice his part 3 video since the title changed, here's the link

Intel hired Jim Keller, the one who designed AMD's current hardware, to fix this in their future designs.

TPU snuck into his video of Intel mocking AMD's CCX design as glue, who then used slower glue :p


TL;DR: AMD better hope their design isn't this bad.

TL;DR2: These CPUs are being designed for enterprise work (for both AMD and Intel), and we get the leftovers. It doesn't always suit end users at home and they almost don't care since we aren't the target market - we're the secondary market. Look at GPUs with mining and AI, for example.
#65
zlobby
Mussels: AMD may not have these issues since the cores are almost identical other than cache size - there should be much less of a problem or delay when everything's the same. If something gets sent to the wrong cores for the task, it can just shift to where it's needed at the same speed as on a regular dual-CCX chip, rather than what Intel is suffering from, where the CPU needs to process tasks to know where they belong before sending them to the right cores. It's like a car towing itself.


TL;DR: AMD better hope their design isn't this bad.
Wow! I wasn't aware how bad the DPC was!
#66
phanbuey
zlobby: Wow! I wasn't aware how bad the DPC was!
Now look up Zen DPC and prepare to be shocked.

The skylakes had some of the lowest latency around (includes all the variants of lakes off that design 7700-10900ks)
#67
zlobby
phanbuey: Now look up Zen DPC and prepare to be shocked.

The skylakes had some of the lowest latency around (includes all the variants of lakes off that design 7700-10900ks)
Uhm, which Zen, to be precise?
#70
Minus Infinity
zlobby: Never Lenovo...
Well, if you are prepared to go through 3 or 4 to get a properly working one and like to gamble on security flaws being unpatched, Lenoblo is a fine choice.
Tek-Check: It's not necessary.
Well it's not necessary for desktop but Turin dense will be 192 Zen 5c cores (256 if TSMC get their 3nm node sorted in time to meet AMD's goals)
#71
zlobby
david salsero: My friends say that Lenovo has been working very well for 1-2 years; 5-10 years ago was something else, but now that it is among the companies that sell the most laptops, it has improved much more than ASUS and HP.
I am looking for a Zen 4 7040 Phoenix, if possible without a dGPU. What do you currently recommend? I have seen the Acer Swift Edge 16 (SFE16-43): www.techpowerup.com/309156/acer-announces-new-swift-edge-16-laptop-refreshed-with-amd-ryzen-7040-series-apus-and-wi-fi-7
LOL!

IDK about your friend, but I have personally used so many laptops, desktops and workstations, and I'm not touching Lenovo even if someone gives it to me for free.

If you want better advice, tell us what you are going to use it for. This way we can point you to a proper model for the task.
#72
Mussels
Freshwater Moderator
phanbuey: Now look up Zen DPC and prepare to be shocked.

The skylakes had some of the lowest latency around (includes all the variants of lakes off that design 7700-10900ks)
I assume you aren't talking about Zen 3, because mine makes their 10 series result look pretty bad



Skylake had weirdness with faulty timers, so its results may have been false. The bug affected the HEDT Skylake-X as well
The HPET bug: What it is and what it isn't - overclockers.at
#73
phanbuey
Mussels: I assume you aren't talking about Zen 3, because mine makes their 10 series result look pretty bad



Skylake had weirdness with faulty timers, so its results may have been false. The bug affected the HEDT Skylake-X as well
The HPET bug: What it is and what it isn't - overclockers.at
You're correct -- your single-CCD Zen 3 idling for 2 mins in LatencyMon is not what I'm referring to.
#74
Mussels
Freshwater Moderator
phanbuey: You're correct -- your single-CCD Zen 3 idling for 2 mins in LatencyMon is not what I'm referring to.
Zen 1 sure wasn't fast at anything but multithreading, that's for sure
#75
phanbuey
Mussels: Zen 1 sure wasn't fast at anything but multithreading, that's for sure
Any dual CCD/Mesh (Intel Skylake X etc.) system is still slow to this day -- this includes Raptor Lake.




Core-to-Core Latency - AMD Zen 4 Ryzen 9 7950X and Ryzen 5 7600X Review: Retaking The High-End (anandtech.com)

Combine that with Nvidia drivers, power-saving plans, scaling core frequencies, DDR5 latency and more feature-packed motherboards (AM5/1700), plus Windows 11 security features, and you can get some astronomical DPC latency. You won't notice it in normal use, but it's there.

It seems like only X3D chips, disabling E-cores, or running VMs can get latency scores like the older chips - a lot of it is software and motherboard BIOS, not so much chip latency, but I think it all adds up.