Thursday, August 10th 2023

AMD "Strix Point" Company's First Hybrid Processor, 4P+8E ES Surfaces

Aug 10th, 2023 04:15

Going beyond previous reports that AMD is increasing the CPU core count of its mobile monolithic processors from the present 8-core/16-thread to 12-core/24-thread, we are learning that the next-gen processor from the company, codenamed "Strix Point," will in fact be the company's first hybrid processor. The chip is expected to feature two kinds of CPU cores, with "Zen 5" being the microarchitecture behind the performance cores, and "Zen 5c" behind the efficiency cores. An engineering sample featuring 4 P-cores and 8 E-cores surfaced on the web, thanks to Performancedatabases. A HWiNFO screenshot reveals the engineering sample's core configuration of 4x P-cores and 8x E-cores, with identical L1 cache sizes. Things get a little fuzzy with the detection of the L2 and L3 cache sizes.

We know from the current "Zen 4c" core design that it is essentially a compacted version of "Zen 4" designed for higher-density chiplets that have 16 cores, and that it has both the same ISA and IPC as "Zen 4." The only differences are that "Zen 4c" cores have lower amounts of shared L3 cache at their disposal, are generally configured with lower clock speeds, and have higher energy efficiency than "Zen 4." "Zen 4c" cores are also 35% smaller in die area than "Zen 4." The company could develop "Zen 5c" CPU cores with similar design goals.

The "Strix Point" silicon could hence have two CCXs (CPU core complexes): one with the larger "Zen 5" P-cores and a certain amount of L3 cache, and another with the smaller "Zen 5c" cores and their own L3 cache. This would essentially be similar to "Renoir," which has two 4-core CCXs of "Zen 2" cores. The L1 cache sizes for both kinds of cores are identical, at 48 KB L1D and 32 KB L1I, and it's likely that both core types have 1 MB of dedicated L2 cache per core. The L3 cache sizes could vary between the two CCXs, with the P-core CCX having 16 MB (4 MB per core), and the E-core CCX 8 MB (1 MB per core).
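
The per-core L3 figures come from dividing each CCX's total L3 by its core count. A minimal sketch of that arithmetic follows, assuming the rumored totals of 16 MB for the 4 P-cores and 8 MB for the 8 E-cores. The names `Ccx` and `l3_per_core_kb` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ccx:
    """One CPU core complex, using the rumored (unconfirmed) figures."""
    name: str
    cores: int
    l3_total_mb: int


def l3_per_core_kb(ccx: Ccx) -> float:
    """Return the L3 share per core in KB (1 MB = 1024 KB)."""
    return ccx.l3_total_mb * 1024 / ccx.cores


# Assumed configuration, taken from the leaked engineering sample.
p_ccx = Ccx("Zen 5 P-core CCX", cores=4, l3_total_mb=16)
e_ccx = Ccx("Zen 5c E-core CCX", cores=8, l3_total_mb=8)

for ccx in (p_ccx, e_ccx):
    print(f"{ccx.name}: {l3_per_core_kb(ccx):.0f} KB L3 per core")
```

With these totals, 16 MB shared by four cores works out to 4 MB per core, and 8 MB shared by eight cores works out to 1 MB per core.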

It will be interesting to see how AMD handles the hybrid architecture from a software standpoint. Intel uses Thread Director, a hardware-based solution that's designed to send the right kind of compute workload to the right kind of CPU core. AMD could either develop its own version of Thread Director, or use a less sophisticated OS-based solution, such as the one it uses with its multi-CCD client processors.
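
To make the OS-based option more concrete, here is a toy sketch of a placement policy that prefers P-cores for latency-sensitive threads and E-cores for background work, falling back to the other kind when the preferred one is busy. It is purely illustrative: the `Core` and `pick_core` names and the policy itself are assumptions, and do not describe how Thread Director, any operating system scheduler, or AMD's software actually works.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Core:
    core_id: int
    kind: str          # "P" or "E" (hypothetical labels)
    busy: bool = False


def pick_core(cores: Sequence[Core], latency_sensitive: bool) -> Optional[Core]:
    """Pick an idle core, preferring P for foreground and E for background work."""
    preferred = "P" if latency_sensitive else "E"
    idle = [c for c in cores if not c.busy]
    # Stable sort: cores of the preferred kind come first, order is otherwise kept.
    idle.sort(key=lambda c: c.kind != preferred)
    return idle[0] if idle else None


# Assumed 4P+8E layout matching the engineering sample.
cores = [Core(i, "P") for i in range(4)] + [Core(i, "E") for i in range(4, 12)]

fg = pick_core(cores, latency_sensitive=True)
fg.busy = True
bg = pick_core(cores, latency_sensitive=False)
print(fg.core_id, fg.kind, bg.core_id, bg.kind)
```

A real scheduler would also weigh load, power state, and cache locality; this sketch only shows the basic idea of steering work by core type.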

Sources: Performancedatabases, IThome, VideoCardz

86 Comments on AMD "Strix Point" Company's First Hybrid Processor, 4P+8E ES Surfaces

#26

ViperXZ

Daven: As far as I’m aware, ARM BIG.little cores are much different and not just lower clocks. For instance, A57 is out of order and A53 is in order pipelines. Can you point to product examples where BIG.little is all A53 or some other cores with different clocks?

i have a old tablet that only has the "cheap" cores. here you go ... cheaper phones/tablets

#27

Daven

Assimilator: It's going to be a dual solution. Hardware for the more granular decisions, software (operating system) for more fine-grained.

No. In implementation the c-cores will have less L3 cache and be clustered more densely, which will undoubtedly negatively affect their performance characteristics. The fact that they're capability-identical to "big" Zen cores is irrelevant, unless you want to have a stupid e-peen war about whether this implementation is "better" than Intel's.

Yes.

The only difference is cache. Density is more manufacturing than core differences. Is a Celeron a different core than a Pentium due to cache differences? No, Intel gives them the same ‘cove’ codename. Cache differences have existed for a long time on the same cores for the sake of product differentiation.

An Intel p-core uses a ‘cove’ architecture. An Intel e-core uses a ‘mont’ architecture. Same goes for ARM SoCs. Different architectures for different on chip cores. AMD c and non-c cores are the same.

ViperXZ: i have a old tablet that only has the "cheap" cores. here you go ... cheaper phones/tablets

But that’s not a BIG.little hybrid design. That’s just little. We are arguing whether an AMD c and non-c mixture is really a hybrid design as the cores are essentially the same. I’m arguing that AMDs design is not BIG.little (two different core architectures) as Intel and ARM have defined it while others say cache and clock differences classify as hybrid and therefore still require sophisticated thread managers.

#28

ViperXZ

Daven: Yes.

The only difference is cache. Density is more manufacturing than core differences. Is a Celeron a different core than a Pentium due to cache differences? No, Intel gives them the same ‘cove’ codename. Cache differences have existed for a long time on the same cores for the sake of product differentiation.

An Intel p-core uses a ‘cove’ architecture. An Intel e-core uses a ‘mont’ architecture. Same goes for ARM SoCs. Different architectures for different on chip cores. AMD c and non-c cores are the same.

But that’s not a BIG.little hybrid design. That’s just little. We are arguing whether an AMD c and non-c mixture is really a hybrid design as the cores are essentially the same. I’m arguing that AMDs design is not BIG.little (two different core architectures) as Intel and ARM have defined it.

in this case i will agree, it's not a regular "hybrid" design, it is surely "big little" but just in a case of, less L3 cache and less clocks on the C cores, and thats it. So very very different compared to what ARM and Intel are doing (who are using different archs for the smaller / weaker cores).

#29

persondb

Daven: As far as I’m aware, ARM BIG.little cores are much different and not just lower clocks. For instance, A57 is out of order and A53 is in order pipelines. Can you point to product examples where BIG.little is all A53 or some other same cores with different clocks only?

MT6750, Helio X10, Helio P10, Snapdragon 430, Snapdragon 439, Snapdragon 615, ...

Those are very common.

Yes, ARM meant to pair different architectures(but with the same features implemented, just at different performance targets) for big.LITTLE, but a lot of manufacturers used the same core but implemented in different ways, so exactly like what AMD is doing here.

#30

Denver

Assimilator: It's going to be a dual solution. Hardware for the more granular decisions, software (operating system) for more fine-grained.

No. In implementation the c-cores will have less L3 cache and be clustered more densely, which will undoubtedly negatively affect their performance characteristics. The fact that they're capability-identical to "big" Zen cores is irrelevant, unless you want to have a stupid e-peen war about whether this implementation is "better" than Intel's.

What are you talking about? all monolithic mobile chips already have half the cache of desktop versions since forever, so this point is not relevant. Both cores will be on cache parity. period.

Regarding the clock, we have a bergamo reaching 3.1Ghz even though it has 128c (It could be more due to TDP)

#31

Tek-Check

john_: I prefer AMD's hybrid approach than Intel's, but I am speculating that AMD choose this to avoid having to design it's own hardware thread director like Intel. If they manage to build something like that, we might go the Intel way with a few P cores and a number of other less capable cores, clearly for marketing purposes(more cores in the same die area, more cores advertised, better sales). Intel is already moving in a three types of cores with it's next gen. AMD's denser cores aren't going to help much next year. They could have helped if they where ready for Alder Lake.

There is no "Intel way" here, as c-cores are far more capable than e-cores. Just look at Bergamo performance charts.
C-cores will definitely enable more flexible and diverse SKUs across mobility line-ups.

#32

ViperXZ

the intel way was terrible and a cope to bring them more cores since they couldnt design "full cores" that are efficient like amd does since zen. it lost them AVX512 + smt aside from a bios trick with early 12900Ks to reactivate AVX512 and also caused awkward software bugs and problems

#33

ratirt

Still not a fan of these Hybrid Pcore and Ecore although for a mobile market and maybe laptop it does make more sense.

#34

Tek-Check

Daven: So why not make all the cores 5c?

It's not necessary.

#35

ViperXZ

ratirt: Still not a fan of these Hybrid Pcore and Ecore although for a mobile market and maybe laptop it does make more sense.

agreed, whereas with the amd design i like everything about it, since it will cause 0 problems

#36

john_

Tek-Check: There is no "Intel way" here, as c-cores are far more capable than e-cores. Just look at Bergamo performance charts.
C-cores will definitely enable more flexible and diverse SKUs across mobility line-ups.

Anddddd.......I am sayingggggggg something different whereeeeeeeee????????????

You probably misunderstood what I wrote.

#37

Tek-Check

john_: Anddddd.......I am sayingggggggg something different whereeeeeeeee????????????
You probably misunderstood what I wrote.

Relax dude. No reason to get triggered. I just added another comment to your speculation. Treat it as another brick in the wall, not as an opposition.

#38

AnarchoPrimitiv

persondb: That doesn't matter, we literally have no idea how high or low those cores will clock. I doubt it will be high at all. I am betting in the 2GHz to 3GHz range due to their far increased density.

Which then probably puts them in the same category of e-cores. As a lower performance core.

The thing really is that IPC never actually mattered, it's completely meaningless on it's own(and also varies far too much), if you need to do scheduling then it doesn't matter if the core has the same IPC or not, the only thing that matters is the core performance.

Bergamo's boost clocks are only 300 MHz below that of the 96 core epyc, and that's with 128 cores, so with a lot fewer cores, it's arguable that they won't be clocked too low, or at least they don't have to be. Seeing as Zen5 and Zen5C are the same architecture, same node, etc, I can't think of a reason why they couldn't theoretically reach the same or close enough clockspeeds to the regular cores.

#39

Daven

persondb: MT6750, Helio X10, Helio P10, Snapdragon 430, Snapdragon 439, Snapdragon 615, ...

Those are very common.

Yes, ARM meant to pair different architectures(but with the same features implemented, just at different performance targets) for big.LITTLE, but a lot of manufacturers used the same core but implemented in different ways, so exactly like what AMD is doing here.

Thanks for the examples. You are right. These SoCs use the same core but different clock targets. Very strange design. Seems like you just need the cores to boost up and down based on performance and power needs rather than artificially cap the clock speed of one core versus another.

Oh well, I learned something new.

#40

ViperXZ

Daven: Thanks for the examples. You are right. These SoCs use the same core but different clock targets. Very strange design. Seems like you just need the cores to boost up and down based on performance and power needs rather than artificially cap the clock speed of one core versus another.

Oh well, I learned something new.

this is the "cheap" approach. just lowering clocks gives you more efficiency and hence saves energy. the harder approach is using different cores, designing them, implementing them in software.

#41

john_

Tek-Check: Relax dude. No reason to get triggered. I just added another comment to your speculation. Treat it as another brick in the wall, not as an opposition.

I wasn't triggered. Your post looks like correcting me, so not another brick in the wall, more like a brick thrown at the wall :p

So I guess you misunderstood my post. It's fine.

#42

HD64G

Daven: So why not make all the cores 5c?

So why not make some SKUs with stacked cache and some without stacked cache? No need for ‘hybrid’ cores that are almost identical.

Agreed! I also wonder why to have hybrids. They could have stacked 3D cache on the Zen5c chiplet and not need the Zen5 cores at all.

#43

persondb

Daven: Thanks for the examples. You are right. These SoCs use the same core but different clock targets. Very strange design. Seems like you just need the cores to boost up and down based on performance and power needs rather than artificially cap the clock speed of one core versus another.

Oh well, I learned something new.

Some of those have likely different physical implementations(likely those that have 2ghz and 1ghz cores) which explains the whole thing as like with Zen4 and Zen4c, you can optimize it for area instead of performance(i.e. clocks).

This has a result that's basically the same as big.little which is... different performance cores. It doesn't really matter if the core has the same IPC or not really.

#44

mahirzukic2

HD64G: Agreed! I also wonder why to have hybrids. They could have stacked 3D cache on the Zen5c chiplet and not need the Zen5 cores at all.

Sure, that would be:

firstly, cheaper to produce since they have smaller surface area, but since it will have more cores, the die area will remain the same, so not really cheaper to begin with
secondly, more expensive since you need to stack cache
so such design will be more expensive than one with regular Zen 5 cores with no cache stacking

Which means that it could turn out to be faster than pure Zen 5 cores (due to more cores) in some workloads, while in others obviously not (due to lower clocks due to stacked cache), all the while being more expensive.
Well if your use case is such that such configuration benefits it, you could still go for this kind of design even being more expensive, it could turn out to be cheaper per core.
But that's a BIG IF. And exactly a reason to have hybrids in the first place.

#45

Denver

HD64G: Agreed! I also wonder why to have hybrids. They could have stacked 3D cache on the Zen5c chiplet and not need the Zen5 cores at all.

Optimization for high density has lower clocks as a weakness. Then you would be losing performance.

Besides being more expensive, another point is that the cache is stacked over the L3, effectively doubling it. But APUs only have half of the L3, so using 3D cache would only reach the same amount as desktop processors. add it all up and you will see that such a product would make no sense.

#46

pressing on

ToTTenTranz: Isn't Phoenix2 also a hybrid 2×Zen4 + 4×Zen4c solution?
The presence of Performance and Efficiency cores are mentioned in AMD's PPR for the Phoenix APUs, which is why it's been assumed that Phoenix2 has a hybrid design.
Unless that's a mistake on their programming reference guide, Strix Point shouldn't be AMD's first hybrid design.

I think you're right. According to NotebookCheck the AMD Ryzen 3 7440U that was announced on May 23 this year is a Zen 4/Zen 4c hybrid.

To quote that source "(the 7440U) offers 4 cores (quad core) based on the Zend 4/Zen 4c architecture that supports hyperthreading (8 threads). The cores clock from 3 (base) up to 4.5 GHz (single core boost). The processor includes 4 MB L2 cache and 8 MB L3 cache. The chip is based on the smaller Phoenix2 series with two bigger Zen 4 cores and two smaller Zen 4c cores (with less cache)...".

#47

Panther_Seraphin

For mobile this makes perfect sense!

You are trying to make the most of the power and cooling budget of each chassis, and hopefully either a decent hardware scheduler or improvements to OS-based scheduling will get the most benefit from this.

If this comes to desktop I can see it being viable in the areas of say business CPUs/Low end CPUs with integrated GPUs. Lower power consumption with decent core counts.

Denver: Optimization for high density has lower clocks as a weakness. Then you would be losing performance.

Besides being more expensive, another point is that the cache is stacked over the L3, effectively doubling it. But APUs only have half of the L3, so using 3D cache would only reach the same amount as desktop processors. add it all up and you will see that such a product would make no sense.

It's also the fact that currently the 3D cache uses vias in the L3 cache to connect between the substrate and the 3D V-Cache, so there is a requirement for the actual physical space the L3 takes up.

#48

CosmicWanderer

Give me a mobile APU with 6 Zen 5 cores, and a 30+ CU RDNA 3.5 GPU, call it the Ryzen Z2 Extreme and I'm sold on whatever handheld it ends up in.

#49

Darmok N Jalad

I think it’s too early to assume that AMD’s C cores can clock as high. If they crammed more of them in a tighter space, there may be some trade offs they had to make with the design in regards to total power consumption per core. Bergamo wasn’t designed for high speeds, but rather for more threads. It’s clocked lower because of the density of the chip and for the relatively low TDP target of the platform. Maybe they can clock them all the same, but it’s also possible that the C core design is not able to have as much power pushed through it, and it needs to sit closer to the optimum power/performance intersect. Zen4 is quite efficient, but AMD pushed the design past that for the sake of more multicore performance.

If I were to guess, I bet the C cores don’t boost as high, and might not exceed the “all core” rated speed.

#50

AnotherReader

Darmok N Jalad: I think it’s too early to assume that AMD’s C cores can clock as high. If they crammed more of them in a tighter space, there may be some trade offs they had to make with the design in regards to total power consumption per core. Bergamo wasn’t designed for high speeds, but rather for more threads. It’s clocked lower because of the density of the chip and for the relatively low TDP target of the platform. Maybe they can clock them all the same, but it’s also possible that the C core design is not able to have as much power pushed through it, and it needs to sit closer to the optimum power/performance intersect. Zen4 is quite efficient, but AMD pushed the design past that for the sake of more multicore performance.

If I were to guess, I bet the C cores don’t boost as high, and might not exceed the “all core” rated speed.

SemiAnalysis' analysis of Zen 4c indicates that it's likely to clock lower than Zen 4.
