
AMD "Strix Point" Company's First Hybrid Processor, 4P+8E ES Surfaces

I think it’s too early to assume that AMD’s C cores can clock as high. If they crammed more of them into a tighter space, there may be some trade-offs they had to make in the design with regard to total power consumption per core. Bergamo wasn’t designed for high clock speeds, but rather for more threads. It’s clocked lower because of the density of the chip and the relatively low TDP target of the platform. Maybe they can clock them all the same, but it’s also possible that the C core design can't have as much power pushed through it and needs to sit closer to the sweet spot of its power/performance curve. Zen 4 is quite efficient, but AMD pushed the design past that point for the sake of more multicore performance.

If I were to guess, I bet the C cores don’t boost as high, and might not exceed the “all core” rated speed.
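The power argument above can be illustrated with the textbook approximation for dynamic power, P ≈ C·V²·f: near the top of the voltage/frequency curve each extra step in clock needs more voltage, so power rises much faster than clock speed. A rough sketch; the V/f points are made-up illustrative values, not measured Zen 4 or Zen 4c data, and performance is optimistically assumed to scale linearly with clock.

```python
# Rough illustration of dynamic power scaling: P ~ C * V^2 * f.
# The (frequency, voltage) pairs are HYPOTHETICAL, chosen only to show that
# voltage has to rise faster near the top of the V/f curve.

def dynamic_power(capacitance: float, voltage: float, freq_ghz: float) -> float:
    """Relative dynamic power for a given voltage and clock."""
    return capacitance * voltage ** 2 * freq_ghz

# Hypothetical V/f curve for one core: (GHz, volts).
VF_CURVE = [(3.0, 0.80), (4.0, 0.95), (4.5, 1.10), (5.0, 1.30)]

def perf_per_watt_table(capacitance: float = 1.0) -> list[tuple[float, float, float]]:
    """Return (GHz, power relative to base, perf/W relative to base) rows."""
    base_f, base_v = VF_CURVE[0]
    base_p = dynamic_power(capacitance, base_v, base_f)
    rows = []
    for f, v in VF_CURVE:
        rel_power = dynamic_power(capacitance, v, f) / base_p
        rel_perf = f / base_f  # assumes performance scales linearly with clock
        rows.append((f, rel_power, rel_perf / rel_power))
    return rows

if __name__ == "__main__":
    for f, rel_power, rel_eff in perf_per_watt_table():
        print(f"{f:.1f} GHz: power x{rel_power:.2f}, perf/W x{rel_eff:.2f}")
```

With these made-up points, going from 3.0 to 5.0 GHz buys about 1.67x the clock for over 4x the power, which is why a density-optimized core would sensibly be kept lower on the curve.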
SemiAnalysis' analysis of Zen 4c indicates that it's likely to clock lower than Zen 4.
 
Sure, that would be:
  • firstly, cheaper to produce since the cores have a smaller area, but since there will be more of them, the die area will stay the same, so not really cheaper to begin with
  • secondly, more expensive since you need to stack cache
  • so such a design will be more expensive than one with regular Zen 5 cores and no cache stacking
Which means it could turn out to be faster than pure Zen 5 cores in some workloads (due to more cores), while in others obviously not (due to lower clocks from the stacked cache), all while being more expensive.
Well, if your use case benefits from such a configuration, you could still go for this kind of design even though it is more expensive; it could even turn out to be cheaper per core.
But that's a BIG IF. And exactly the reason to have hybrids in the first place.
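The cost argument above can be put into rough numbers. A minimal sketch, assuming a dense core takes about half the area so twice as many fit on the same die, and using made-up cost units (not real wafer or packaging prices):

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    die_cost: float       # HYPOTHETICAL cost of the compute die (same area for both)
    stacking_cost: float  # HYPOTHETICAL extra cost of stacked cache (0 if none)
    cores: int

    def total_cost(self) -> float:
        return self.die_cost + self.stacking_cost

    def cost_per_core(self) -> float:
        return self.total_cost() / self.cores

# Same die area, so roughly the same die cost; the dense design fits twice the
# cores and adds stacked cache. All numbers are illustrative only.
regular = Design("8x Zen 5", die_cost=100.0, stacking_cost=0.0, cores=8)
dense_3d = Design("16x Zen 5c + 3D cache", die_cost=100.0, stacking_cost=40.0, cores=16)

if __name__ == "__main__":
    for d in (regular, dense_3d):
        print(f"{d.name}: total {d.total_cost():.0f}, per core {d.cost_per_core():.2f}")
```

Under these assumptions the dense part costs more in total but less per core, which is the "cheaper per core, but only if your workload uses the cores" point.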

The weakness of optimizing for high density is lower clocks, so you would be losing performance.

Besides being more expensive, another point is that the cache is stacked over the L3, effectively doubling it. But APUs only have half the L3, so using 3D cache would only bring them up to the same amount as desktop processors. Add it all up and you will see that such a product would make no sense.
Cost could be higher, but with 2 Zen 5c chiplets and the 3D cache stacked on top, it would be the perfect mobile workstation CPU imho: raw multithreaded performance with 16C/32T and low power draw, and in any memory-sensitive app the 3D cache would make up for the lower clocks. Let's see if they make that one. Or, instead of 2 CPU chiplets, they could replace one of them with a GPU die, which would also benefit greatly from the 3D cache.
 
SemiAnalysis' analysis of Zen 4c indicates that it's likely to clock lower than Zen 4.
"The silicon results show that the slow corner wafer was able to achieve 4.24GHz at 1.0V and 100 degree Celsius in 5nm FinFET technology"

It remains to be seen how this will change on 4nm. 4-4.2 GHz is good enough for laptops anyway.
 
"The silicon results show that the slow corner wafer was able to achieve 4.24GHz at 1.0V and 100 degree Celsius in 5nm FinFET technology"

It remains to be seen how this will change on 4nm. 4-4.2 GHz is good enough for laptops anyway.
And with P cores likely to ramp to 5 GHz, it’s all good. I’d suspect they just won't push the C cores that hard if they want efficiency.
 
Because some stuff still benefits from bigger and faster cores, and beyond a certain number, more cores might not increase performance. Depending on the application, 6 fast cores can be faster than 32 slow cores, but the opposite can also be true. For general-purpose machines a hybrid approach makes sense, if nothing else because it'll help with power consumption and cooling.
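The "6 fast vs 32 slow" trade-off above is essentially Amdahl's law with unequal cores. A rough sketch, assuming a workload with a fixed parallel fraction, a serial part that always runs on the fastest core, and hypothetical core counts and speeds (a slow core at half the throughput of a fast one):

```python
# Amdahl-style comparison with per-core speed. Core counts and relative speeds
# are HYPOTHETICAL; "speed" is one core's throughput relative to a fast core.

def run_time(parallel_fraction: float, serial_speed: float, parallel_throughput: float) -> float:
    """Time to finish one unit of work.

    The serial part runs on the fastest available core; the parallel part is
    spread over the aggregate throughput of all cores.
    """
    serial = (1.0 - parallel_fraction) / serial_speed
    parallel = parallel_fraction / parallel_throughput
    return serial + parallel

# name -> (speed of fastest core, aggregate throughput of all cores)
CONFIGS = {
    "6 fast": (1.0, 6 * 1.0),
    "32 slow": (0.5, 32 * 0.5),
    "4 fast + 8 slow": (1.0, 4 * 1.0 + 8 * 0.5),
}

def compare(parallel_fraction: float) -> dict[str, float]:
    return {name: run_time(parallel_fraction, s, t) for name, (s, t) in CONFIGS.items()}

if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        times = compare(p)
        best = min(times, key=times.get)
        print(p, {k: round(v, 3) for k, v in times.items()}, "best:", best)
```

In this toy model the 6 fast cores beat the 32 slow ones at p = 0.5, the 32 slow cores win at p = 0.99, and the hybrid comes out ahead at p = 0.9. The hybrid only gets that benefit if the scheduler actually puts the serial part on a fast core.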

I'd say a sort of pyramid scaling approach is most sensible: slower and denser cores at the bottom, faster and less dense cores at the top. The goal is frequency that scales more linearly with power and heat, better handling of multithreaded background tasks, and better use of die area.
 
And with P cores likely to ramp to 5 GHz, it’s all good. I’d suspect they just won't push the C cores that hard if they want efficiency.
Exactly. If everything is as good as it looks, AMD will beat Intel at its own game for the second time. The first time was with its more efficient implementation of AVX-512.

 
Considering how awful Intel's thread director has been, it might actually be better to rely on the OS scheduler?

I'd also hope these are monolithic rather than chiplet-based. With the touted die-area savings of Zen 4c, 8 C cores shouldn't take up much more area than 4 P cores.
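The area claim above is simple arithmetic once you pick a compact-to-full core area ratio. A back-of-envelope sketch; the ratios are assumptions (0.5 for "about half the area", 0.65 for "about a third smaller"), and only core area is counted, not L3 or uncore:

```python
def cluster_area(cores: int, core_area: float) -> float:
    """Total core area for a cluster (cores only; L3 and uncore ignored)."""
    return cores * core_area

P_CORE_AREA = 1.0  # normalize one full-size core to 1.0

if __name__ == "__main__":
    # HYPOTHETICAL compact-core area ratios relative to a full core.
    for ratio in (0.5, 0.65):
        four_p = cluster_area(4, P_CORE_AREA)
        eight_c = cluster_area(8, P_CORE_AREA * ratio)
        print(f"ratio {ratio}: 4P = {four_p:.1f}, 8C = {eight_c:.1f} "
              f"({eight_c / four_p - 1:+.0%} vs 4P)")
```

At half the area, 8 compact cores match 4 full cores exactly; at a third smaller they are about 30% larger, which is still "not much larger" for twice the core count.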

You mean like how awful the 7950X3D's solution is and that's just for the same cores?
 
You mean like how awful the 7950X3D's solution is and that's just for the same cores?

Would have preferred cache on both CCDs, but it's a specialty SKU aimed at gamers rather than pervasive across the entire stack like Intel's P+E parts, and at least the cores have feature parity, unlike with E-cores. Using a 7950X3D myself, it took a few months after launch for it to bed down, and now it just works on any games that aren't doing their own scheduling (on Windows, admittedly, due to the soft dependency on Xbox Game Bar game detection).

Going back to Intel for a sec, hardware scheduling could be best, but Thread Director just doesn't seem to work, especially on Linux, which is problematic when almost every part is P+E.
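On Linux, the usual workaround for either kind of asymmetric design is to pin a process to the cores you want yourself. A minimal sketch using Python's `os.sched_setaffinity` (Linux-only); which logical CPU IDs belong to the V-Cache CCD or to the P-cores depends on the system, so the `PREFERRED_CPUS` set below is a hypothetical placeholder you would need to check first (for example with `lscpu -e`):

```python
import os

def pin_to_cpus(pid: int, wanted: set[int]) -> set[int]:
    """Restrict process `pid` (0 = the calling process) to the CPUs in `wanted`.

    Only CPUs the process is already allowed to run on are kept, so the call
    does not fail on smaller machines. Returns the resulting affinity mask.
    Linux-only: os.sched_setaffinity is not available on Windows or macOS.
    """
    allowed = os.sched_getaffinity(pid)
    target = wanted & allowed
    if not target:
        raise ValueError(f"none of {sorted(wanted)} are usable; allowed: {sorted(allowed)}")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    # HYPOTHETICAL mapping: assume logical CPUs 0-7 are the ones you want
    # (e.g. the V-Cache CCD or the P-cores). Check your real topology first.
    PREFERRED_CPUS = set(range(8))
    if hasattr(os, "sched_setaffinity"):
        try:
            print("now running on CPUs:", sorted(pin_to_cpus(0, PREFERRED_CPUS)))
        except ValueError as err:
            print("could not pin:", err)
    else:
        print("os.sched_setaffinity is not available on this platform")
```

From a shell, `taskset -c 0-7 <command>` does the same for a newly launched program. Either way this is manual scheduling: it fixes one process, not the OS's general placement decisions.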
 
Calling the compact cores "efficiency" cores seems off the mark. I don't believe these are designed to be much more power-efficient. Their more compact nature may make them slightly more power-efficient, but more than anything, they seem to be designed to be space-efficient. They should offer a very similar level of performance to normal cores in most applications while taking up only around half the die area.
Agreed, their reason for existence was cloud computing, where more cores matter more than a huge cache. AMD published some power figures, and the 128-core Zen 4c Bergamo has the same TDP as the 96-core Genoa Epyc. So efficiency seems better, even if that wasn't the driving force. If TSMC had gotten N3 working earlier, we would probably see a 256-core dense Turin.
 
Would have preferred cache on both CCDs, but it's a specialty SKU aimed at gamers rather than pervasive across the entire stack like Intel's P+E parts, and at least the cores have feature parity, unlike with E-cores. Using a 7950X3D myself, it took a few months after launch for it to bed down, and now it just works on any games that aren't doing their own scheduling (on Windows, admittedly, due to the soft dependency on Xbox Game Bar game detection).

Going back to Intel for a sec, hardware scheduling could be best, but Thread Director just doesn't seem to work, especially on Linux, which is problematic when almost every part is P+E.
Right, so the 7950X3D doesn't really work on Linux either. Point is, no matter how bad Intel's v1 and v2 hardware schedulers are, they at least exist and work most of the time. AMD's doesn't exist, and their split-cache design suffers because of it: to this day it performs worse than the 7800X3D. Their E-cores HAVE to have feature parity... And even then, when they come out with their own weaker E-cores, you will see some pretty wild performance issues.
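Feature parity, as discussed above, is something you can at least partly check on Linux: every logical CPU lists its ISA feature flags in `/proc/cpuinfo`. A minimal sketch for x86 systems (ARM kernels use a different field name); it only sees what the kernel advertises, so it won't reveal every microarchitectural difference between core types:

```python
import os

def flags_per_cpu(cpuinfo_text: str) -> dict[int, frozenset[str]]:
    """Map each logical CPU number to its set of reported feature flags."""
    result: dict[int, frozenset[str]] = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            current = int(value)
        elif key == "flags" and current is not None:
            result[current] = frozenset(value.split())
    return result

def has_feature_parity(cpuinfo_text: str) -> bool:
    """True if every CPU reports exactly the same flag set."""
    return len(set(flags_per_cpu(cpuinfo_text).values())) <= 1

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        text = f.read()
    cpus = flags_per_cpu(text)
    print(f"{len(cpus)} CPUs, identical flags: {has_feature_parity(text)}")
    print("avx512f on every CPU:", bool(cpus) and all("avx512f" in fl for fl in cpus.values()))
```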
 
Right, so the 7950X3D doesn't really work on Linux either. Point is, no matter how bad Intel's v1 and v2 hardware schedulers are, they at least exist and work most of the time. AMD's doesn't exist, and their split-cache design suffers because of it: to this day it performs worse than the 7800X3D. Their E-cores HAVE to have feature parity... And even then, when they come out with their own weaker E-cores, you will see some pretty wild performance issues.

Huh? You just get a slightly downclocked 7950X at worst... totally overblown, lol.
 
Huh? You just get a slightly downclocked 7950X at worst... totally overblown, lol.
What part of that was overblown? That the 7950X3D performs worse than the 7800X3D... or that AMD doesn't have a hardware scheduler and is going to release CPUs whose C cores differ from the full cores even more than the 7950X3D's X3D and non-X3D CCDs differ from each other?
 
What part of that was overblown? That the 7950X3D performs worse than the 7800X3D... or that AMD doesn't have a hardware scheduler and is going to release CPUs whose C cores differ from the full cores even more than the 7950X3D's X3D and non-X3D CCDs differ from each other?

In gaming, when the scheduler doesn't work, at worst you get a slightly downclocked full-fat part, as opposed to 'if the scheduler doesn't work, here are your Atom cores'.
 
AMD may not have these issues since the cores are almost identical other than cache size. There should be much less of a problem or delay when everything's the same: if something gets sent to the wrong cores for the task, it can just shift to where it's needed at the same speed as on a regular dual-CCX chip, rather than what Intel's suffering from, where the CPU needs to process tasks to know where they belong before sending them to the right cores. It's like a car towing itself.


With 12th gen, Intel moved core selection into software: Thread Director only provides hints, and the OS scheduler makes the final call.

This led to some large latency issues on 12th and 13th gen. It's not present in long-running tasks, but the initial choice of which cores to throw a task on is much, much slower than on previous Intel hardware.


His part two has benchmarks, especially of real-world tasks like searching files in Windows, where things slow to a crawl as they're shoved onto the E-cores.




That whole Nvidia DPC latency issue?


Windows 11 and its optimisations for Intel's designs speed that up fairly well (HAGS, I'm guessing), but it's still inferior to previous Intel hardware by a large margin.
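LatencyMon measures DPC/ISR latency inside the Windows kernel, which user-space code can't observe directly. As a much cruder, cross-platform stand-in, the sketch below measures how late a short sleep wakes up; that reflects timer resolution and scheduling delay rather than DPC latency as such, so treat it only as a rough sanity check, not a replacement for LatencyMon:

```python
import statistics
import time

def wakeup_overshoot_us(samples: int = 200, sleep_s: float = 0.001) -> list[float]:
    """Return how many microseconds each sleep overshot its requested duration."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(sleep_s)
        elapsed = time.perf_counter() - start
        overshoots.append(max(0.0, elapsed - sleep_s) * 1e6)
    return overshoots

if __name__ == "__main__":
    data = wakeup_overshoot_us()
    p99 = statistics.quantiles(data, n=100)[98]  # 99th percentile cut point
    print(f"median {statistics.median(data):.0f} us, p99 {p99:.0f} us, max {max(data):.0f} us")
```

Running it once with the system idle and once with a heavy foreground app open gives a rough feel for how much the scheduler is delaying wake-ups.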


However, the tweaks they made to help latency hurt some tasks: dragging and dropping multiple music files takes a massive penalty in 11 vs 10 on these systems.
Likely because they're shoved to the slower E-cores rather than the P-cores.



Synthetic benchmarks are programmed into the software scheduler to help performance, but doing something like having a 3D rendering program open (not rendering anything, just open) means the scheduler prioritises the P-cores for that .exe, and tasks like moving files around get shoved to the E-cores, despite the P-cores having performance to spare and being free for the task.


This ends up in a weird situation where you lose the ability to multitask without large performance issues, and the biggest issues are with the 'low priority' stuff that users actually deal with in real life, like copying files or importing video/mp3 files.
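The placement problem described above can be reduced to a toy model: one policy confines background tasks to E-cores whenever an E-core is free, the other just gives every task the fastest free core. This is a sketch under hypothetical assumptions (8 P-cores at speed 1.0, 8 E-cores at speed 0.5, invented policy names); it is not how Windows or Thread Director actually decide:

```python
from dataclasses import dataclass

@dataclass
class Core:
    kind: str           # "P" or "E"
    speed: float        # relative throughput (HYPOTHETICAL: P = 1.0, E = 0.5)
    busy: bool = False

def make_cores(p_cores: int = 8, e_cores: int = 8) -> list[Core]:
    return [Core("P", 1.0) for _ in range(p_cores)] + [Core("E", 0.5) for _ in range(e_cores)]

def place(cores: list[Core], background: bool, strict: bool) -> Core:
    """Pick a free core for a new task and mark it busy.

    strict=True:  background tasks are confined to E-cores while any E-core is free.
    strict=False: every task simply gets the fastest free core.
    Raises ValueError (from max() on an empty list) if no core is free.
    """
    free = [c for c in cores if not c.busy]
    if strict and background:
        free = [c for c in free if c.kind == "E"] or free
    chosen = max(free, key=lambda c: c.speed)
    chosen.busy = True
    return chosen

def copy_time(work: float, strict: bool, busy_p_cores: int = 0) -> float:
    """Time for a background file copy of `work` units under the given policy."""
    cores = make_cores()
    for c in cores[:busy_p_cores]:  # P-cores come first in make_cores()
        c.busy = True
    core = place(cores, background=True, strict=strict)
    return work / core.speed

if __name__ == "__main__":
    # The 3D app is open but idle, so no P-core is actually busy.
    print("strict policy:           ", copy_time(10.0, strict=True))
    print("fastest-free-core policy:", copy_time(10.0, strict=False))
```

With idle P-cores, the strict policy makes the copy take twice as long purely because of the assumed P/E speed ratio; the fastest-free-core policy only falls back to an E-core once the P-cores are genuinely busy.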


Didn't notice his part 3 video since the title changed, here's the link

Intel hired Jim Keller, who designed AMD's current hardware, to fix this in their future designs.

TPU snuck into his video of Intel mocking AMD's CCX design as glue, before Intel went on to use slower glue :p


TL;DR: AMD better hope their design isn't this bad.

TL;DR2: These CPUs are being designed for enterprise work (for both AMD and Intel), and we get the leftovers. They don't always suit end users at home, and the vendors almost don't care since we aren't the target market; we're the secondary market. Look at GPUs with mining and AI, for example.
 
Wow! I wasn't aware how bad the DPC was!
 
Wow! I wasn't aware how bad the DPC was!
Now look up Zen DPC and prepare to be shocked.

The Skylakes had some of the lowest latency around (including all the Lake variants derived from that design, from the 7700K to the 10900K).
 
Never Lenovo...
Well, if you are prepared to go through 3 or 4 to get a properly working one and like to gamble on security flaws going unpatched, Lenoblo is a fine choice.

It's not necessary.
Well, it's not necessary for desktop, but Turin dense will be 192 Zen 5c cores (256 if TSMC gets their 3nm node sorted in time to meet AMD's goals).
 
My friends say that Lenovo has been working very well for the last 1-2 years; things were different 5-10 years ago, but now that it is among the companies that sell the most laptops, it has improved much more than ASUS and HP.
I am looking for a Zen 4 7040 Phoenix laptop, without a dGPU if possible. Which one do you currently recommend? I have seen the Acer Swift Edge 16 (SFE16-43) https://www.techpowerup.com/309156/...d-with-amd-ryzen-7040-series-apus-and-wi-fi-7
LOL!

IDK about your friend, but I have personally used so many laptops, desktops and workstations, and I'm not touching Lenovo even if someone gives it to me for free.

If you want better advice, tell us what you are going to use it for. That way we can point you to a proper model for the task.
 
Now look up Zen DPC and prepare to be shocked.

The Skylakes had some of the lowest latency around (including all the Lake variants derived from that design, from the 7700K to the 10900K).
I assume you aren't talking about Zen 3, because mine makes their 10-series result look pretty bad.



Skylake had weirdness with faulty timers, so its results may have been false. The bug affected the HEDT Skylake-X as well.
The HPET bug: What it is and what it isn't - overclockers.at
 
You're correct -- your single-CCD Zen 3 idling for 2 minutes in LatencyMon is not what I'm referring to.
Zen 1 sure wasn't fast at anything but multithreading, that's for sure
 