Thursday, November 2nd 2023

AMD Introduces Ryzen 5 and Ryzen 3 Mobile Processors with "Zen 4c" Cores

AMD today launched its first client processors that feature the compact "Zen 4c" CPU cores, with the Ryzen 5 7545U and Ryzen 3 7440U mobile processors for thin-and-light notebooks. The "Zen 4c" CPU core is a compacted version of the "Zen 4" core that doesn't subtract any hardware components, but rather arranges them at higher density on the 4 nm silicon. A "Zen 4c" core takes up around 35% less die area than a regular "Zen 4" core. Since none of its components is removed, the core features IPC (per-clock performance) identical to "Zen 4," as well as an identical ISA (instruction set). "Zen 4c" also supports SMT, or 2 threads per core. The trade-off is that "Zen 4c" cores are generally clocked lower than "Zen 4" cores, which lets them operate at lower core voltages. This doesn't, however, make "Zen 4c" comparable to an E-core by Intel's definition; these cores are still part of the same CPU clock-speed band as the "Zen 4" cores, at least in the processors being launched today.

The Ryzen 5 7545U and Ryzen 3 7440U mobile processors formally debut the new 4 nm "Phoenix 2" monolithic silicon. This chip is AMD's first hybrid processor, in that it combines two regular "Zen 4" cores with four compact "Zen 4c" cores. The six cores share an impressive 16 MB of L3 cache, and each core features 1 MB of dedicated L2 cache. There is no complex hardware-based scheduler involved; instead, a software-based solution deployed by AMD's Chipset Software tells the Windows scheduler to treat the "Zen 4" cores as UEFI CPPC "preferred cores" and prioritize work to them, as they can hold on to higher boost-frequency bins. The "Phoenix 2" silicon inherits much of the on-die power-management feature set from the "Phoenix" and "Rembrandt" chips, and so is capable of a high degree of power savings with underutilized CPU cores and iGPU compute units.
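
How such a "preferred core" ranking surfaces to an operating system can be illustrated with a minimal sketch. This assumes a Linux machine that exposes ACPI CPPC data through sysfs (paths and availability vary by kernel and platform); it is not AMD's chipset software, only an illustration of the underlying hint.

# Rank logical CPUs by their ACPI CPPC "highest_perf" value, the kind of
# per-core performance hint "preferred core" schemes rely on.
# Assumes a Linux kernel exposing /sys/devices/system/cpu/cpuN/acpi_cppc/.
from pathlib import Path

def cppc_ranking():
    ranking = []
    for cpu_dir in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        perf_file = cpu_dir / "acpi_cppc" / "highest_perf"
        if perf_file.exists():
            ranking.append((cpu_dir.name, int(perf_file.read_text())))
    # A higher highest_perf value marks a more "preferred" core; on "Phoenix 2"
    # the two "Zen 4" cores would be expected to report the larger values.
    return sorted(ranking, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for cpu, perf in cppc_ranking():
        print(f"{cpu}: highest_perf={perf}")
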
The iGPU of "Phoenix 2" is based on the latest RDNA3 graphics architecture; however, it is not exactly built for gaming. It has just enough muscle for a modern Windows 11 experience with animated UI, complex web pages, and streaming video in the latest AV1 and HEVC formats. The iGPU packs 4 compute units, which add up to 256 stream processors. There are also AI accelerators intrinsic to the RDNA3 compute units. A big change with "Phoenix 2" is that it lacks the 16 TOPS XDNA accelerator, and hence both processor models being launched today lack Ryzen AI.

As for the chips themselves, the Ryzen 5 7545U maxes out the "Phoenix 2" silicon, enabling both "Zen 4" cores and all four "Zen 4c" cores, along with 16 MB of L3 cache and 22 MB of "total cache" (L2+L3); CPU clock speeds of 3.20 GHz base and 4.90 GHz boost; and a TDP band of 15 W to 30 W. The Ryzen 3 7440U, on the other hand, is a quad-core chip, enabling both "Zen 4" cores and two of the four available "Zen 4c" cores. The shared L3 cache is reduced to 8 MB, and hence the total cache is down to 12 MB. The CPU is clocked at 3.00 GHz base, with a 4.70 GHz maximum boost. The TDP band remains 15 W to 30 W. Both models max out the iGPU with its available 4 compute units, which carries the Radeon 740M branding.
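
The "total cache" figures quoted above are simply the shared L3 plus 1 MB of L2 per enabled core; a quick arithmetic check (a throwaway sketch using only the numbers from this article):

# "Total cache" as quoted = shared L3 + (enabled cores x 1 MB of L2 each).
def total_cache_mb(cores, l3_mb, l2_per_core_mb=1):
    return l3_mb + cores * l2_per_core_mb

print(total_cache_mb(cores=6, l3_mb=16))  # Ryzen 5 7545U -> 22 MB
print(total_cache_mb(cores=4, l3_mb=8))   # Ryzen 3 7440U -> 12 MB
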
Notebooks based on the Ryzen 5 7545U and Ryzen 3 7440U should begin rolling out now; the two chips mostly cater to entry-level and mainstream variants of existing thin-and-lights, including some notebooks of mainstream thickness.

The complete press-deck follows.

54 Comments on AMD Introduces Ryzen 5 and Ryzen 3 Mobile Processors with "Zen 4c" Cores

#26
Count von Schwalbe
Nocturnus Moderatus
MxPhenom 216: It is not a hard concept.

Zen 4c are fabbed on TSMC 4nm, which is a density-optimized 5nm. Zen 4c cores also have half the L3 cache the normal Zen 4 cores have. Cache takes up a lot of area.

EDIT: There are conflicting articles going around; some say they are on 4nm, some say it's still 5nm. Either way, the majority of that area improvement is the reduction in cache. Clock frequency doesn't impact area much, so the reduction in clock frequency is not really contributing to area reductions.

The library AMD is using is also different, with density being the priority, so everything in that library can be packed closer, routing layers routed closer together via rules, etc.
This is a lot more than just cache cuts. Using different sub-variants of a process, with a slightly different gate design, they can adjust power usage and area used vs speed cap pretty far. Example below is for an ARM core and a different process (N3E), but the principle is similar:

Check out the picture below as well:

They removed the specific boundaries between certain areas - reducing dead space. The lower heat and the reduced interference from reduced clocks make that easier, but it is mostly just taking a page from the ARM design book. An area where extremely reduced power budgets and thermal constraints are common - servers and smartphones.

They also reduced the SRAM size by using a different type of cell, at the potential cost of bandwidth or latency.

Source: www.semianalysis.com/p/zen-4c-amds-response-to-hyperscale

Incidentally, it appears that the L3 cache per core is actually the same as a standard Zen 4 APU.
#27
MxPhenom 216
ASIC Engineer
Count von Schwalbe: This is a lot more than just cache cuts. Using different sub-variants of a process, with a slightly different gate design, they can adjust power usage and area used vs speed cap pretty far. Example below is for an ARM core and a different process (N3E), but the principle is similar:

Check out the picture below as well:

They removed the specific boundaries between certain areas - reducing dead space. The lower heat and the reduced interference from reduced clocks make that easier, but it is mostly just taking a page from the ARM design book. An area where extremely reduced power budgets and thermal constraints are common - servers and smartphones.

They also reduced the SRAM size by using a different type of cell, at the potential cost of bandwidth or latency.

Source: www.semianalysis.com/p/zen-4c-amds-response-to-hyperscale

Incidentally, it appears that the L3 cache per core is actually the same as a standard Zen 4 APU.
That is what I said about the library AMD is using for this.

It's essentially a big re-floorplanning using a new library, with cells, rules, and memory compilers (which is what creates the SRAM being used, and can be pulled from the library) all designed with higher density in mind.

L3 cache for a Zen 4c core is 1 MB, vs. 2 MB on a normal Zen 4 - though correcting what I said, I don't think L3 sits inside a Zen core, so it wouldn't contribute to area reductions.

Capacity for L2/L1 didn't change, but the memory design for it did, which coincides with the library updates.
#28
Count von Schwalbe
Nocturnus Moderatus
MxPhenom 216: L3 cache for a Zen 4c core is 1 MB, vs. 2 MB on a normal Zen 4.
2 MB per core, same as an APU/monolithic chip. 4 MB for normal Zen 4.
MxPhenom 216: That is what I said about the library AMD is using for this.
My bad, missed this.
#29
Squared
MxPhenom 216: It is not a hard concept.

Zen 4c are fabbed on TSMC 4nm, which is a density-optimized 5nm. Zen 4c cores also have half the L3 cache the normal Zen 4 cores have. Cache takes up a lot of area.

EDIT: There are conflicting articles going around; some say they are on 4nm, some say it's still 5nm. Either way, the majority of that area improvement is the reduction in cache. Clock frequency doesn't impact area much, so the reduction in clock frequency is not really contributing to area reductions.

The library AMD is using is also different, with density being the priority, so everything in that library can be packed closer, routing layers routed closer together via rules, etc.
In this case, the Zen 4 and Zen 4c cores are on the same die, so they're built with the same node.

The high-density layout library means lower maximum clock speeds but also lower power consumption.

During development, Zen 4 was divided into many sections so that each team could work on a section without running into another team's space, hence some of the dead space Zen 4c can eliminate.

The blurring together of sections makes me think that Zen 4c can't start development until Zen 4 is almost finished. If that holds true for Zen 5c and Zen 5, then Zen 5 will beat Zen 5c to market, which means the first products to market can't have a hybrid architecture.
#30
Count von Schwalbe
Nocturnus Moderatus
Squared: The blurring together of sections makes me think that Zen 4c can't start development until Zen 4 is almost finished. If that holds true for Zen 5c and Zen 5, then Zen 5 will beat Zen 5c to market, which means the first products to market can't have a hybrid architecture.
I am sure it could be done, but it would be a massive savings to do it the way you said.
#31
MxPhenom 216
ASIC Engineer
Squared: In this case, the Zen 4 and Zen 4c cores are on the same die, so they're built with the same node.

The high-density layout library means lower maximum clock speeds but also lower power consumption.

During development, Zen 4 was divided into many sections so that each team could work on a section without running into another team's space, hence some of the dead space Zen 4c can eliminate.

The blurring together of sections makes me think that Zen 4c can't start development until Zen 4 is almost finished. If that holds true for Zen 5c and Zen 5, then Zen 5 will beat Zen 5c to market, which means the first products to market can't have a hybrid architecture.
I mean, that's not really how designing this stuff works, especially if the design is hierarchical with hardmacs, etc., which is how all VLSI design is done these days.

Usually the top-level floorplan is put together from initial designs (netlists), and the shapes of everything and where everything goes are set against a target total area. Everything then gets fit into that area, and all the teams work within the parameters they are given (which is usually a rectilinear shape that their design fits into). The initial floorplanning usually provides margin for anything that could take up area down the line. There is no way for a team to "run into another's space".

When and if efforts to reduce area are made, the worst offenders are usually the ones targeted first for reductions, especially those with extra space, and the shapes of hardmacs, etc. change to cut area.

There is also the possibility of things being taken out of the hardmacs and put into the top level instead, but that rarely happens.
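
As a rough illustration of that area-budgeting step (a toy sketch only; the partition names, areas, and margin below are made up and not taken from any real flow or from AMD's design):

# Toy model of a top-level area budget: each partition gets an up-front
# allocation padded with margin, and the sum must fit the target area.
TARGET_AREA_MM2 = 3.84   # hypothetical core-area budget
MARGIN = 0.10            # 10% slack reserved per partition

partitions_mm2 = {       # hypothetical partition estimates
    "front_end": 0.90,
    "execution": 1.10,
    "fpu": 0.70,
    "l2": 0.60,
}

budgeted = sum(area * (1 + MARGIN) for area in partitions_mm2.values())
verdict = "fits" if budgeted <= TARGET_AREA_MM2 else "over budget"
print(f"budgeted {budgeted:.2f} mm^2 of {TARGET_AREA_MM2} mm^2 ({verdict})")
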
Count von Schwalbe: This is a lot more than just cache cuts. Using different sub-variants of a process, with a slightly different gate design, they can adjust power usage and area used vs speed cap pretty far. Example below is for an ARM core and a different process (N3E), but the principle is similar:

Check out the picture below as well:

They removed the specific boundaries between certain areas - reducing dead space. The lower heat and the reduced interference from reduced clocks make that easier, but it is mostly just taking a page from the ARM design book. An area where extremely reduced power budgets and thermal constraints are common - servers and smartphones.

They also reduced the SRAM size by using a different type of cell, at the potential cost of bandwidth or latency.

Source: www.semianalysis.com/p/zen-4c-amds-response-to-hyperscale

Incidentally, it appears that the L3 cache per core is actually the same as a standard Zen 4 APU.
Did the "boundaries" get removed, or did the shapes of specific parts of the design change to fit into the smaller die area better? Just from experience im inclined to think its the latter. Its possible a lot of the previously hardmacro designs were moved/added to another hardmac that was already there, and then the shape adjusted to fit into the area.
#32
Count von Schwalbe
Nocturnus Moderatus
MxPhenom 216: Did the "boundaries" get removed, or did the shapes of specific parts of the design change to fit into the smaller die area better? Just from experience, I'm inclined to think it's the latter. It's possible a lot of the previously hard-macro designs were moved/added to another hardmac that was already there, and the shape then adjusted to fit into the area.
From that link:
AMD created Zen 4c by taking the exact same Zen 4 Register-Transfer Level (RTL) description, which describes the logical design of the Zen 4 core IP, and implementing it with a far more compact physical design. The design rules are the same as both are on TSMC N5, yet the area difference is massive.
They show an ARM core without partitions for reference


Zen 4c looks so different due to a flatter design hierarchy with fewer partitions. With such complex core designs with several hundred million transistors, it makes sense to split the core up into distinct regions in a floorplan so that designers and simulation tools can work in parallel to speed up Time to Market (TTM). Any engineering changes to a circuit can also be isolated to a sub-region without having to redo the placement and routing process for the whole core.
Intentionally separating timing critical regions can also help with routing congestion and achieving higher clock speeds from less interference. We see ARM’s Neoverse V1 and Cortex-X2 cores without hard partitions between logical regions, with placement packed as tight as possible. The regions appear homogenous when looking at the physical die. On the other hand, we see Intel’s Crestmont E-core with many visible partitions, with the boundaries highlighted in purple.
As seen in our Zen 4 core annotations, there are numerous partitions for each logical block within the core, but this is drastically reduced in Zen 4c with just 4 partitions (L2, Front End, Execution, FPU). By merging those partitions from Zen 4, the regions can be packed closer together, adding another avenue of area saving by further boosting standard cell density. One can say that AMD’s Zen 4c ‘looks like an ARM Core’.
#33
MxPhenom 216
ASIC Engineer
Yes, a flatter design hierarchy; the partitions they are talking about are hardmacs. There are still hardmacs in the design, there are just fewer of them. They essentially stuffed more into the major designs to reduce how many hardmacs there are, by flattening some of the designs.
#34
Dr. Dro
Daven: AMD is finally putting the nail in the coffin of those calling Zen 4c e-cores with this slide:


Zen 4 and Zen 4c are BOTH high performance cores and are BOTH designed for high efficiency.
Which is funny, since that slide is lying and distorting reality.

Since Raptor Lake, both cores have equivalent instruction sets (and were intended to in Alder as well), and a hardware thread scheduler improves performance: it's a disadvantage for AMD that they do not have one.

Dishonest marketing for a dishonest company, but people will always excuse them...
#35
qcmadness
Dr. Dro: Which is funny, since that slide is lying and distorting reality.

Since Raptor Lake, both cores have equivalent instruction sets (and were intended to in Alder as well), and a hardware thread scheduler improves performance: it's a disadvantage for AMD that they do not have one.

Dishonest marketing for a dishonest company, but people will always excuse them...
You've got the whole issue mixed up.

At same clock, Zen 4 and Zen 4c cores are identical in performance.
At same clock, Intel P-cores and E-cores are not identical in performance.

That's the difference, and why Intel needed the Thread Director.
Windows can automatically assign work to the highest-frequency cores, which it assumes are the highest-performance ones.
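
A toy model of that behaviour, nothing Windows-specific, just the "send new work to the fastest idle core" rule the scheduler effectively follows (the core names and clocks below are illustrative, not measured values):

# Pick the idle core with the highest boost clock, the way a scheduler that
# only knows per-core frequency rankings would. Values are illustrative.
cores = {
    "zen4_0":  {"max_mhz": 4900, "busy": False},
    "zen4_1":  {"max_mhz": 4900, "busy": True},
    "zen4c_0": {"max_mhz": 3700, "busy": False},
    "zen4c_1": {"max_mhz": 3700, "busy": False},
}

def pick_core(cores):
    idle = {name: c["max_mhz"] for name, c in cores.items() if not c["busy"]}
    return max(idle, key=idle.get) if idle else None

print(pick_core(cores))  # -> zen4_0: the idle full-speed Zen 4 core wins
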
#36
BrainCruser
Denver: It would be valuable to have information on the Zen 4c's maximum clock rate before it exceeds the optimal efficiency threshold. :)
Around Zen 2, 3.2-4 GHz is their optimal range.
#37
Denver
BrainCruser: Around Zen 2, 3.2-4 GHz is their optimal range.
I was imagining about 3.5 GHz maximum. If they manage to maintain close to 4 GHz, it will be excellent, aligned with the clock rate that low-TDP designs maintain in MT workloads.
#38
unwind-protect
BTW, Intel's Thread Director does not schedule threads to certain cores on its own. It just advises the software scheduler.
#39
Squared
I don't think AMD needs something like Thread Director for Phoenix 2, because the frequency curves of Zen 4 and Zen 4c are very similar. If threads are assigned to the least optimal cores, it will only have a small impact on efficiency. Raw performance could suffer more, but probably won't, because the "cloud" cores won't see a heavy load unless many cores are busy, in which case the whole chip will clock down anyway. And if only one or two cores are busy, it'll be the Zen 4 cores, because they're the preferred cores, and operating systems have been working with that concept for a few years now.
#40
AnotherReader
Dr. Dro: Which is funny, since that slide is lying and distorting reality.

Since Raptor Lake, both cores have equivalent instruction sets (and were intended to in Alder as well), and a hardware thread scheduler improves performance: it's a disadvantage for AMD that they do not have one.

Dishonest marketing for a dishonest company, but people will always excuse them...
AMD is technically right as Golden Cove has AVX-512 support which is fused off. Earlier samples had it disabled via firmware, but older BIOS versions kept it functional. However, they should have omitted that point as from the perspective of the OS and any applications, the P and E cores support the same instructions.
#41
Dr. Dro
AnotherReader: AMD is technically right as Golden Cove has AVX-512 support which is fused off. Earlier samples had it disabled via firmware, but older BIOS versions kept it functional. However, they should have omitted that point as from the perspective of the OS and any applications, the P and E cores support the same instructions.
It's a dishonest comparison, targeted at an earlier-generation product. Despite having the support fused off, it was never intended to be present in Alder as a product. Also, the older MCU (not exactly BIOS, just microcode) enables AVX-512 only on the earlier batches of the 12900K; newer models and the i9-12900KS (all chips) have it physically disabled.

It was never applicable against Raptor Lake to begin with.
Squared: I don't think AMD needs something like Thread Director for Phoenix 2, because the frequency curves of Zen 4 and Zen 4c are very similar. If threads are assigned to the least optimal cores, it will only have a small impact on efficiency. Raw performance could suffer more, but probably won't, because the "cloud" cores won't see a heavy load unless many cores are busy, in which case the whole chip will clock down anyway. And if only one or two cores are busy, it'll be the Zen 4 cores, because they're the preferred cores, and operating systems have been working with that concept for a few years now.
Frequency is not the crux of the matter here but rather topology.
qcmadness: At same clock, Zen 4 and Zen 4c cores are identical in performance.
At same clock, Intel P-cores and E-cores are not identical in performance.

That's the difference, and why Intel needed the Thread Director.
Windows can automatically assign work to the highest-frequency cores, which it assumes are the highest-performance ones.
They are not identical in performance because of the cache size mismatch. A thread director would be useful even for hybrid chips like the 7950X3D which currently rely on a custom scheduler driver.

Intel's E-cores are derived from a completely different architecture, yes.
#42
Count von Schwalbe
Nocturnus Moderatus
Dr. Dro: They are not identical in performance because of the cache size mismatch. A thread director would be useful even for hybrid chips like the 7950X3D which currently rely on a custom scheduler driver.
There is no cache size mismatch. Both mobile Zen 4 and mobile Zen 4c use 2MB of L3 per core, and the L1 and L2 are identical.

The only practical difference is the clockspeed. This is identical to setting preferred CPU cores, like standard desktop models.
#43
Dr. Dro
Count von Schwalbe: There is no cache size mismatch. Both mobile Zen 4 and mobile Zen 4c use 2MB of L3 per core, and the L1 and L2 are identical.

The only practical difference is the clockspeed. This is identical to setting preferred CPU cores, like standard desktop models.
Perhaps in this case, but it's similar to Cezanne and Renoir, I suppose. They too didn't have the full-capacity L3 that Vermeer and Matisse did.
#44
Count von Schwalbe
Nocturnus Moderatus
Dr. Dro: Perhaps in this case, but it's similar to Cezanne and Renoir, I suppose. They too didn't have the full-capacity L3 that Vermeer and Matisse did.
Indeed. I simply mentioned it as the entire news article is about mobile processors.

I doubt Zen 4c will ever make it to desktop, but maybe Zen 5c will.
#45
AnotherReader
Dr. Dro: It's a dishonest comparison, targeted at an earlier-generation product. Despite having the support fused off, it was never intended to be present in Alder as a product. Also, the older MCU (not exactly BIOS, just microcode) enables AVX-512 only on the earlier batches of the 12900K; newer models and the i9-12900KS (all chips) have it physically disabled.

It was never applicable against Raptor Lake to begin with.



Frequency is not the crux of the matter here but rather topology.



They are not identical in performance because of the cache size mismatch. A thread director would be useful even for hybrid chips like the 7950X3D which currently rely on a custom scheduler driver.

Intel's E-cores are derived from a completely different architecture, yes.
I agree that while the point is technically correct, it's rather misleading, because for the OS and applications, there's no difference in the ISA of Golden Cove and Gracemont. Furthermore, as you pointed out, for Raptor Lake and later Alder Lake SKUs, there was no disparity as the instruction support was fused off. Still, I think you're being rather uncharitable in focusing on one part of a slide from Marketing and ignoring the technical excellence of Zen 4c. Marketing is known for being frugal with the truth across most industries and companies.
#46
R0H1T
Dr. Dro: and a hardware thread scheduler improves performance:
There's no evidence for that; an OS scheduler or a program like Process Lasso could work just as well. It's just quicker, that's for sure.
Dr. Dro: Dishonest marketing for a dishonest company,
Meh, if we're being pedantic you can probably claim everything on a marketing slide is a lie ~ from every profit-making company out there!
#47
THU31
Since they're already doing chiplets, it would be cool if they did an 8+16 desktop CPU.

I just hope they never use this in CPUs with 8 cores or fewer on desktop.
#48
Count von Schwalbe
Nocturnus Moderatus
THU31: Since they're already doing chiplets, it would be cool if they did an 8+16 desktop CPU.

I just hope they never use this in CPUs with 8 cores or fewer on desktop.
And the 8 core chiplet with V-cache....

Take both gaming and productivity crowns with a 48-thread desktop CPU.
#49
NeuralNexus
dj-electric: Zen 4c are no E-cores because they are not E-cores, from both sides of the argument. They are a slightly more efficient Zen 4 core at a particular frequency and power range, as described in the slides.
E-cores are just SkyLake-based compute cores that have been incorporated into the hybrid Intel architecture. The argument is moot because the difference between a Performance and an Efficient core isn't big enough to dismiss what AMD is doing as similar in nature.
#50
Squared
NeuralNexus: E-cores are just SkyLake-based compute cores that have been incorporated into the hybrid Intel architecture. The argument is moot because the difference between a Performance and an Efficient core isn't big enough to dismiss what AMD is doing as similar in nature.
Intel's E-cores aren't related to Skylake. Gracemont was compared to Skylake by Intel and reviewers when Alder Lake came out because it's almost as performant as Skylake.

The lineage of Intel's and AMD's cores is probably why Intel has a dedicated E-core architecture and AMD does not. Bulldozer, Saltwell, Sandy Bridge, and Bobcat; long ago the four CPU microarchitectures lived together in harmony. But everything changed when Sandy Bridge came to market. Bobcat was AMD's first-generation little core; the second generation was Jaguar. But when the time came for a third generation, AMD had a tiny development budget and the Bulldozer line was going into cheaper and cheaper devices, so the last microarchitecture in its line, Excavator, was actually lean enough to replace the Bobcat line, which AMD did with the release of the Stoney Ridge APU.

Since then AMD's Zen line has always had to fill the needs of every CPU AMD sells. But Intel has been updating their little microarchitecture this whole time, so Intel has the flexibility to use both big and little cores. In fact I think the big and little cores Intel used in Lakefield were probably in development before Intel decided to make Lakefield. AMD has only one core today but the density-optimized layout reminds me of Excavator, so AMD's fastest way to make a little core was to density-optimize Zen 4. The fact that TSMC makes AMD processors and ARM processors probably means that AMD has access to better density-optimizing tools than Intel does.

But this could all change tomorrow. Intel is building processors for ARM customers, so Intel has to be investing in better density-optimizing tools. And AMD has more than enough money to build dedicated big and little cores. In the future, their approaches to big and little cores may look more like each other's.