Wednesday, June 14th 2023
AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC
AMD on Tuesday (June 13) launched the EPYC 9004 "Bergamo" 128-core/256-thread high-density compute server processor, and with it debuted the new "Zen 4c" CPU microarchitecture. A lot had been made of Zen 4c in the run-up to yesterday's launch, such as rumors that it is a Zen 4 "lite" core with less number-crunching muscle, and hence lower IPC, and that Zen 4c is AMD's answer to Intel's E-core architectures, such as "Gracemont" and "Crestmont." It turns out that it's neither a lite version of Zen 4 nor an E-core, but a physically compacted version of the Zen 4 core with identical number-crunching machinery.
First things first: Zen 4c has exactly the same IPC as Zen 4 (that's performance at a given clock speed). This is because its front-end, execution stage, load/store unit, and internal cache hierarchy are exactly the same. It has the same 88-entry load queue, the same 64-entry store queue, the same 6,750-entry µop cache, the same INT+FP issue width of 10+6, the same INT register file, the same scheduler, and the same cache latencies. The L1I and L1D caches are the same 32 KB in size as on "Zen 4," and so is the dedicated L2 cache, at 1 MB.
The only change to the cache hierarchy is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD. While the regular 8-core "Zen 4" CCD has eight "Zen 4" cores sharing a 32 MB L3 cache, the new 16-core "Zen 4c" CCD that AMD introduced with "Bergamo" packs two 8-core CCXs (CPU core complexes), each with 16 MB of L3 cache shared among its 8 cores. In this respect, the last-level cache and core organization of the "Zen 4c" CCD bears some similarity to the "Zen 2" CCD (which used two 4-core CCXs).
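As a rough illustration of the cache arithmetic above, the following Python sketch models each CCD as a number of CCXs, each with its own shared L3 slice. The Ccd class and its field names are hypothetical; they only encode the figures quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ccd:
    """Hypothetical model of a CCD: one or more CCXs, each with its own L3."""
    name: str
    ccx_count: int       # CPU core complexes on the die
    cores_per_ccx: int   # cores sharing one L3 slice
    l3_per_ccx_mb: int   # L3 capacity shared within one CCX

    @property
    def cores(self) -> int:
        return self.ccx_count * self.cores_per_ccx

    @property
    def l3_total_mb(self) -> int:
        return self.ccx_count * self.l3_per_ccx_mb

    @property
    def l3_per_core_mb(self) -> float:
        # Each core shares only its own CCX's L3 slice, so divide per CCX.
        return self.l3_per_ccx_mb / self.cores_per_ccx


ZEN4 = Ccd("Zen 4 CCD", ccx_count=1, cores_per_ccx=8, l3_per_ccx_mb=32)
ZEN4C = Ccd("Zen 4c CCD", ccx_count=2, cores_per_ccx=8, l3_per_ccx_mb=16)

for ccd in (ZEN4, ZEN4C):
    print(f"{ccd.name}: {ccd.cores} cores, {ccd.l3_total_mb} MB L3 total, "
          f"{ccd.l3_per_core_mb:g} MB L3 per core")
```

Running it should report 8 cores and 4 MB of L3 per core for the Zen 4 CCD, and 16 cores and 2 MB per core for the Zen 4c CCD, with both dies carrying 32 MB of L3 in total.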
What's interesting is that the 16-core "Zen 4c" CCD isn't AMD's first product from this generation with less last-level cache per core. The "Phoenix" APU silicon used in Ryzen 7040 series mobile processors has eight "Zen 4" cores sharing a 16 MB L3 cache. For math-heavy compute workloads with smaller memory footprints, "Zen 4c" offers identical per-clock performance to "Zen 4"; however, the smaller L3 cache should impact performance in bandwidth-sensitive workloads with large data sets.
The Zen 4c CCD is built on exactly the same TSMC 5 nm EUV foundry node as the company's regular 8-core Zen 4 CCD; however, the Zen 4c CPU core is 35% smaller than the Zen 4 core, with a per-core die area of just 2.48 mm², compared to 3.84 mm². The die-size savings probably come from AMD "compacting" the various core components without reducing their form or function in any way. As mentioned earlier, the counts of the various core components remain the same, as do the sizes of the µop, L1, and L2 caches. EPYC 9004 "Bergamo" reaches its 128-core count using eight of these 16-core Zen 4c CCDs. In comparison, the regular "Genoa" processor reaches 96 cores across twelve 8-core Zen 4 CCDs.
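The die-area and core-count figures above reduce to simple arithmetic. A minimal Python sketch, using only the numbers quoted in this article (the helper names are illustrative, not from any AMD tool):

```python
# Per-core die areas quoted in the article (mm^2).
ZEN4_CORE_MM2 = 3.84
ZEN4C_CORE_MM2 = 2.48


def area_reduction(old_mm2: float, new_mm2: float) -> float:
    """Fraction of area saved when a core of old_mm2 shrinks to new_mm2."""
    return 1 - new_mm2 / old_mm2


def socket_cores(ccd_count: int, cores_per_ccd: int) -> int:
    """Total cores on a package built from identical CCDs."""
    return ccd_count * cores_per_ccd


shrink = area_reduction(ZEN4_CORE_MM2, ZEN4C_CORE_MM2)
fit = ZEN4_CORE_MM2 / ZEN4C_CORE_MM2  # Zen 4c cores per Zen 4 core's area

print(f"Zen 4c core is {shrink:.1%} smaller than Zen 4")              # 35.4%
print(f"~{fit:.2f} Zen 4c cores fit in the area of one Zen 4 core")  # 1.55
print(f"Bergamo: {socket_cores(8, 16)} cores")                       # 128
print(f"Genoa:   {socket_cores(12, 8)} cores")                       # 96
```

The 35.4% result is where the rounded "35% smaller" figure comes from.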
153 Comments on AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC
A 7850X could exist for those who need MT, and the 7800X3D exists for those who want the best gaming performance.
I would hate for the 7800X3D not to exist.
On the other hand, Intel's Redwood Cove P cores and Crestmont E cores will both debut on the same day, when Meteor Lake comes to market. If AMD wants to build a hybrid processor like this, it's probably either going to be six months later than a non-hybrid, or AMD will need a different kind of small core that can be developed concurrently with Zen.
IMHO, Ryzen 9 should have been all 16-core CPUs, with Ryzen 7 at 12 cores, Ryzen 5 at 8 cores, and Ryzen 3 at 6 cores. Sell the cheapest 16-core for, let's say, $600, the cheapest 12-core for around $450, the cheapest 8-core at $300, and the cheapest 6-core at $200. This would have done a much better job of keeping core counts and overall performance close between Intel and AMD offerings this time around. That said, considering that right now you can get a 7600 for $223, a 7700 for $326, a 7900 for $420, and a 7950X for $576 in the US, the pricing is almost there anyway.
www.techpowerup.com/cpu-specs/?mfgr=AMD&released=2022&nCores=2&sort=name
It will eventually happen as they move almost completely off 7/6 nm parts, but it's probably still a couple of years away.
Just as the gap between the 5800X and the 5800X3D was longer than the gap between the 7700X and the 7800X3D, I expect Zen 5 to launch initially as a 16c, 2-CCD design, and a few months later, once the Zen 5c design is finished, they can launch a 24c SKU.
In theory, they could release a 24c Zen 4 SKU as well; it just depends on whether there is a business case for such a thing. Personally, I think there might be, if it gets people to jump onto AM5 before Arrow Lake launches, which should improve uptake of Zen 5 onwards.
EDIT: wait, that might just be L1/L2, though. Now I am questioning my logic.
If L3 had a significant impact on per-core power, I also think we would be able to tell a difference by now between half and full L3 and between full and X3D. Core power is roughly similar at the end of the day.
Gracemont is 1.70 mm² without L2$; a block of 4 with 4 MB of L2$ is 8.78 mm² (note: the image incorrectly states 2 MB for the L2$)
Zen 4 with 1 MB L2$ is 3.84 mm²
Zen 4c with 1 MB L2$ is 2.48 mm², so we can estimate a block of 4 with 4 MB of L2$ to be about 9.9 mm², unless I've butchered the math
So it really doesn't matter how many Zen 4c cores can be packed into the area of a Zen 4 core, because Intel is getting thrashed in area efficiency.
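To check the estimate in the comment above, here is a small Python sketch that redoes the arithmetic with the quoted figures. It is a naive model (an assumption): it simply multiplies the per-core area by four and ignores any shared-cache or interconnect overhead a real four-core block would carry. The helper name naive_block_mm2 is hypothetical.

```python
# Figures quoted in the comment above (mm^2).
GRACEMONT_CLUSTER_MM2 = 8.78  # 4 Gracemont cores + 4 MB shared L2
ZEN4C_CORE_MM2 = 2.48         # 1 Zen 4c core + 1 MB private L2


def naive_block_mm2(core_mm2: float, cores: int) -> float:
    """Estimate a multi-core block by multiplying per-core area.

    Assumption: ignores any glue logic or interconnect a real block would add.
    """
    return core_mm2 * cores


zen4c_block = naive_block_mm2(ZEN4C_CORE_MM2, 4)
ratio = zen4c_block / GRACEMONT_CLUSTER_MM2

print(f"4x Zen 4c + 4 MB L2 (estimate): {zen4c_block:.2f} mm^2")
print(f"4x Gracemont + 4 MB L2:         {GRACEMONT_CLUSTER_MM2:.2f} mm^2")
print(f"Zen 4c block / Gracemont cluster: {ratio:.2f}")
```

Under this naive model, a four-core Zen 4c block comes out at about 9.92 mm², roughly 13% larger than the Gracemont cluster, which matches the "about 9.9 mm²" estimate.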
Kinda find it funny that this was likely commented in bad faith, since you had the numbers to work out that they could fit about 1.5 Z4C cores in a Z4 core, which must be an own because Intel could fit 4 cripple cores in the same area as their performance core. Turns out Intel's P core has awful area efficiency, and honestly it's the same with their E core, when you consider that Z4C is the same RTL design as Z4 and a block of four is about the same size as a Gracemont cluster.