Wednesday, June 14th 2023
AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC
AMD on Tuesday (June 13) launched the EPYC 9004 "Bergamo" 128-core/256-thread high-density compute server processor, and with it, debuted the new "Zen 4c" CPU microarchitecture. Much had been made of Zen 4c in the run-up to yesterday's launch, such as rumors that it is a Zen 4 "lite" core with less number-crunching muscle, and hence lower IPC, and that Zen 4c is AMD's answer to Intel's E-core architectures, such as "Gracemont" and "Crestmont." It turns out that it's neither a lite version of Zen 4 nor an E-core, but a physically compacted version of the Zen 4 core, with identical number-crunching machinery.
First things first: Zen 4c has exactly the same IPC as Zen 4 (that's performance at a given clock speed). This is because its front-end, execution stage, load/store component, and internal cache hierarchy are exactly the same. It has the same 88-deep load queue, 64-deep store queue, the same 6.75K µop cache, the exact same INT+FP issue width of 10+6, the same exact INT register file, the same scheduler, and the same cache latencies. The L1I and L1D caches are the same 32 KB in size as on "Zen 4," and so is the dedicated L2 cache, at 1 MB.
The only thing that's changed is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD. While the regular 8-core "Zen 4" CCD has eight "Zen 4" cores sharing a 32 MB L3 cache, the new 16-core "Zen 4c" CCD that AMD introduced with "Bergamo" packs two 8-core CCXs (CPU core complexes), each with 16 MB of L3 cache shared among the 8 cores of the CCX. In this respect, the last-level cache and CPU core organization of the "Zen 4c" CCD has some similarities to the "Zen 2" CCD (which used two 4-core CCXs).
What's interesting is that the 16-core "Zen 4c" CCD isn't AMD's first product from this generation with lower last-level cache per core. The "Phoenix" APU silicon used in Ryzen 7040 series mobile processors sees eight "Zen 4" cores share a 16 MB L3 cache. For math-heavy compute workloads with a smaller memory footprint, "Zen 4c" offers identical performance to "Zen 4"; however, the smaller L3 cache should impact performance in bandwidth-sensitive workloads with large data-sets.
The Zen 4c CCD is built on the same exact TSMC 5 nm EUV foundry node that the company makes its regular 8-core Zen 4 CCD on; however, the Zen 4c CPU core is 35% smaller than the Zen 4 core, with a die area (per-core) of just 2.48 mm², compared to 3.84 mm². The die-size savings probably come from AMD "compacting" the various core components without reducing their form or function in any way. As we said earlier, the counts of the various core components remain the same, as do the sizes of the µop, L1, and L2 caches. EPYC 9004 "Bergamo" achieves its core-count of 128 using eight of these 16-core Zen 4c CCDs. In comparison, the regular "Genoa" processor achieves 96 cores over twelve 8-core Zen 4 CCDs.
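The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the numbers come from this article, while the variable and function names (`percent_smaller`, `summarize`, and the two configuration dictionaries) are made up for the example, and the total-L3 figures are simply derived from the per-CCD values stated here.

```python
# Per-core die areas quoted in the article (mm^2).
ZEN4_CORE_MM2 = 3.84
ZEN4C_CORE_MM2 = 2.48


def percent_smaller(old, new):
    """Return how much smaller `new` is than `old`, in percent."""
    return (old - new) / old * 100


# Package layouts as described in the article (names are illustrative).
# A Zen 4c CCD holds two CCXs with 16 MB of L3 each, i.e. 32 MB per CCD.
genoa = {"ccds": 12, "cores_per_ccd": 8, "l3_per_ccd_mb": 32}
bergamo = {"ccds": 8, "cores_per_ccd": 16, "l3_per_ccd_mb": 2 * 16}


def summarize(cfg):
    """Return (total cores, total L3 in MB, L3 per core in MB)."""
    cores = cfg["ccds"] * cfg["cores_per_ccd"]
    l3_total_mb = cfg["ccds"] * cfg["l3_per_ccd_mb"]
    return cores, l3_total_mb, l3_total_mb / cores


area_saving = percent_smaller(ZEN4_CORE_MM2, ZEN4C_CORE_MM2)
genoa_summary = summarize(genoa)
bergamo_summary = summarize(bergamo)

print(f"Zen 4c core is {area_saving:.1f}% smaller than Zen 4")
print("Genoa   (cores, L3 MB, L3 MB/core):", genoa_summary)
print("Bergamo (cores, L3 MB, L3 MB/core):", bergamo_summary)
```

With these inputs the area saving works out to roughly 35.4%, in line with the 35% headline figure, and the per-core L3 comes out at 4 MB for "Genoa" and 2 MB for "Bergamo."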
153 Comments on AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC
AMD already has a 16-core mobile CPU, the 7945HX - www.amd.com/en/product/13016 - with a slightly higher TDP range though. The lower end of the range at 55W should match pretty well to your 45W idea, given the inherent efficiency handicap from using chiplets. Granted, the laptops with it usually run something like an 88W power limit, but surely that is configurable.
Regarding running stuff at more optimized settings - my post might have come out more critical than intended; the same line of thought has been on my mind quite a number of times. I am currently running a 5800X3D limited to 76W and with a -30 Curve Optimizer offset.
www.phoronix.com/review/amd-epyc-9754-bergamo
9754 is 128c/256t at 2.25/3.1/3.1 GHz
9654 is 96c/192t at 2.4/3.55/3.7 GHz
Those few hundred MHz alone make a noticeable difference in efficiency.
More cores at lower clocks will be more efficient. The details on that are hard to see from a general result like that - plus, we do not really know the clock distribution across tests. For example, look at 9654 and 9554 in the same lineup, where the former is ~20% faster at the same power draw. It is not quite the same level of difference as 9754 vs 9654 but still a noticeable efficiency difference (also, both of these are probably running at 3+ GHz but we do not know exactly).
9654 is 96c/192t at 2.4/3.55/3.7 GHz
9554 is 64c/128t at 3.1/3.75/3.75 GHz
Edit:
I am not saying that Zen4c is not more efficient but this is not the data point that would show that in any clear manner, much less getting some idea how much more efficient.
Since it seems as if Zen4c cores use 20-30% less power at the same core count, adding 3D V-Cache to such a processor would give 1.5 times the cache and better power efficiency, which would appeal to everyone. But to avoid what happened with the 7900X3D and 7950X3D, which perform worse in games than the 7800X3D, they should put it on both CCDs. I doubt it would happen with Zen4 but rather with Zen5, thus I'll use 8000 naming in my examples.
Example (pure fiction; I have gone with 30% less power drawn)
- Ryzen 5
  8600X: TDP 65 W, L3 32 MB
  8600: TDP 65 W, L3 32 MB
  8600C (C-cores and 3D V-Cache): TDP 44 W, L3 48 MB
- Ryzen 7 (700 & 800)
  8700X: TDP 105 W, L3 32 MB
  8800X: TDP 120 W, L3 32 MB
  8700 / 8800: TDP 65 W, L3 32 MB
  8700C: TDP 75 W, L3 48 MB
  8800C: TDP 85 W, L3 48 MB
  8800X3D: TDP 120 W, L3 96 MB
- Ryzen 9 (I'm only doing 900 here)
  8900X: TDP 170 W, L3 64 MB
  8900: TDP 65 W, L3 64 MB
  8900X3D: TDP 120 W, L3 96 MB -> if they make the same configuration as with the 7900X3D
  8900C: TDP 144 W, L3 96 MB -> this one has 3D V-Cache on both CCDs
If they scale as well as I think, then an 8900C would be a great choice for those who want great gaming performance and more cores, with the added benefit of better power efficiency, and the others just as well. The 8800 lineup would be a middle ground between a productivity CPU and a gaming CPU with 3D V-Cache, and so on.
provisioning for TSV (vertical wiring vias for connecting 3D V-Cache) has been eliminated, which saved space on the chip
it's like saying an amps value, but refusing to state volts or watts - some metrics are a combination of others, but singular ones are often worthless without the other corresponding data
from the link @AnotherReader posted above
performance per watt and overall efficiency is what AMD makes their money from, and we're going to see consumer products reflecting that.
laptops and OEM desktops will love the tits off this, because it means smaller lighter products that need less cooling, and that means more profits.
Looks like they already are. The further details on the current designs match something AMD said in TPU's interview recently: that they've got performance concerns past a certain threshold without faster RAM to back it up; they've limited how many performance cores they can have with the current design (using 8 of 12 CCX links). This is where things will change: as the cores are individually slower, they can slap in twice as many cores for an overall performance gain as well as an efficiency gain, and possibly use the unused CCX links.
256 cores in the server world is entirely plausible, and probably being worked on already.
Per CCX cache values, so combinations exist with dual CCX CPUs
16MB (Phoenix 'G' APU)
16MB (Dionysus/Zen 4C)
32MB (per CCX) in regular Zen4 (Raphael)
96MB (x3D)
Zen4C comes across initially as being an APU without the APU, but they fit twice as many cores in the same space - mostly due to changing the SRAM used, it would seem. Using denser, slower L3 cache let them make it physically smaller and slap in double the cores, but since L1 and L2 are the same, the basic performance matches Zen4 in general.
It's like a reversal of the X3D chips: since some tasks didn't benefit from the extra cache (rendering, extremely long workloads, etc.), they made a chip with less, slower cache to fit that need.
Can't wait for something with 8 3D cores and 32 C cores, that'll be the thing to blast every benchmark off the map
They used higher density cache (and less of it), which is something the OS doesn't know or care about, so all those core types appear the same.
The only thing needed is something the chipset driver already does, with a way to push games onto cores with higher cache if available.
Zen4C fits twice as many cores in the same space - they stick 16 cores where 8 Zen4 cores fit. They are a LOT denser.
Where Zen 4 and Zen 4c are used together is Phoenix 2, and in Phoenix 2 both types of cores share the same pool of L3 cache, so in Phoenix 2 there is literally no difference between the cores with respect to cache.
Summary: scheduling changes aren't needed at the OS level to use these CPUs. Hybrid designs need something to push programs to the best choice, but the 'worst case' won't be like on Intel's E-cores, where programs can outright crash or have massive performance losses.