Wednesday, June 14th 2023

AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC

AMD on Tuesday (June 13) launched the EPYC 9004 "Bergamo" 128-core/256-thread high-density compute server processor, and with it debuted the new "Zen 4c" CPU microarchitecture. Much was made of Zen 4c in the run-up to yesterday's launch, including rumors that it is a Zen 4 "lite" core with less number-crunching muscle, and hence lower IPC, and that Zen 4c is AMD's answer to Intel's E-core architectures, such as "Gracemont" and "Crestmont." It turns out to be neither a lite version of Zen 4 nor an E-core, but a physically compacted version of the Zen 4 core with identical number-crunching machinery.

First things first: Zen 4c has exactly the same IPC as Zen 4 (that is, the same performance at a given clock speed). This is because its front-end, execution stage, load/store component, and internal cache hierarchy are exactly the same. It has the same 88-deep load queue, the same 64-deep store queue, the same 6.75K-entry µop cache, the same INT+FP issue width of 10+6, the same INT register file, the same scheduler, and the same cache latencies. The L1I and L1D caches are 32 KB each, as on "Zen 4," and the dedicated L2 cache is likewise 1 MB.
The only thing that has changed is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD. While the regular 8-core "Zen 4" CCD has eight "Zen 4" cores sharing a 32 MB L3 cache, the new 16-core "Zen 4c" CCD AMD introduced with "Bergamo" packs two 8-core CCXs (CPU core complexes), each with 16 MB of L3 cache shared among its 8 cores. In this respect, the last-level cache and CPU core organization of the "Zen 4c" CCD has some similarities to the "Zen 2" CCD (which used two 4-core CCXs).
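The per-core L3 figures above follow directly from the CCX layout. A minimal sketch of that arithmetic, using only the cache sizes and core counts quoted in the article (the function and variable names are illustrative, not anything AMD publishes):

```python
# L3 cache per core, from the CCX figures quoted in the article.
# Function and variable names here are hypothetical, chosen for illustration.

def l3_per_core_mb(l3_mb_per_ccx: int, cores_per_ccx: int) -> float:
    """Return the effective L3 cache (in MB) available per core in one CCX."""
    return l3_mb_per_ccx / cores_per_ccx

# Regular Zen 4 CCD: one 8-core CCX sharing 32 MB of L3.
zen4_l3_per_core = l3_per_core_mb(32, 8)

# Zen 4c CCD: two 8-core CCXs, each sharing 16 MB of L3.
zen4c_l3_per_core = l3_per_core_mb(16, 8)

# Total L3 on the 16-core Zen 4c CCD (two CCXs of 16 MB each).
zen4c_ccd_l3_total_mb = 2 * 16

print(f"Zen 4:  {zen4_l3_per_core} MB L3 per core")
print(f"Zen 4c: {zen4c_l3_per_core} MB L3 per core")
print(f"Zen 4c CCD total L3: {zen4c_ccd_l3_total_mb} MB")
```

Note that the total L3 per CCD stays at 32 MB in both cases; it is the doubling of the core count, plus the split into two CCXs, that halves the per-core share.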

What's interesting is that the 16-core "Zen 4c" CCD isn't AMD's first product from this generation with less last-level cache per core. The "Phoenix" APU silicon used in Ryzen 7040 series mobile processors sees eight "Zen 4" cores share a 16 MB L3 cache. For math-heavy compute workloads with a smaller memory footprint, "Zen 4c" offers identical performance to "Zen 4." The smaller L3 cache should, however, impact performance in bandwidth-sensitive workloads with large data sets.
The Zen 4c CCD is built on exactly the same TSMC 5 nm EUV foundry node that the company uses for its regular 8-core Zen 4 CCD. The Zen 4c CPU core is nevertheless 35% smaller than the Zen 4 core, with a per-core die area of just 2.48 mm², compared to 3.84 mm². The die-size savings probably come from AMD "compacting" the various core components without reducing their form or function in any way. As we said earlier, the counts of the various core components remain the same, as do the sizes of the µop, L1, and L2 caches. EPYC 9004 "Bergamo" achieves its core count of 128 using eight of these 16-core Zen 4c CCDs. In comparison, the regular "Genoa" processor achieves 96 cores over twelve 8-core Zen 4 CCDs.
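As a quick sanity check on the figures above, the 35% area reduction and the two core counts can be reproduced from the numbers in the article. This is a rough sketch only; the variable names are illustrative.

```python
# Check the per-core area reduction and the package core counts
# using only the numbers quoted in the article. Names are hypothetical.

ZEN4_CORE_AREA_MM2 = 3.84   # per-core die area of Zen 4
ZEN4C_CORE_AREA_MM2 = 2.48  # per-core die area of Zen 4c

# Fractional area saved by Zen 4c relative to Zen 4.
area_reduction = 1 - ZEN4C_CORE_AREA_MM2 / ZEN4_CORE_AREA_MM2

# Core counts: CCDs per package times cores per CCD.
bergamo_cores = 8 * 16   # eight 16-core Zen 4c CCDs
genoa_cores = 12 * 8     # twelve 8-core Zen 4 CCDs

print(f"Zen 4c core is about {area_reduction:.1%} smaller than Zen 4")
print(f"Bergamo: {bergamo_cores} cores, Genoa: {genoa_cores} cores")
```

The computed reduction comes out slightly above 35%, consistent with the rounded figure in the headline.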

153 Comments on AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC

#151
Mussels
Freshwater Moderator
AnotherReader wrote: "It's the cores that are much denser. The L3 seems to be the same."
The linked article mentioned that they used denser SRAM as part of the cache? I'd have to re-read it.
#152
Squared
Mussels wrote: "The article linked mentioned they used denser SRAM? as part of the cache. I'd have to re-read again"
SRAM is what processor cache is usually made of. The alternative is DRAM, which is much slower, and I can only remember it being used for L4 cache in Broadwell.

As I recall, Bergamo, the 128-core server die, has denser SRAM cells making up its L3 cache. But I don't recall Phoenix 2—the 2 Zen4 + 4 Zen4c mobile die—having denser cache. My guess is it uses the regular-density L3 cache that other Zen 4 products use.
#153
AnotherReader
Mussels wrote: "The article linked mentioned they used denser SRAM? as part of the cache. I'd have to re-read again"
I wasn't clear; doing the maths reveals that both the L2 and L3 are as dense as regular Zen 4. Some other SRAMs in the core use denser cells.
To save area, AMD has replaced these 8T dual-port bitcells with a new 6T pseudo dual-port bitcell developed by TSMC.