Wednesday, June 14th 2023
AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC
AMD on Tuesday (June 13) launched the EPYC 9004 "Bergamo" 128-core/256-thread high-density compute server processor, and with it debuted the new "Zen 4c" CPU microarchitecture. A lot had been made of Zen 4c in the run-up to yesterday's launch. Rumors suggested that it was a Zen 4 "lite" core with less number-crunching muscle, and hence lower IPC, and that Zen 4c was AMD's answer to Intel's E-core architectures, such as "Gracemont" and "Crestmont." It turns out that it is neither a lite version of Zen 4 nor an E-core. It is a physically compacted version of the Zen 4 core with identical number-crunching machinery.
First things first: Zen 4c has exactly the same IPC as Zen 4 (that is, performance at a given clock speed). This is because its front-end, execution stage, load/store unit, and internal cache hierarchy are exactly the same. It has the same 88-entry load queue, 64-entry store queue, 6.75K-entry µop cache, INT+FP issue width of 10+6, INT register file, scheduler, and cache latencies. The L1I and L1D caches are the same 32 KB in size as on "Zen 4," and so is the dedicated L2 cache, at 1 MB. The only thing that has changed is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD. The regular 8-core "Zen 4" CCD has eight "Zen 4" cores sharing a 32 MB L3 cache. By contrast, the new 16-core "Zen 4c" CCD that AMD introduced with "Bergamo" packs two 8-core CCXs (CPU core complexes), each with 16 MB of L3 cache shared among its 8 cores. In this respect, the last-level cache and core organization of the "Zen 4c" CCD has some similarities to the "Zen 2" CCD (which used two 4-core CCXs).
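The per-core L3 arithmetic above can be sketched in a few lines of Python. This is a simplified model that assumes each CCD is nothing more than a set of identical CCXs. The `CCD` class and its field names are hypothetical and exist only for illustration. The cache sizes and core counts are the ones quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCD:
    """Hypothetical, simplified chiplet model: a CCD is N identical CCXs."""
    name: str
    ccx_count: int       # CPU core complexes on the chiplet
    cores_per_ccx: int   # cores that share one L3 slice
    l3_per_ccx_mb: int   # L3 shared within a single CCX

    @property
    def cores(self) -> int:
        return self.ccx_count * self.cores_per_ccx

    @property
    def l3_total_mb(self) -> int:
        return self.ccx_count * self.l3_per_ccx_mb

    @property
    def l3_per_core_mb(self) -> float:
        return self.l3_per_ccx_mb / self.cores_per_ccx


# Figures from the article: one 8-core CCX with 32 MB vs. two 8-core CCXs with 16 MB each
ZEN4_CCD = CCD("Zen 4", ccx_count=1, cores_per_ccx=8, l3_per_ccx_mb=32)
ZEN4C_CCD = CCD("Zen 4c", ccx_count=2, cores_per_ccx=8, l3_per_ccx_mb=16)

for ccd in (ZEN4_CCD, ZEN4C_CCD):
    print(f"{ccd.name}: {ccd.cores} cores, {ccd.l3_total_mb} MB L3 per CCD, "
          f"{ccd.l3_per_core_mb:g} MB L3 per core, "
          f"{ccd.l3_per_ccx_mb} MB L3 shared within one core's CCX")
```

Both chiplets carry 32 MB of L3 in total. The difference is how it is split. A Zen 4c core shares a 16 MB L3 with the seven other cores in its own CCX, rather than a single 32 MB L3 spanning the whole chiplet.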
What's interesting is that the 16-core "Zen 4c" CCD isn't AMD's first product from this generation with less last-level cache per core. The "Phoenix" APU silicon used in Ryzen 7040 series mobile processors has eight "Zen 4" cores sharing a 16 MB L3 cache. For math-heavy compute workloads with small memory footprints, "Zen 4c" offers performance identical to "Zen 4" at the same clock speed. However, the smaller L3 cache should hurt performance in cache- and bandwidth-sensitive workloads with large data sets. The Zen 4c CCD is built on exactly the same TSMC 5 nm EUV foundry node as the regular 8-core Zen 4 CCD. Even so, the Zen 4c CPU core is 35% smaller than the Zen 4 core, with a per-core die area of just 2.48 mm², compared to 3.84 mm². The die-area savings probably come from AMD "compacting" the various core components without reducing their form or function in any way. As noted earlier, the counts of the various core components remain the same, as do the sizes of the µop, L1, and L2 caches. EPYC 9004 "Bergamo" reaches its 128-core count using eight of these 16-core Zen 4c CCDs. In comparison, the regular "Genoa" processor reaches 96 cores with twelve 8-core Zen 4 CCDs.
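The area and core-count figures above can be checked with a quick Python sketch. The helper names `area_reduction` and `socket_cores` are hypothetical. The inputs are the per-core areas and CCD counts quoted in this article.

```python
# Per-core die areas (mm²) as quoted in the article
ZEN4_CORE_MM2 = 3.84
ZEN4C_CORE_MM2 = 2.48


def area_reduction(old_mm2: float, new_mm2: float) -> float:
    """Fraction of area saved when going from old_mm2 to new_mm2."""
    return 1 - new_mm2 / old_mm2


def socket_cores(ccd_count: int, cores_per_ccd: int) -> int:
    """Total cores on a package built from identical CCDs."""
    return ccd_count * cores_per_ccd


reduction = area_reduction(ZEN4_CORE_MM2, ZEN4C_CORE_MM2)
print(f"Zen 4c core area reduction: {reduction:.1%}")  # 35.4%

bergamo = socket_cores(8, 16)  # eight 16-core Zen 4c CCDs
genoa = socket_cores(12, 8)    # twelve 8-core Zen 4 CCDs

# Sums only the quoted per-core areas; shared L3 and all uncore logic are ignored
print(f"Bergamo: {bergamo} cores, {bergamo * ZEN4C_CORE_MM2:.2f} mm² of core area")
print(f"Genoa:   {genoa} cores, {genoa * ZEN4_CORE_MM2:.2f} mm² of core area")
```

The 35.4% result matches the 35% figure in the headline. The summed core areas are only a rough illustration. They assume the quoted per-core areas exclude the shared L3, and they say nothing about fabric or I/O-die area.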
153 Comments on AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC
Since Intel 13th Gen has a significantly better memory controller, pair that 13600K with fast RAM (7200 MT/s), make a few tweaks to the timings, and it's gonna easily beat the 7700X in games as well.
And ALL of that while being on an INFERIOR node (10nm)
"But muh, it's Skylake and not actually efficient!!!"
But this article is about the phenomenal Zen 4c cores. Dude, why are you spamming this thread with irrelevant stuff?
The article is about the new Zen 4c core for cloud computing.
Focus, for God's sake! Learn something new.
3) I'm not an Intel fanboy; actually, I'm not a fanboy at all. My previous CPU was an R5 2600. I just buy whatever is better for the money, and in this case the 13600K was the obvious choice. When I bought it ($300), the 7700X was $400. I got a better CPU for less money.
Yeah nothing to do with being a fanboy, the fact is Intel's in this position in large part due to their own effin greed. They completely deserve what comes their way IMO.
It is not a weakness. They created two more server architectures: one oriented toward core density (blue) and another toward cache per core for specific server workloads (orange). They are add-ons to the existing general-purpose server CPU architecture (grey).
"But this is not a problem on CPUs intended for servers..."
Not only for servers.
AMD’s Chief Technical Officer had this to say at their Ryzen 7000 Keynote: AMD has taken the same Zen 4 architecture and pulled several "tricks" in physical design to save a huge amount of area. These tricks have already been briefly described in this forum, and in detail on SemiAnalysis. The result is identical IPC and ISA feature level, which also simplifies integration on the client side. In fact, AMD is also quietly swapping some Zen 4 cores for Zen 4c cores in its lower-end 4nm Ryzen 7000U “Phoenix” mobile processors.
Right now we have a benchmark war. Notice how close Intel and AMD are in the benchmarks? I don't think that's a coincidence.
They ship at whatever point on the V/F curve hits the benchmark performance they want.
The 4c cores are simplified in ways that will prevent them from reaching high frequencies. That's by design and that's fine.
Where did I say that APU-grade connoted shittier quality, or that Phoenix is using 4c cores?
@R-T-B Just going off observable Ryzen behaviour over the past three generations and its internal temp monitoring, L3 doesn't usually get hot or draw a lot of power, whether in 16 MB, 32 MB, or V-Cache form.
David Kanter covered that topic (and related stuff) in one of his articles here:
www.realworldtech.com/transistor-count-flawed-metric/
AMD's Zen 4 core is roughly half the size of Intel's P-core and double the size of Intel's E-core. That is why a big/little layout just doesn't make that much sense for AMD (except maybe in laptops). Zen 4c was likely in development for maybe 3-5 years.
I am pretty surprised at how much space they managed to save by reducing cache though.