Wednesday, June 7th 2023
AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs
A SemiAnalysis report sheds light on just how much smaller the "Zen 4c" CPU core is compared to the regular "Zen 4." AMD's upcoming high core-count enterprise processor for cloud data-center deployments, the EPYC "Bergamo," is based on the new "Zen 4c" microarchitecture. Although it shares the same ISA as "Zen 4," the "Zen 4c" is essentially a low-power, lite version of the core, with significantly higher performance/Watt. The core is physically smaller than a regular "Zen 4" core, which allows AMD to create CCDs (CPU core dies) with 16 cores, compared to the current "Zen 4" CCD with 8.
The 16-core "Zen 4c" CCD is built on the same 5 nm EUV foundry node as the 8-core "Zen 4" CCD, and internally features two CCX (CPU core complexes), each with 8 "Zen 4c" cores. Each of the two CCX shares a 16 MB L3 cache among its cores. The SemiAnalysis report states that the dedicated L2 cache size of the "Zen 4c" core remains at 1 MB, just like that of the regular "Zen 4." Perhaps the biggest finding is their die-size estimation, which puts the 16-core "Zen 4c" CCD at just 9.6% larger in die area than the 8-core "Zen 4" CCD. That's 72.7 mm² per CCD, compared to 66.3 mm² for the regular 8-core "Zen 4" CCD.
The SemiAnalysis report states that the codename AMD assigned to the "Zen 4c" core itself is "Dionysus," while the 16-core CCD is codenamed "Vindhya." The 128-core/256-thread "Bergamo" EPYC 9754 processor is a chiplet-based multi-chip module, designed for existing Socket SP5 server infrastructure. The MCM features eight "Zen 4c" CCDs to achieve its core count of 128.
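The figures quoted above can be cross-checked with some quick arithmetic. The following is a minimal sketch that uses only the numbers from the SemiAnalysis estimate as reported here; the variable and function names are made up for illustration. Note that the "area per core" it computes is simply CCD area divided by core count, so it includes each core's share of L3 cache and other on-die logic and is not the size of the core itself. Because the inputs are rounded, the computed growth may differ slightly from the 9.6% quoted.

```python
# Figures as quoted in the article (SemiAnalysis estimates, rounded).
ZEN4_CCD_AREA_MM2 = 66.3    # regular 8-core "Zen 4" CCD
ZEN4_CCD_CORES = 8
ZEN4C_CCD_AREA_MM2 = 72.7   # 16-core "Zen 4c" CCD
ZEN4C_CCD_CORES = 16
BERGAMO_CCDS = 8            # CCDs on the EPYC 9754 package
THREADS_PER_CORE = 2        # implied by 128 cores / 256 threads


def area_increase_pct(new_area: float, old_area: float) -> float:
    """Percentage by which new_area exceeds old_area."""
    return (new_area / old_area - 1.0) * 100.0


def area_per_core(ccd_area: float, cores: int) -> float:
    """CCD area divided by core count (includes L3 and other shared logic)."""
    return ccd_area / cores


ccd_growth_pct = area_increase_pct(ZEN4C_CCD_AREA_MM2, ZEN4_CCD_AREA_MM2)
zen4_per_core = area_per_core(ZEN4_CCD_AREA_MM2, ZEN4_CCD_CORES)
zen4c_per_core = area_per_core(ZEN4C_CCD_AREA_MM2, ZEN4C_CCD_CORES)
per_core_ratio = zen4c_per_core / zen4_per_core

total_cores = BERGAMO_CCDS * ZEN4C_CCD_CORES
total_threads = total_cores * THREADS_PER_CORE

print(f"CCD area growth: {ccd_growth_pct:.1f}%")
print(f"Die area per core: {zen4_per_core:.2f} mm2 (Zen 4) vs "
      f"{zen4c_per_core:.2f} mm2 (Zen 4c), ratio {per_core_ratio:.2f}")
print(f"Package total: {total_cores} cores / {total_threads} threads")
```

In other words, doubling the core count for under 10% more silicon means each "Zen 4c" core, together with its share of cache, occupies a little over half the die area of its "Zen 4" counterpart.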
The Server I/O Die (sIOD) is built on the 6 nm process, and appears to be the same one found in EPYC "Genoa" processors. It features a 12-channel (24 sub-channel) DDR5 memory interface, and a PCI Express 5.0 x128 root-complex. The EPYC 9754 is a 400 W TDP-class processor, just like the top "Genoa" processor, but with much higher compute density. "Zen 4c" is shaping up to be AMD's answer to Intel's E-cores such as "Gracemont," the article notes.
Source:
SemiAnalysis
34 Comments on AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs
Cache? Transistor layout designed to minimize space instead of increase clocks?
TLDR:
- reducing the number of timing critical regions to just 4 from well over 10 in Zen 4 as seen in the diagram below: this sacrifices clock speed for density
- a new SRAM bitcell developed by TSMC for memories outside L2. As a 6T design, it saves area compared to the usual 8T designs
- lower clock speed target allows denser circuits
The source goes over this in considerable detail.
It means AMD could technically release a 32-core CPU for AM5. But how fun and useful that would be depends on the cores' performance. If they are poor in games and not really much faster than the 16-core CPU already out, it would not be a good solution.
For desktop, we have no leaks at the moment, but as the original article by SemiAnalysis shows, the 16 c-core CCD is just a tad bigger than the current 8-core CCD. So, 8+16 is very possible on the current socket, and 16+16 is possible too, though this could require a new socket and new packaging for three CCDs and I/O dies.
L3 suffers slightly, but most loads will be pretty independent anyways, so that shouldn't matter much.
The main way it could be reduced is the access to the SRAM cells, which could have a slight cache latency penalty. I imagine it would require a redesign of the package, due to the difference in GMI layout. I would think it would be more likely in the next generation, once the design is tested.
I would be curious to see how the performance of Zen 4 and 4c compares at the extremely low power levels available in a laptop, especially single-core performance at the same core count and power budget. It might have a clock speed advantage sufficient to make up for the lack of L3 in a Dragon Range package, and more than enough in a Phoenix package.
But yeah it's there to create a 32 core / 64 thread consumer CPU.
The trouble is, the CCD is 16 core, so I am not sure if that can be made much cheaper as an 8 core.
Also, they wanted to create a versatile efficient core, but did not want a castrated, Atom-style core like the one Intel conceived for Alder Lake and Sierra Forest. Intel was so stubborn in this pursuit that they had to shut down AVX-512 instructions on client products. It was the price to pay for the big-little choice and core inflation approach... Guess why Sapphire Rapids does not have E-cores? AVX-512 will have to come back in one form or another in the client segment, but more importantly in the cloud CPU with 144 E-cores. Will it work? We shall see next year.
The "riotous child," the Dionysus core, is going to rock the boat. Dionysus is young, energetic and adventurous. The first iteration is an 8 c-core CCX in a 16 c-core CCD, but next year we will see the even more adventurous Zen 5 Turin dense, a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa is, but with 192 c-cores. These decisions save a lot on the design, manufacturing and packaging side, by using existing solutions with a few tweaks.
Bergamo will set a new pace in cloud servers this year, but Turin dense will compete with Sierra Forest SKUs and next-gen Graviton, AmpereOne and Grace SKUs. It does look like AMD has got the best of both worlds here. They keep an x86 core with the full instruction set, while deploying bespoke performance/watt efficiency against Atom and ARM cores.