Wednesday, June 7th 2023
AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs
A SemiAnalysis report sheds light on just how much smaller the "Zen 4c" CPU core is compared to the regular "Zen 4." AMD's upcoming high core-count enterprise processor for cloud data-center deployments, the EPYC "Bergamo," is based on the new "Zen 4c" microarchitecture. Although it shares the same ISA as "Zen 4," "Zen 4c" is essentially a low-power, lite version of the core, with significantly higher performance/Watt. The core is physically smaller than a regular "Zen 4" core, which allows AMD to create CCDs (CPU core dies) with 16 cores, compared to the current "Zen 4" CCD with 8.
The 16-core "Zen 4c" CCD is built on the same 5 nm EUV foundry node as the 8-core "Zen 4" CCD, and internally features two CCXs (CPU core complexes), each with 8 "Zen 4c" cores. Each of the two CCXs shares a 16 MB L3 cache among its cores. The SemiAnalysis report states that the dedicated L2 cache size of the "Zen 4c" core remains at 1 MB, just like that of the regular "Zen 4." Perhaps the biggest finding is their die-size estimation, which puts the 16-core "Zen 4c" CCD at just 9.6% larger in die area than the 8-core "Zen 4" CCD. That's 72.7 mm² per CCD, compared to 66.3 mm² for the regular 8-core "Zen 4" CCD.
The SemiAnalysis report states that the codename AMD assigned to the "Zen 4c" core itself is "Dionysus," while the 16-core CCD is codenamed "Vindhya." The 128-core/256-thread "Bergamo" EPYC 9754 processor is a chiplet-based multi-chip module, designed for existing Socket SP5 server infrastructure. The MCM needs just eight "Zen 4c" CCDs to achieve its core count of 128.
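The density argument above comes down to a few lines of arithmetic. The sketch below is illustrative only: it uses the rounded die-area estimates quoted from SemiAnalysis, and the "area per core" figure is simply CCD area divided by core count, so it folds in L3 cache and other shared logic rather than measuring the core itself. All function and variable names are made up for this example.

```python
# Die-area estimates quoted in the article (SemiAnalysis figures, rounded).
ZEN4_CCD_AREA_MM2 = 66.3
ZEN4C_CCD_AREA_MM2 = 72.7
ZEN4_CORES_PER_CCD = 8
ZEN4C_CORES_PER_CCD = 16

# Package configuration described in the article for the EPYC 9754.
BERGAMO_CCDS = 8
THREADS_PER_CORE = 2  # implied by the 128-core/256-thread description


def ccd_growth_percent(new_area: float, old_area: float) -> float:
    """Percentage by which new_area exceeds old_area."""
    return (new_area / old_area - 1.0) * 100.0


def area_per_core(ccd_area: float, cores: int) -> float:
    """CCD area divided by core count (includes L3 and shared logic)."""
    return ccd_area / cores


growth = ccd_growth_percent(ZEN4C_CCD_AREA_MM2, ZEN4_CCD_AREA_MM2)
zen4_per_core = area_per_core(ZEN4_CCD_AREA_MM2, ZEN4_CORES_PER_CCD)
zen4c_per_core = area_per_core(ZEN4C_CCD_AREA_MM2, ZEN4C_CORES_PER_CCD)
total_cores = BERGAMO_CCDS * ZEN4C_CORES_PER_CCD
total_threads = total_cores * THREADS_PER_CORE

print(f"CCD area growth: {growth:.2f}%")
print(f"Zen 4 CCD area per core:  {zen4_per_core:.2f} mm^2")
print(f"Zen 4c CCD area per core: {zen4c_per_core:.2f} mm^2")
print(f"Bergamo totals: {total_cores} cores / {total_threads} threads")
```

With the rounded inputs, the growth works out to a little under 9.7%, which is consistent with the quoted 9.6% once rounding of the two area figures is taken into account. The per-core share of CCD area comes out at roughly half for "Zen 4c."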
The Server I/O Die (sIOD) is built on the 6 nm process, and appears to be the same one found in EPYC "Genoa" processors. It features a 12-channel (24 sub-channel) DDR5 memory interface, and a PCI Express 5.0 x128 root-complex. The EPYC 9754 is a 400 W TDP-class processor, just like the top "Genoa" processor, but with much higher compute density. "Zen 4c" is shaping up to be AMD's answer to Intel's E-cores such as "Gracemont," the article notes.
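As a rough illustration of what a 12-channel (24 sub-channel) DDR5 interface means in aggregate, the sketch below computes the total data-bus width and a theoretical peak bandwidth. The article does not state a memory speed, so the DDR5-4800 transfer rate used here is an assumption; the 32-bit sub-channel width reflects the standard DDR5 layout of two 32-bit sub-channels per 64-bit channel (ECC bits excluded). Function names are hypothetical.

```python
# DDR5 splits each 64-bit channel into two 32-bit sub-channels (data bits only).
SUBCHANNELS_PER_CHANNEL = 2
BITS_PER_SUBCHANNEL = 32


def bus_width_bits(channels: int) -> int:
    """Total data-bus width in bits for a given number of DDR5 channels."""
    return channels * SUBCHANNELS_PER_CHANNEL * BITS_PER_SUBCHANNEL


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal), ignoring all overheads."""
    bytes_per_transfer = bus_width_bits(channels) / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


CHANNELS = 12              # from the article
ASSUMED_SPEED_MTS = 4800   # assumption: DDR5-4800, not stated in the article

width = bus_width_bits(CHANNELS)
bandwidth = peak_bandwidth_gbs(CHANNELS, ASSUMED_SPEED_MTS)

print(f"Sub-channels: {CHANNELS * SUBCHANNELS_PER_CHANNEL}")
print(f"Aggregate data-bus width: {width} bits")
print(f"Theoretical peak at DDR5-{ASSUMED_SPEED_MTS}: {bandwidth:.1f} GB/s")
```

Under that speed assumption, the interface is 768 bits wide and tops out at about 460.8 GB/s in theory. Sustained bandwidth in practice would be lower.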
Source:
SemiAnalysis
34 Comments on AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs
With that said, I see a single CCD option being a really nice entry/budget option for servers. There is a whole lot to like here if you're working on software that'll be running on a server, but I honestly don't see AMD doing the hybrid thing. I could be wrong, but this looks like another move to cater to the server market, not to your ordinary consumer.
Intel - Sierra Forest could offer 768-bit bus (12 channels x64-bit), but on 144 e-cores
Apple - M2 Ultra offers 1024-bit bus (8 channels x128-bit), still below 1TB/s
ARM - Indian chip C-DAC AUM could offer up to 512-bit bus (16 channels x32-bit)
RISC-V - Tenstorrent CPU will max out at roughly 256-bit bus (8 channels x32-bit)
Turin dense could offer V cache too on -X SKUs, just like Genoa-X does, which brings above 1.1TB/s throughput
Plus, it will support CXL memory expanders, so customers could widen memory throughput as they please on 64 PCIe 5.0 lanes