Wednesday, June 7th 2023

AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs

A SemiAnalysis report sheds light on just how much smaller the "Zen 4c" CPU core is compared to the regular "Zen 4." AMD's upcoming high core-count enterprise processor for cloud data-center deployments, the EPYC "Bergamo," is based on the new "Zen 4c" microarchitecture. Although it shares the same ISA as "Zen 4," "Zen 4c" is essentially a low-power, lite version of the core, with significantly higher performance/Watt. The core is physically smaller than a regular "Zen 4" core, which allows AMD to create CCDs (CPU core dies) with 16 cores, compared to the current "Zen 4" CCD with 8.

The 16-core "Zen 4c" CCD is built on the same 5 nm EUV foundry node as the 8-core "Zen 4" CCD, and internally features two CCXs (CPU core complexes), each with 8 "Zen 4c" cores. Each of the two CCXs shares a 16 MB L3 cache among its cores. The SemiAnalysis report states that the dedicated L2 cache size of the "Zen 4c" core remains at 1 MB, just like that of the regular "Zen 4." Perhaps the biggest finding is their die-size estimation, which puts the 16-core "Zen 4c" CCD at just 9.6% larger in die area than the 8-core "Zen 4" CCD. That's 72.7 mm² per CCD, compared to 66.3 mm² for the regular 8-core "Zen 4" CCD.
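
The die-area comparison above can be checked with a few lines of arithmetic. The sketch below uses only the two area estimates and core counts quoted from the report; the function and constant names are illustrative, and the per-core figure is simply CCD area divided by core count, so it includes each core's share of L3 and other on-die logic rather than the core alone.

```python
# Die-area estimates quoted from the SemiAnalysis report (mm^2).
ZEN4_CCD_AREA_MM2 = 66.3    # 8-core "Zen 4" CCD
ZEN4C_CCD_AREA_MM2 = 72.7   # 16-core "Zen 4c" CCD
ZEN4_CORES_PER_CCD = 8
ZEN4C_CORES_PER_CCD = 16


def percent_larger(new_area, old_area):
    """Relative growth of new_area over old_area, in percent."""
    return (new_area - old_area) / old_area * 100.0


def area_per_core(ccd_area, cores):
    """CCD area divided evenly by core count (includes L3 and uncore share)."""
    return ccd_area / cores


growth = percent_larger(ZEN4C_CCD_AREA_MM2, ZEN4_CCD_AREA_MM2)
zen4_share = area_per_core(ZEN4_CCD_AREA_MM2, ZEN4_CORES_PER_CCD)
zen4c_share = area_per_core(ZEN4C_CCD_AREA_MM2, ZEN4C_CORES_PER_CCD)

print(f"Zen 4c CCD is {growth:.2f}% larger than the Zen 4 CCD")
print(f"CCD area per core: Zen 4 {zen4_share:.2f} mm^2, Zen 4c {zen4c_share:.2f} mm^2")
print(f"Zen 4c per-core share is {zen4c_share / zen4_share:.0%} of Zen 4's")
```

With the rounded areas quoted here, the growth works out to a little under 10%, in line with the report's figure, while the per-core share of die area falls to a bit over half.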
The SemiAnalysis report states that the codename AMD assigned to the "Zen 4c" core itself is "Dionysus," while the 16-core CCD is codenamed "Vindhya." The 128-core/256-thread "Bergamo" EPYC 9754 processor is a chiplet-based multi-chip module, designed for existing Socket SP5 server infrastructure. The MCM needs only eight "Zen 4c" CCDs to reach its core count of 128.
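
As a quick sanity check, the package totals follow from the per-CCD figures given in this article. The sketch below is illustrative only: the names are made up for the example, and two threads per core is inferred from the 128-core/256-thread description rather than stated separately.

```python
# Per-CCD and per-core figures as given in the article.
CCDS_PER_PACKAGE = 8
CCX_PER_CCD = 2
CORES_PER_CCX = 8
L3_PER_CCX_MB = 16
L2_PER_CORE_MB = 1
THREADS_PER_CORE = 2  # inferred from "128-core/256-thread"


def package_totals(ccds=CCDS_PER_PACKAGE):
    """Aggregate cores, threads, and cache for a package with `ccds` CCDs."""
    ccx_count = ccds * CCX_PER_CCD
    cores = ccx_count * CORES_PER_CCX
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "l3_mb": ccx_count * L3_PER_CCX_MB,
        "l2_mb": cores * L2_PER_CORE_MB,
    }


totals = package_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

Passing a smaller `ccds` value gives the totals for a hypothetical cut-down package built from the same CCD.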

The Server I/O Die (sIOD) is built on the 6 nm process, and appears to be the same one found in EPYC "Genoa" processors. It features a 12-channel (24 sub-channel) DDR5 memory interface, and a PCI Express 5.0 x128 root-complex. The EPYC 9754 is a 400 W TDP-class processor, just like the top "Genoa" processor, but with much higher compute density. "Zen 4c" is shaping up to be AMD's answer to Intel's E-cores such as "Gracemont," the article notes.
Source: SemiAnalysis

34 Comments on AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs

#26
Aquinus
Resident Wat-man
You know, datacenter use cases are what's driving all of this. If you take cloud computing for example, you have a lot of different things going on. The goal is to maximize throughput, and increasing the number of cores at the cost of some features that consume a lot of die space makes sense. If I consider the product that I oversee the engineering efforts for and how this relates to it, it's basically being able to squeeze more service VMs (think AWS ECS Fargate) into a smaller area for tasks that most definitely don't need things like AVX-512. Most of the stuff the application does (being a SaaS product) is integer-heavy, so something like this would make a whole lot of sense if they start cutting things like vector extensions. The nice thing is that you could run it on either and just let the JVM intrinsics do their magic, but at the end of the day the business cares about two things: customer retention and cost of doing business.

With that said, I see a single CCD option being a really nice entry/budget option for servers. There is a whole lot to like here if you're working on software that'll be running on a server, but I honestly don't see AMD doing the hybrid thing. I could be wrong, but this looks like another move to cater to the server market, not to your ordinary consumer.
#27
Wirko
Tek-Check said: next year we will see even more adventurous Zen5 Turin dense, a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa is, but with 192 c-cores.
Memory bandwidth would hold the 192 cores back too much, unless AMD expands the 12-channel interface to 16 channels and/or introduces multiplexed ranks (MCR in Intel's speak).
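
To put rough numbers on that concern, here is a minimal sketch of theoretical peak bandwidth per core. It assumes 64-bit channels running at DDR5-4800 in every configuration; that transfer rate is an assumption for illustration and is not stated anywhere in this thread, and the helper name is made up. Real sustained bandwidth is lower than these peak figures.

```python
ASSUMED_MT_PER_S = 4800   # assumption: DDR5-4800 on every channel
CHANNEL_WIDTH_BITS = 64   # one DDR5 channel (two 32-bit sub-channels)


def peak_bandwidth_gbs(channels, bits_per_channel, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s (decimal) for a memory interface."""
    bytes_per_transfer = channels * bits_per_channel / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000.0  # MB/s -> GB/s


scenarios = [
    ("128 cores, 12 channels", 128, 12),
    ("192 cores, 12 channels", 192, 12),
    ("192 cores, 16 channels", 192, 16),
]

for label, cores, channels in scenarios:
    total = peak_bandwidth_gbs(channels, CHANNEL_WIDTH_BITS, ASSUMED_MT_PER_S)
    print(f"{label}: {total:.1f} GB/s total, {total / cores:.2f} GB/s per core")
```

Under these assumptions, going from 128 to 192 cores on the same 12 channels cuts the per-core share by a third, and 16 channels would recover most but not all of it.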
#28
Minus Infinity
AnotherReader said: This is for cloud workloads so it sacrifices clock speed for throughput. If it ever comes to the desktop, then its function would be analogous to Intel's E cores as Zen 4c should be slower than Zen 4 for single threaded or low threaded loads.
But 4c supports SMT, unlike Gracemont and from what I've read 4c would at worst be 30% slower than 4. 8850x with 8 Zen4 and 4/6 Zen 4c would be nice
#29
Tek-Check
Wirko said: Memory bandwidth would hold the 192 cores back too much, unless AMD expands the 12-channel interface to 16 channels and/or introduces multiplexed ranks (MCR in Intel's speak).
AMD is the least worried about memory, considering they are the only vendor currently with 12 channels for cloud workloads and what other vendors would offer next year.
Intel - Sierra Forest could offer 768-bit bus (12 channels x64-bit), but on 144 e-cores
Apple - M2 Ultra offers 1024-bit bus (8 channels x128-bit), still below 1TB/s
ARM - Indian chip C-DAC AUM could offer up to 512-bit bus (16 channels x32-bit)
RISC-V - Tenstorrent CPU will max out at roughly 256-bit bus (8 channels x32-bit)
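
The bus widths in this list are just channel count times channel width, which a short sketch can confirm. The channel figures are the ones quoted in this post plus the 12-channel interface from the article (treated here as 12 x 64-bit); the dictionary and function names are illustrative, and none of this says anything about transfer rates or real-world throughput.

```python
# (channels, bits per channel) as quoted in the post and the article.
MEMORY_INTERFACES = {
    "AMD EPYC (12-channel DDR5)": (12, 64),
    "Intel Sierra Forest": (12, 64),
    "Apple M2 Ultra": (8, 128),
    "C-DAC AUM": (16, 32),
    "Tenstorrent CPU": (8, 32),
}


def bus_width_bits(channels, bits_per_channel):
    """Total memory bus width in bits."""
    return channels * bits_per_channel


widths = {
    name: bus_width_bits(channels, bits)
    for name, (channels, bits) in MEMORY_INTERFACES.items()
}

# Print widest first so the comparison is easy to read.
for name, width in sorted(widths.items(), key=lambda item: -item[1]):
    print(f"{name}: {width}-bit")
```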

Turin dense could offer V cache too on -X SKUs, just like Genoa-X does, which brings above 1.1TB/s throughput
Plus, it will support CXL memory expanders, so customers could widen memory throughput as they please on 64 PCIe 5.0 lanes
#30
david salsero
AnotherReader said: I think you forgot to link to the source. While it's behind a paywall, the first part covering the physical design is free to read. It's an impressive feat of physical design.
TLDR:
  1. reducing the number of timing critical regions to just 4 from well over 10 in Zen 4 as seen in the diagram below: this sacrifices clock speed for density
  2. a new SRAM bitcell developed by TSMC for memories outside L2. As a 6T design, it saves area compared to the usual 8T designs
  3. lower clock speed target allows denser circuits



The source goes over this in considerable detail.
Great job by AMD. Now it's time to do it with the Ryzen laptops. I don't understand why OEMs, having the best price/performance processor in the Zen 4 Phoenix, aren't bringing out more laptops with it.
#31
AnotherReader
Minus Infinity said: But 4c supports SMT, unlike Gracemont and from what I've read 4c would at worst be 30% slower than 4. 8850x with 8 Zen4 and 4/6 Zen 4c would be nice
You're right about the low performance difference, but I meant that it would fulfill the same role as an E core for an OS. The regular Zen 4 cores would be preferred.
#32
TechLurker
Idly, I wonder if AMD will also bring SMT4 to the enterprise sector (turning 16c into a 64t monstrosity), or maybe capitalize more on their Xilinx division for some serious FPGA circuitry that can shift roles on the fly, as and when needed.
#33
Minus Infinity
AnotherReader said: You're right about the low performance difference, but I meant that it would fulfill the same role as an E core for an OS. The regular Zen 4 cores would be preferred.
I was reading the Zen 4c cores will not prove problematic for the OS like e-cores, and will be treated the same as full fat cores. Same architecture just lower clocks/caches.
#34
AnotherReader
Minus Infinity said: I was reading the Zen 4c cores will not prove problematic for the OS like e-cores, and will be treated the same as full fat cores. Same architecture just lower clocks/caches.
Scheduling a thread onto Zen 4c instead of Zen 4 won't impact performance as much as scheduling a thread onto an E core instead of a P core in Intel's SoCs. However, Zen 4c will be slower than Zen 4 due to the lower clock speed; therefore, the OS will still prioritize Zen 4.