Wednesday, June 7th 2023

AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs

A SemiAnalysis report sheds light on just how much smaller the "Zen 4c" CPU core is compared to the regular "Zen 4." AMD's upcoming high core-count enterprise processor for cloud data-center deployments, the EPYC "Bergamo," is based on the new "Zen 4c" microarchitecture. Although it shares the same ISA as "Zen 4," the "Zen 4c" is essentially a low-power, lite version of the core, with significantly higher performance/Watt. The core is physically smaller than a regular "Zen 4" core, which allows AMD to create CCDs (CPU core dies) with 16 cores, compared to the current "Zen 4" CCD with 8.

The 16-core "Zen 4c" CCD is built on the same 5 nm EUV foundry node as the 8-core "Zen 4" CCD, and internally features two CCXs (CPU core complexes), each with 8 "Zen 4c" cores. Each CCX shares a 16 MB L3 cache among its cores. The SemiAnalysis report states that the dedicated L2 cache size of the "Zen 4c" core remains at 1 MB, just like that of the regular "Zen 4." Perhaps the biggest finding is their die-size estimate, which puts the 16-core "Zen 4c" CCD at just 9.6% larger in die area than the 8-core "Zen 4" CCD. That's 72.7 mm² per CCD, compared to 66.3 mm² for the regular 8-core "Zen 4" CCD.

The SemiAnalysis report states that the codename AMD assigned to the "Zen 4c" core itself is "Dionysus," while the 16-core CCD is codenamed "Vindhya." The 128-core/256-thread "Bergamo" EPYC 9754 processor is a chiplet-based multi-chip module, designed for existing Socket SP5 server infrastructure. The MCM features eight "Zen 4c" CCDs to achieve its core count of 128.
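The arithmetic behind these figures can be checked with a short sketch. It uses only the die areas and core counts quoted above; the function and variable names are illustrative, and the "area per core" value is simply CCD area divided by core count, so it includes L3 cache and other shared logic rather than the core alone.

```python
# Figures as quoted from the SemiAnalysis estimate in the article.
ZEN4_CCD_MM2 = 66.3       # 8-core "Zen 4" CCD
ZEN4C_CCD_MM2 = 72.7      # 16-core "Zen 4c" CCD
ZEN4_CORES_PER_CCD = 8
ZEN4C_CORES_PER_CCD = 16
BERGAMO_CCDS = 8          # CCDs on the EPYC 9754 package


def pct_larger(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new - old) / old * 100.0


def area_per_core(ccd_mm2: float, cores: int) -> float:
    """CCD area divided by core count (includes L3 and shared logic)."""
    return ccd_mm2 / cores


ccd_growth_pct = pct_larger(ZEN4C_CCD_MM2, ZEN4_CCD_MM2)
zen4_share = area_per_core(ZEN4_CCD_MM2, ZEN4_CORES_PER_CCD)
zen4c_share = area_per_core(ZEN4C_CCD_MM2, ZEN4C_CORES_PER_CCD)
bergamo_cores = BERGAMO_CCDS * ZEN4C_CORES_PER_CCD

print(f"CCD area growth: {ccd_growth_pct:.2f}%")
print(f"Die area per core: Zen 4 {zen4_share:.2f} mm², Zen 4c {zen4c_share:.2f} mm²")
print(f"Zen 4c / Zen 4 area per core: {zen4c_share / zen4_share:.2f}")
print(f"Bergamo core count: {bergamo_cores}")
```

In other words, doubling the core count costs a little under 10% more silicon per CCD, which puts the die area per core at roughly 55% of the regular CCD's figure.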

The Server I/O Die (sIOD) is built on the 6 nm process, and appears to be the same one found in EPYC "Genoa" processors. It features a 12-channel (24 sub-channel) DDR5 memory interface, and a PCI Express 5.0 x128 root-complex. The EPYC 9754 is a 400 W TDP-class processor, just like the top "Genoa" processor, but with much higher compute density. "Zen 4c" is shaping up to be AMD's answer to Intel's E-cores such as "Gracemont," the article notes.
Source: SemiAnalysis

34 Comments on AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs

#1
Denver
Depending on the performance I'm genuinely impressed.
Posted on Reply
#2
Bayonet
I wonder if we will see 24-core heterogeneous CPUs from AMD in the future.
Posted on Reply
#3
btarunr
Editor & Senior Moderator
Bayonet: I wonder if we will see 24-core heterogeneous CPUs from AMD in the future.
A socket AM5 chip with an 8-core + 3DV cache CCD and a 16-core Zen4c CCD? It's quite possible.
Posted on Reply
#4
Count von Schwalbe
What is the actual difference between Zen 4 and Zen 4c?

Cache? Transistor layout designed to minimize space instead of increase clocks?
Posted on Reply
#5
AnotherReader
I think you forgot to link to the source. While it's behind a paywall, the first part covering the physical design is free to read. It's an impressive feat of physical design.
TLDR:
  1. reducing the number of timing critical regions to just 4 from well over 10 in Zen 4 as seen in the diagram below: this sacrifices clock speed for density
  2. a new SRAM bitcell developed by TSMC for memories outside L2. As a 6T design, it saves area compared to the usual 8T designs
  3. lower clock speed target allows denser circuits
Count von Schwalbe: What is the actual difference between Zen 4 and Zen 4c?

Cache? Transistor layout designed to minimize space instead of increase clocks?
The source goes over this in considerable detail.
Posted on Reply
#6
Tomgang
That's interesting. But it depends on how fast these cores are.

It means AMD could technically release a 32-core CPU for AM5. But how fun and useful that would be depends on the cores' performance. If they suck in games and are not really much faster than the 16-core CPU already out, it would not be a good solution.
Posted on Reply
#7
AnotherReader
Tomgang: That's interesting. But it depends on how fast these cores are.
This is for cloud workloads so it sacrifices clock speed for throughput. If it ever comes to the desktop, then its function would be analogous to Intel's E cores as Zen 4c should be slower than Zen 4 for single threaded or low threaded loads.
Posted on Reply
#8
TheoneandonlyMrK
Tomgang: That's interesting. But it depends on how fast these cores are.

It means AMD could technically release a 32-core CPU for AM5. But how fun and useful that would be depends on the cores' performance. If they suck in games and are not really much faster than the 16-core CPU already out, it would not be a good solution.
I'm thinking they probably do know that, and if this is to copy Intel's big-little, this would be the little, wouldn't it? So they probably do have something in mind for the big. Maybe what @btarunr suggested in his post above is possible.
Posted on Reply
#9
R0H1T
Almost certainly, if not with Zen 4 then maybe Zen 5, but whether they release it in this sagging market is anyone's guess. They don't really need it right now, but if sales go down further they might just shut shop on AM4 and go with hybrid and/or xC chips to boost demand.
Bayonet: I wonder if we will see 24-core heterogeneous CPUs from AMD in the future.
Was addressed to both.
AnotherReader: This is for cloud workloads so it sacrifices clock speed for throughput. If it ever comes to the desktop, then its function would be analogous to Intel's E cores as Zen 4c should be slower than Zen 4 for single threaded or low threaded loads.
Posted on Reply
#10
Tek-Check
Bayonet: I wonder if we will see 24-core heterogeneous CPUs from AMD in the future.
We will probably first see smaller SKUs for mobility segment, as those have been tested in recent months and leaks came out.
For desktop, we have no leaks at the moment, but as the original article by SemiAnalysis shows, the 16 c-core CCD is just a tad bigger than the current 8-core CCD. So, 8+16 is very possible on the current socket, and 16+16 is possible too, though this could require a new socket and new packaging for three CCDs and I/O dies.
Posted on Reply
#11
phanbuey
btarunr: A socket AM5 chip with an 8-core + 3DV cache CCD and a 16-core Zen4c CCD? It's quite possible.
It will also not be very good unless they can figure out the scheduler.
Posted on Reply
#12
AnotherReader
phanbuey: It will also not be very good unless they can figure out the scheduler.
This would be simpler than the current cases. Just prefer the regular Zen 4 CCD until you run out of cores. Of course, if it uses the stacked cache, then that would probably complicate things.
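A rough sketch of the "prefer the regular Zen 4 CCD until you run out of cores" policy described here. Everything in it is hypothetical: the 8+16 chip layout, the `Core` type, and the `place_threads` helper are made up for illustration, SMT is ignored (one thread per core), and no real OS scheduler or AMD driver is being described.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Core:
    core_id: int
    kind: str  # "zen4" (regular) or "zen4c" (dense); labels are illustrative


def build_hypothetical_chip(big: int = 8, dense: int = 16) -> List[Core]:
    """Build an imaginary 8+16 hybrid part; not an announced product."""
    cores = [Core(i, "zen4") for i in range(big)]
    cores += [Core(big + i, "zen4c") for i in range(dense)]
    return cores


def place_threads(cores: List[Core], n_threads: int) -> List[Core]:
    """Pick one core per thread, filling regular Zen 4 cores first.

    Sorting on (is_dense, core_id) puts every "zen4" core ahead of every
    "zen4c" core, so dense cores are used only once the regular ones run out.
    If n_threads exceeds the core count, all cores are returned.
    """
    ordered = sorted(cores, key=lambda c: (c.kind != "zen4", c.core_id))
    return ordered[:n_threads]


chip = build_hypothetical_chip()
light_load = place_threads(chip, 6)    # fits entirely on the regular CCD
heavy_load = place_threads(chip, 10)   # spills two threads onto dense cores
print([c.kind for c in heavy_load])
```

A stacked-cache CCD would break the simple ordering, since some workloads would then prefer the cache over clock speed, which is the complication mentioned above.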
Posted on Reply
#13
Tek-Check
btarunr: A socket AM5 chip with an 8-core + 3DV cache CCD and a 16-core Zen4c CCD? It's quite possible.
I doubt we would see the first heterogeneous 8+16 SKU with V-cache, but it's all in play for this to happen soon.
Count von Schwalbe: What is the actual difference between Zen 4 and Zen 4c?
Cache? Transistor layout designed to minimize space instead of increase clocks?
Did you read the article and the link to original article? It's all there.
Tomgang: It means AMD could technically release a 32-core CPU for AM5. But how fun and useful that would be depends on the cores' performance. If they suck in games and are not really much faster than the 16-core CPU already out, it would not be a good solution.
Gaming is irrelevant. You already have up to 16 big cores for gaming. Plenty! Additional CCD with c-cores would primarily boost MT workloads.
AnotherReader: This is for cloud workloads so it sacrifices clock speed for throughput. If it ever comes to the desktop, then its function would be analogous to Intel's E cores as Zen 4c should be slower than Zen 4 for single threaded or low threaded loads.
C-cores will be more performant than e-cores because c-cores support HT and all instructions, including AVX512, whereas e-cores have a single thread and reduced instructions.
Posted on Reply
#15
AnotherReader
Tek-Check: C-cores will be more performant than e-cores because c-cores support HT and all instructions, including AVX512, whereas e-cores have a single thread and reduced instructions.
I concur; these have none of the drawbacks of the current E cores. They have the same IPC and ISA as Zen 4; the tradeoff is lower clock speeds for lower die area and presumably lower power.
Posted on Reply
#16
R0H1T
Should be slightly lower IPC than current chips with bigger caches, depending on the workload as well.
Posted on Reply
#17
AnotherReader
R0H1T: Should be slightly lower IPC than current chips with bigger caches, depending on the workload as well.
If a workload is too large for the 16 MiB L3, then it would have lower IPC than regular Zen 4.
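To put numbers on that, here is a small sketch of L3 capacity per CCX and per core. The 16 MiB figure for Zen 4c comes from the article; the 32 MiB figure for a regular Zen 4 CCX is not stated there and is filled in as an assumption. The helper names are illustrative, and real cache behaviour depends on far more than raw capacity.

```python
ZEN4C_L3_PER_CCX_MIB = 16   # from the article
ZEN4_L3_PER_CCX_MIB = 32    # assumption: not stated in the article
CORES_PER_CCX = 8           # both designs use 8 cores per CCX


def l3_per_core_mib(l3_mib: float, cores: int) -> float:
    """Even split of the shared L3 across the cores of one CCX."""
    return l3_mib / cores


def fits_in_l3(working_set_mib: float, l3_mib: float) -> bool:
    """Crude capacity check: does the working set fit in one CCX's L3?"""
    return working_set_mib <= l3_mib


zen4c_share = l3_per_core_mib(ZEN4C_L3_PER_CCX_MIB, CORES_PER_CCX)
zen4_share = l3_per_core_mib(ZEN4_L3_PER_CCX_MIB, CORES_PER_CCX)
print(f"L3 per core: Zen 4c {zen4c_share} MiB, Zen 4 {zen4_share} MiB")

# A hypothetical 24 MiB working set fits the larger L3 but not the smaller.
example_mib = 24
print(fits_in_l3(example_mib, ZEN4C_L3_PER_CCX_MIB))
print(fits_in_l3(example_mib, ZEN4_L3_PER_CCX_MIB))
```

Workloads in that in-between range are the ones where Zen 4c would be expected to lose IPC relative to regular Zen 4.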
Posted on Reply
#18
Count von Schwalbe
R0H1T: Should be slightly lower IPC than current chips with bigger caches, depending on the workload as well.
Not really in the target market. Independent cache is the same amount.

L3 suffers slightly, but most loads will be pretty independent anyways, so that shouldn't matter much.

The main way it could be reduced is the access to the SRAM cells, which could have a slight cache latency penalty.
btarunr: A socket AM5 chip with an 8-core + 3DV cache CCD and a 16-core Zen4c CCD? It's quite possible.
I imagine it would require a redesign of the package, due to the difference in GMI layout. I would think it would be more likely in the next generation, once the design is tested.

I would be curious to see how performance of Zen 4 and 4c compare at the extremely low power levels available in a laptop. Especially single core performance at the same core count and power budget. It might have a clock speed advantage sufficient to make up for the lack of L3, in a Dragon Range package, and more than enough in a Phoenix package.
Posted on Reply
#19
Jism
Tomgang: That's interesting. But it depends on how fast these cores are.

It means AMD could technically release a 32-core CPU for AM5. But how fun and useful that would be depends on the cores' performance. If they suck in games and are not really much faster than the 16-core CPU already out, it would not be a good solution.
I'd prefer a single CCD over a dual CCD all day and night.

But yeah it's there to create a 32 core / 64 thread consumer CPU.
Posted on Reply
#20
kondamin
would be nice if they made a tiny 8 core ccd for a cheap low power chip
Posted on Reply
#21
Count von Schwalbe
kondamin: would be nice if they made a tiny 8 core ccd for a cheap low power chip
Sounds like they are doing so, in the mobile market.

The trouble is, the CCD is 16 core, so I am not sure if that can be made much cheaper as an 8 core.
Posted on Reply
#22
AnotherReader
I just realized that the code names for the Zen 4 and Zen 4c cores are emblematic of their origins. Persephone is the mother of Dionysus in some sources.
Posted on Reply
#23
Tek-Check
AnotherReader: I concur; these have none of the drawbacks of the current E cores. They have the same IPC and ISA as Zen 4; the tradeoff is lower clock speeds for lower die area and presumably lower power.
True that. The tradeoff is necessary to compete with ARM designs for cloud servers. I would not even call it a "tradeoff". It's a different type of core, like those diverse cores on ARM SoCs. c-cores are much better in performance/watt, which is a fundamental feature in an increasing number of systems in a variety of segments.
AnotherReader: I just realized that the code names for the Zen 4 and Zen 4c cores are emblematic of their origins. Persephone is the mother of Dionysus in some sources.
And riotous son of Zeus :)
Posted on Reply
#25
Tek-Check
AnotherReader: That would make ARM's Neoverse V1 the father of Zen 4c o_O
Agreed. It's a spiritual father of Bergamo. AMD found out a few years ago what Amazon was planning to develop with Graviton CPUs on Neoverse platforms, including highest performing Zeus. They realised that the only way to stay competitive in hyperscalers segment, while staying on x86, was to develop a new efficiency core with rebalanced performance, power consumption, size and lower cost.

Also, they wanted to create a versatile efficient core, but did not want a castrated, Atom-style core that Intel conceived for Alder Lake and Sierra Forest. Intel was so stubborn in this pursuit that they had to shut down AVX-512 instructions on client products. It was a price to pay for the big-little choice and core inflation approach... Guess why Sapphire Rapids does not have e-cores? AVX-512 will have to come back in one form or another in the client segment, but more importantly in the cloud CPU with 144 e-cores. Will it work? We shall see next year.

The "riotous child", Dionysus core, is going to rock the boat. Dionysus is young, energetic and adventurous. The first iteration is 8 c-core CCX in 16 c-core CCD, but next year we will see even more adventurous Zen5 Turin dense, a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa is, but with 192 c-cores. These decisions save a lot on design, manufacturing and packaging side, by using existing solutions with a few tweaks.

Bergamo will set a new pace in cloud servers this year, but Turin dense will compete with Sierra Forest SKUs and next-gen Graviton, AmpereOne and Grace SKUs. It does look like AMD has got the best of both worlds here. They keep an x86 core with full instructions, while deploying bespoke performance/watt efficiency against Atom and ARM cores.
Posted on Reply