Wednesday, June 14th 2023

AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC

AMD on Tuesday (June 13) launched the EPYC 9004 "Bergamo" 128-core/256-thread high-density compute server processor, and with it debuted the new "Zen 4c" CPU microarchitecture. Much was made of Zen 4c in the run-up to yesterday's launch, including rumors that it is a Zen 4 "lite" core with less number-crunching muscle, and hence lower IPC, and that Zen 4c is AMD's answer to Intel's E-core architectures, such as "Gracemont" and "Crestmont." It turns out to be neither a lite version of Zen 4 nor an E-core, but a physically compacted version of the Zen 4 core with identical number-crunching machinery.

First things first: Zen 4c has exactly the same IPC as Zen 4 (that is, performance at a given clock speed). This is because its front-end, execution stage, load/store component, and internal cache hierarchy are exactly the same. It has the same 88-deep load queue, the same 64-deep store queue, the same 6.75K-entry µop cache, the same INT+FP issue width of 10+6, the same INT register file, the same scheduler, and the same cache latencies. The L1I and L1D caches are the same 32 KB in size as on "Zen 4," and so is the dedicated L2 cache, at 1 MB.
The only thing that has changed is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD. While the regular 8-core "Zen 4" CCD has eight "Zen 4" cores sharing a 32 MB L3 cache, the new 16-core "Zen 4c" CCD that AMD introduced with "Bergamo" packs two 8-core CCXs (CPU core complexes), each with 16 MB of L3 cache shared among its 8 cores. In this respect, the last-level cache and CPU core organization of the "Zen 4c" CCD has some similarities to the "Zen 2" CCD (which used two 4-core CCXs).
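
The per-core L3 figures above follow directly from the cache sizes quoted in the text. A minimal sketch of that arithmetic is below; the function and variable names are made up for illustration and are not AMD terminology.

```python
# Per-core L3 share, using only the cache sizes quoted in the article.
# All names here are hypothetical helpers for illustration.

def l3_per_core_mb(l3_mb: float, cores_sharing: int) -> float:
    """Return the L3 capacity (in MB) per core for one shared L3 slice."""
    return l3_mb / cores_sharing

zen4_l3_per_core = l3_per_core_mb(32, 8)   # 8-core Zen 4 CCD: one 32 MB L3
zen4c_l3_per_core = l3_per_core_mb(16, 8)  # one 8-core Zen 4c CCX: 16 MB L3
zen4c_ccd_l3_total = 2 * 16                # two CCXs per 16-core Zen 4c CCD

print(f"Zen 4:  {zen4_l3_per_core:.0f} MB L3 per core")
print(f"Zen 4c: {zen4c_l3_per_core:.0f} MB L3 per core")
print(f"Zen 4c CCD total L3: {zen4c_ccd_l3_total} MB")
```

Note that, by these figures, the total L3 per CCD stays at 32 MB; it is split into two 16 MB halves and spread over twice as many cores.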

What's interesting is that the 16-core "Zen 4c" CCD isn't AMD's first product from this generation with less last-level cache per core. The "Phoenix" APU silicon used in Ryzen 7040 series mobile processors has eight "Zen 4" cores sharing a 16 MB L3 cache. For math-heavy compute workloads with a small memory footprint, "Zen 4c" offers identical performance to "Zen 4"; however, the smaller L3 cache should impact performance in bandwidth-sensitive workloads with large data sets.
The Zen 4c CCD is built on the same TSMC 5 nm EUV foundry node as the company's regular 8-core Zen 4 CCD; however, the Zen 4c CPU core is 35% smaller than the Zen 4 core, with a per-core die area of just 2.48 mm², compared to 3.84 mm². The die-size savings probably come from AMD "compacting" the various core components without reducing their form or function in any way. As we said earlier, the counts of the various core components remain the same, as do the sizes of the µop, L1, and L2 caches. EPYC 9004 "Bergamo" achieves its core count of 128 using eight of these 16-core Zen 4c CCDs. In comparison, the regular "Genoa" processor achieves 96 cores over twelve 8-core Zen 4 CCDs.
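
The 35% figure and the core counts can be checked with a few lines of arithmetic. The sketch below uses only the per-core areas and CCD counts quoted above; the constant names are made up for illustration, and the "cores per unit area" ratio is a derived number that ignores L3, interconnect, and other non-core die area.

```python
# Per-core areas quoted in the article (mm^2); names are illustrative only.
ZEN4_CORE_MM2 = 3.84
ZEN4C_CORE_MM2 = 2.48

# Fractional area reduction of a Zen 4c core relative to a Zen 4 core.
area_reduction = 1 - ZEN4C_CORE_MM2 / ZEN4_CORE_MM2

# How many Zen 4c cores fit in the core area of one Zen 4 core.
cores_per_area_ratio = ZEN4_CORE_MM2 / ZEN4C_CORE_MM2

# Package core counts: CCD count times cores per CCD.
bergamo_cores = 8 * 16   # eight 16-core Zen 4c CCDs
genoa_cores = 12 * 8     # twelve 8-core Zen 4 CCDs

print(f"Area reduction: {area_reduction:.1%}")
print(f"Zen 4c cores per Zen 4 core area: {cores_per_area_ratio:.2f}")
print(f"Bergamo: {bergamo_cores} cores, Genoa: {genoa_cores} cores")
```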

153 Comments on AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC

#101
A Computer Guy
KellyNyanbinary said: “c” for “cloud”. Gotta get the buzzwords in.
Oh geez. I hope they are not moving to a core subscription model. Renew online now before your cores are deactivated!
#102
dragontamer5788
Aquinus said: I'm pretty sure that I know what "c" stands for.
Your meme is good enough for me.
#103
Dr. Dro
AusWolf said: That will depend on the application, I think. In non-gaming workloads, I expect this to be just as good as regular Zen 4.
It will. This isn't an entirely new concept: with Intel's Wolfdale processors, the only difference between the Celeron, Pentium and Core 2 Duo-branded parts was the amount of L2 cache enabled. The Celeron parts had 1 MB, Pentium 2 MB, Core 2 Duo E7000-series 3 MB and the Core 2 Duo E8000-series had 6 MB. On these, the cache size dictated the chip's performance to the absolute extreme, but it also meant that in workloads which were entirely insensitive to cache, the Celeron would perform practically as well as an E8000-class processor. As an additional segmentation measure, Intel disabled SSE4.1 support on the lower range (non-Core 2 brand) CPUs, but I have a distinct memory of running PCSX2 quite well on the Celeron E3200 in SSSE3 mode back then.

Of course, the emulator was completely different to the one we have today, so you won't have the same success replicating that with a modern build of PCSX2. I suppose we shouldn't expect to see Zen 4c adopted on desktop, but you may see it in low-power, cost-efficient mobile SoCs eventually (such as a Mendocino/Zen 2 ULV replacement).
#104
Oberon
R-T-B said: Cache is actually pretty energy heavy and can be one of the hotter parts of the chip.
Do you have any examples where this is the case? Here's my 5900X running small FFTs on all cores:

Despite taking approximately equal amounts of die space, the L3 temps are approximately 30% lower (indicating proportionally lower power consumption). There are probably benchmarks of chips with v-cache out there against chips without it running at the same clocks and core voltages that would show minimal differences between the two, but I'm too lazy to look. We could also just look at the stock performance results of the 5800X3D and see that, despite having approximately 30% more silicon due to the slice of v-cache, it actually consumes considerably less power than the 5800X while only running at a 100-200 MHz deficit.
#105
R-T-B
Oberon said: Do you have any examples where this is the case?
Not in Ryzen. Most of my examples are antiquated. I was corrected above.
#106
Mussels
Freshwater Moderator
R0H1T said: Should've gone with zen 4l for lite or something, why the heck did they think 4c was better?
c is for compact
"lite" implies less performance - these are designed to be 100% the same performance, in memory intensive tasks where the cache doesn't help



It's very clear their goal is to have balanced CPUs, but then also CPUs with a mix of 3D and C cores - 8 3D cores will provide top-tier gaming performance, while they can suddenly fit more C-cores in the same space (which makes them cheaper to produce) and have an 8 3D + 12 C setup out fairly easily with their chiplet designs
Oberon said: Do you have any examples where this is the case? Here's my 5900X running small FFTs on all cores:

Despite taking approximately equal amounts of die space, the L3 temps are approximately 30% lower (indicating proportionally lower power consumption). There are probably benchmarks of chips with v-cache out there against chips without it running at the same clocks and core voltages that would show minimal differences between the two, but I'm too lazy to look. We could also just look at the stock performance results of the 5800X3D and see that, despite having approximately 30% more silicon due to the slice of v-cache, it actually consumes considerably less power than the 5800X while only running at a 100-200 MHz deficit.
The 5800X3D is the example, where the cache runs hotter and they have to be clocked down
Remember that by the time we home users run a test, we're already running them optimised - we can't run a 5800X3D at 5.05GHz and compare to a boosted 5800X

They managed to find a way to force Windows to allow them to match Intel with async CPU designs with their drivers, so now they can get fancy and have 3D, normal and C cores and mix and match a dozen products from 3 parts
#107
Oberon
Mussels said: The 5800X3D is the example, where the cache runs hotter and they have to be clocked down
Remember that by the time we home users run a test, we're already running them optimised - we can't run a 5800X3D at 5.05GHz and compare to a boosted 5800X
The 5800X3D runs comparably hotter because they're literally insulating the underlying core with a whole other slice of heat-generating silicon on top of it. You can also independently monitor the cache temps on those chips and see that it still runs cooler than the cores.
#108
Mussels
Freshwater Moderator
Oberon said: The 5800X3D runs comparably hotter because they're literally insulating the underlying core with a whole other slice of heat-generating silicon on top of it. You can also independently monitor the cache temps on those chips and see that it still runs cooler than the cores.
Just because it's cooler than the cores doesn't mean it's capable of running as hot as the cores without dying
#109
TechnoLadz
Od1sseas said: Intel can pack 4 E-Cores in the same size as 1 P-Core. What about AMD? How many Zen4c cores for one Zen 4 core?
If the E-core is 1/4 the size of Intel's P-core, then going by AMD's presentation from August 29th 2022, Golden Cove is 7.46 mm². Raptor Cove is a refresh of Golden Cove, so let's presume the same size: 7.46 / 4 ≈ 1.87 mm².
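
A rough sketch of that arithmetic, with the article's Zen 4 and Zen 4c per-core areas added to answer the quoted question. The 1/4 ratio is the premise from the quote rather than a measurement, the constant names are made up for illustration, and areas from different vendors and process nodes are not strictly comparable.

```python
# Figures quoted in the thread and the article (mm^2); names are illustrative.
GOLDEN_COVE_MM2 = 7.46   # P-core area cited in the post
E_CORES_PER_P_CORE = 4   # premise from the quoted question, not measured
ZEN4_MM2 = 3.84          # per-core area from the article
ZEN4C_MM2 = 2.48         # per-core area from the article

# Implied E-core area if exactly four fit in one P-core footprint.
e_core_estimate_mm2 = GOLDEN_COVE_MM2 / E_CORES_PER_P_CORE

# Zen 4c cores that fit in the core area of one Zen 4 core.
zen4c_per_zen4 = ZEN4_MM2 / ZEN4C_MM2

print(f"Implied E-core area: {e_core_estimate_mm2:.3f} mm^2")
print(f"Zen 4c cores per Zen 4 core: {zen4c_per_zen4:.2f}")
```

So on these numbers AMD gets roughly 1.5 Zen 4c cores in the area of one Zen 4 core, rather than Intel's claimed 4:1.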
#110
Oberon
Mussels said: Just because it's cooler than the cores doesn't mean it's capable of running as hot as the cores without dying
Cool story, but 1) the limitation on the v-cache parts is a voltage issue rather than a temperature issue and 2) that has almost nothing to do with the point we were discussing.
#111
Mussels
Freshwater Moderator
Oberon said: Cool story, but 1) the limitation on the v-cache parts is a voltage issue rather than a temperature issue and 2) that has almost nothing to do with the point we were discussing.
I can't help it that people are talking about different things
Voltage does nothing, it's amps that's the problem - and the problem is the heat from high amps kills the 3D cache, at lower temps than the CPUs can safely run at
#112
AusWolf
Mussels said: I can't help it that people are talking about different things
Voltage does nothing, it's amps that's the problem - and the problem is the heat from high amps kills the 3D cache, at lower temps than the CPUs can safely run at
That should be fixed in recent BIOSes... "Should"...
#113
Oberon
Mussels said: I can't help it that people are talking about different things
Voltage does nothing, it's amps that's the problem - and the problem is the heat from high amps kills the 3D cache, at lower temps than the CPUs can safely run at
[citation needed]
#114
dragontamer5788
X3D cache is SRAM, right? Which means it's RAM created with NAND-gates, very similar to logic (aka: the kind of silicon used in the core for add/subtract/multiply circuits).

That would mean that I expect it to have similar heat/power/thermal constraints as any other logic-chip. Because SRAM IS logic.
#115
AnotherReader
dragontamer5788 said: X3D cache is SRAM, right? Which means it's RAM created with NAND-gates, very similar to logic (aka: the kind of silicon used in the core for add/subtract/multiply circuits).

That would mean that I expect it to have similar heat/power/thermal constraints as any other logic-chip. Because SRAM IS logic.
But unlike logic, SRAM is only activated a few gates at a time. So power consumption is much lower than you would think. Moreover, in modern chips, getting the data is the most power intensive part. The constraints of 20 years ago aren't relevant. Execution units are cheap; wires are expensive.

#116
Mussels
Freshwater Moderator
Oberon said: [citation needed]
google is your friend, i'm not here to hold your hand and provide links for every single thing you've never heard of
#118
InVasMani
Mussels said: google is your friend, i'm not here to hold your hand and provide links for every single thing you've never heard of

source?
#120
Mussels
Freshwater Moderator
R0H1T said: Well AMD's just released their first quad core desktop chip in 4(3?) years :cool:

AMD Quietly Introduces Ryzen 3 5100 Quad-Core Processor For AM4
Brain fart
AM4!

I mean, i said new CPUs were coming to AM4 a while back but this isn't what i had in mind.

My guess is AM4 is their budget platform now - they want to keep selling A520 and B550 boards (and the chipsets) with some AM4 CPUs to the low end market, while AM5 matures and gets cheaper over time

It makes sense for them to sell one generation old + the new at the same time, so they can keep two production lines running at any given time.
That way their budget stuff isn't fighting for fab space of the high end parts
#121
A Computer Guy
Mussels said: Brain fart
AM4!

I mean, i said new CPUs were coming to AM4 a while back but this isn't what i had in mind.

My guess is AM4 is their budget platform now - they want to keep selling A520 and B550 boards (and the chipsets) with some AM4 CPUs to the low end market, while AM5 matures and gets cheaper over time
Based on the trend I've seen with the prices of 1000, 2000, 3000 chips I suspect once 8000 series is released the 5000 series chips won't be as cost effective anymore and the 7000 series will move into that spot for better price for performance dollar.
#123
Mussels
Freshwater Moderator
A Computer Guy said: Based on the trend I've seen with the prices of 1000, 2000, 3000 chips I suspect once 8000 series is released the 5000 series chips won't be as cost effective anymore and the 7000 series will move into that spot for better price for performance dollar.
Once they shrink to a smaller node for the performance parts, they can keep using the larger process and taper off the oldest one
it spreads the risk out, and lets them get more products to market against all those seemingly random launch shortages we've suffered


That video explains what's been niggling in my mind with Intel's un-Efficiency cores: The AMD cores are using 1.5-2W each on those 96-128 core chips. Those aren't even the new C cores.

~100W to 6 3D gaming P cores and 35W over 16 C cores?
Yeah, that would work wonders. Imagine office PCs and laptops at that power level, if they aren't boosting them out of their efficiency curves.
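
For anyone who wants the per-core numbers behind that hypothetical split, here is a quick sketch. The 100 W / 35 W budgets and the 1.5-2 W per-core EPYC figure are the ones floated in this post, not measurements; the helper name is made up, and the EPYC totals cover cores only (no I/O die or memory power).

```python
# Hypothetical power split from the post; nothing here is measured data.

def watts_per_core(budget_w: float, cores: int) -> float:
    """Average power available per core for a given budget."""
    return budget_w / cores

x3d_w_per_core = watts_per_core(100, 6)   # 6 "3D" cores sharing ~100 W
c_w_per_core = watts_per_core(35, 16)     # 16 C-cores sharing 35 W
package_core_w = 100 + 35                 # combined core budget

# Cores-only power implied by 1.5-2 W per core on 96- and 128-core parts.
epyc_core_w_low = 96 * 1.5
epyc_core_w_high = 128 * 2.0

print(f"3D cores: {x3d_w_per_core:.2f} W each")
print(f"C cores:  {c_w_per_core:.2f} W each")
print(f"Combined core budget: {package_core_w} W")
print(f"EPYC cores only: {epyc_core_w_low:.0f}-{epyc_core_w_high:.0f} W")
```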
#124
londiste
Mussels said: That video explains what's been niggling in my mind with Intel's un-Efficiency cores: The AMD cores are using 1.5-2W each on those 96-128 core chips. Those aren't even the new C cores.

~100W to 6 3D gaming P cores and 35W over 16 C cores?
Yeah, that would work wonders. Imagine office PCs and laptops at that power level, if they aren't boosting them out of their efficiency curves.
Epycs are running this stuff at 2-3GHz. Limiting a desktop (or laptop) CPU to that has a pretty profound effect on any load that is single-core or depends on few cores. Games are the obvious practical example - some will work just fine with a minor performance hit but in general you'd take a sizeable one. Benchmark results for anything like that - maybe Cinebench - would also be quite devastating.

It is a bit of neverending conundrum with chips - desired optimization points. As a manufacturer, do you want or can you sell efficiency as the main point? CPUs are a little trickier at that but GPUs might be an easier example - would you want an RTX 4090 at 300W power limit? How about 150W? Given that everything that would go into such product remains the same, meaning the cost would also be the same.

There is always possibility of limiting the larger CPU (or GPU) to the desired spot. AMD even has ECO mode. Both AMD and Intel (and Nvidia) have configurable power limits and depending on specific thing and needs also frequency limits. Basically, take a 7800X3D, limit its frequency to 3GHz and set the power limit at 24W and see where it leaves you and whether you would be willing to pay the cost for the results you get. Would be an interesting test, to be honest.
#125
Mussels
Freshwater Moderator
londiste said: Epycs are running this stuff at 2-3GHz. Limiting a desktop (or laptop) CPU to that has a pretty profound effect on any load that is single-core or depends on few cores. Games are the obvious practical example - some will work just fine with a minor performance hit but in general you'd take a sizeable one. Benchmark results for anything like that - maybe Cinebench - would also be quite devastating.

It is a bit of neverending conundrum with chips - desired optimization points. As a manufacturer, do you want or can you sell efficiency as the main point? CPUs are a little trickier at that but GPUs might be an easier example - would you want an RTX 4090 at 300W power limit? How about 150W? Given that everything that would go into such product remains the same, meaning the cost would also be the same.

There is always possibility of limiting the larger CPU (or GPU) to the desired spot. AMD even has ECO mode. Both AMD and Intel (and Nvidia) have configurable power limits and depending on specific thing and needs also frequency limits. Basically, take a 7800X3D, limit its frequency to 3GHz and set the power limit at 24W and see where it leaves you and whether you would be willing to pay the cost for the results you get. Would be an interesting test, to be honest.
C-cores are perfectly fine at 3GHz since they're just there for multithreaded performance, they don't need to boost up.
Power efficiency is the key here - 16 core 32 thread laptops in a 45W power limit is entirely plausible, and that would shake up the market a lot


These are different in that even power limited and optimised they're well under the wattage you can achieve on anything else - I played with a Zen 3+ DDR5 laptop and it was 3-4W per core in MT and 6W peak ST, 2-3x higher than these Epyc cores, which again are not based on the C core design.

Don't you think the limited release 5600X3D seems like the perfect thing to pair up with a bunch of C-cores? Memory light tasks get the 3D cache, core/thread heavy tasks get the C-cores.


Big big deal here is that the C-cores are also physically smaller, they can get more in the same physical space and produce more per wafer. That helps out the bottom line a lot.