Wednesday, June 14th 2023

AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC

Jun 14th, 2023 05:14

AMD on Tuesday (June 13) launched the EPYC 9004 "Bergamo" 128-core/256-thread high-density compute server processor, and with it debuted the new "Zen 4c" CPU microarchitecture. Much had been made of Zen 4c in the run-up to yesterday's launch, including rumors that it is a Zen 4 "lite" core with less number-crunching muscle, and hence lower IPC, and that it is AMD's answer to Intel's E-core architectures such as "Gracemont" and "Crestmont." It turns out to be neither a lite version of Zen 4 nor an E-core, but a physically compacted version of the Zen 4 core with identical number-crunching machinery.

First things first: Zen 4c has exactly the same IPC as Zen 4 (that is, performance at a given clock speed). This is because its front-end, execution stage, load/store component, and internal cache hierarchy are exactly the same. It has the same 88-deep load queue, the same 64-deep store queue, the same 6.75K-entry µop cache, the same INT+FP issue width of 10+6, the same INT register file, the same scheduler, and the same cache latencies. The L1I and L1D caches are 32 KB each, as on "Zen 4," and the dedicated L2 cache is likewise 1 MB.

The only thing that has changed is the effective L3 cache per core, which has been reduced to 2 MB from 4 MB on the 8-core "Zen 4" CCD. While the regular 8-core "Zen 4" CCD has eight "Zen 4" cores sharing a 32 MB L3 cache, the new 16-core "Zen 4c" CCD that AMD introduced with "Bergamo" packs two 8-core CCXs (CPU core complexes), each with 16 MB of L3 cache shared among its 8 cores. In this respect, the last-level cache and CPU core organization of the "Zen 4c" CCD has some similarities to the "Zen 2" CCD (which used two 4-core CCXs).
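The per-core L3 figures follow directly from the CCX layouts described above. A quick sketch of the arithmetic (the function name is ours, purely illustrative):

```python
def l3_per_core_mb(l3_mb_per_ccx: float, cores_per_ccx: int) -> float:
    """L3 capacity available per core within one CCX, in MB."""
    return l3_mb_per_ccx / cores_per_ccx


zen4 = l3_per_core_mb(32, 8)    # 8-core Zen 4 CCD: one CCX with 32 MB of L3
zen4c = l3_per_core_mb(16, 8)   # 16-core Zen 4c CCD: two CCXs with 16 MB each
zen4c_ccd_l3 = 2 * 16           # total L3 on one Zen 4c CCD, in MB

print(zen4, zen4c, zen4c_ccd_l3)
```

Note that, by these numbers, a Zen 4c CCD carries the same 32 MB of L3 in total as a Zen 4 CCD; it is simply split into two 16 MB halves and shared by twice as many cores.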

What's interesting is that the 16-core "Zen 4c" CCD isn't AMD's first product from this generation with less last-level cache per core. The "Phoenix" APU silicon used in Ryzen 7040 series mobile processors has eight "Zen 4" cores sharing a 16 MB L3 cache. For math-heavy compute workloads with a small memory footprint, "Zen 4c" offers performance identical to "Zen 4" at the same clock speed; however, the smaller L3 cache should impact performance in bandwidth-sensitive workloads with large data sets.

The Zen 4c CCD is built on the same TSMC 5 nm EUV foundry node as the company's regular 8-core Zen 4 CCD; however, the Zen 4c CPU core is 35% smaller than the Zen 4 core, with a per-core die area of just 2.48 mm², compared to 3.84 mm². The die-size savings probably come from AMD "compacting" the various core components without reducing their function in any way. As we said earlier, the counts of the various core components remain the same, as do the sizes of the µop, L1, and L2 caches. EPYC 9004 "Bergamo" achieves its core count of 128 using eight of these 16-core Zen 4c CCDs. In comparison, the regular "Genoa" processor achieves 96 cores over twelve 8-core Zen 4 CCDs.
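The headline percentage and the core counts can be checked from the figures quoted above. A small sketch (the variable names are ours; the inputs are the article's numbers):

```python
ZEN4_CORE_MM2 = 3.84    # per-core area quoted for Zen 4
ZEN4C_CORE_MM2 = 2.48   # per-core area quoted for Zen 4c

# Fraction of core area saved by the compacted design.
reduction = 1 - ZEN4C_CORE_MM2 / ZEN4_CORE_MM2
print(f"Zen 4c core is {reduction:.1%} smaller")

bergamo_cores = 8 * 16   # eight 16-core Zen 4c CCDs
genoa_cores = 12 * 8     # twelve 8-core Zen 4 CCDs
print(bergamo_cores, genoa_cores)
```

The area ratio works out to a little over 35%, consistent with the rounded figure in the headline.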


153 Comments on AMD Zen 4c Not an E-core, 35% Smaller than Zen 4, but with Identical IPC

#51

Tek-Check

Daven said: Maybe 4c stands for Zen 4 ‘compact’. Siena might introduce Zen 4e as an E-core-like architecture. AFAIK, Intel’s E-core removes basic functional blocks to achieve a smaller footprint. Zen 4c does not remove any basic or even specialized (bfloat, VNNI, AVX-512) functional blocks.

Siena has the same cores as Bergamo, just up to 64, for telecom companies to manage large switch, network and node deployments.

#52

evernessince

Cippo95 said: I think that in the future AMD will combine this idea with 3D vertical cache: a compact design with normal L1 and L2 and a small L3, plus a big L3 on a vertical layer.

Logically that makes sense. If you have future CPU chiplets with a large cluster of cores, the most efficient way for them to access a cache would be via stacked cache. Placing the cache right above the cores makes the trip a lot shorter than if the cache sat outside the cores.

#53

Od1sseas

NeuralNexus said: It's kind of funny because people have bought into the marketing fluff when it comes to their desktop product stack. Efficiency cores are just as bloated as the Performance cores, because they are using the Skylake architecture for those cores.

Who gives a f* on how actually efficient they are or if they are just Skylake cores? What matters is actual performance and Intel delivers. The 13600K destroys the 7600X and matches/slightly beats the 7700X in everything non-gaming.

Since Intel 13th Gen has a significantly better memory controller, pair that 13600K with fast RAM (7200 MHz) and a few tweaks in the timings, and it's gonna easily beat the 7700X in games as well.

And ALL of that while being on an INFERIOR node (10nm)

"But is muh is skylake and not actually efficient!!!"

#54

R0H1T

Just because it runs/supports higher speed memory doesn't make its IMC better, in fact quite the opposite since AMD delivers similar or better performance with slower RAM.

Od1sseas said: And ALL of that while being on an INFERIOR node (10nm)

I wonder if you remember everyone criticizing AMD back in the dozer days when they were anywhere between 1-2(1.5?) nodes behind?

Od1sseas said: "But is muh is skylake and not actually efficient!!!"

You probably don't remember because you were enjoying quad cores for 10 years at the same price! Another fun fact: Intel didn't reduce their MSRPs for probably a decade, till Zen showed up :rolleyes:

#55

Tek-Check

ValenOne said: After a 10-minute run, the Intel Core i9 13900KS's scores are lower.

We know this from Hardware Unboxed. The i9 thermally throttles in benchmarks after a short time and loses up to 8-9% of performance.
But this article is about the phenomenal Zen4 c-cores.

Od1sseas said: Who gives a f* on how actually efficient they are or if they are just Skylake cores? What matters is actual performance and Intel delivers. The 13600K destroys the 7600X and matches/slightly beats the 7700X in everything non-gaming.

Dude, why are you spamming this thread with irrelevant stuff?
The article is about new Zen4 c-core for cloud computing.
Focus, for God's sake! Learn something new.

#56

Od1sseas

R0H1T said: Just because it runs/supports higher speed memory doesn't make its IMC better, in fact quite the opposite since AMD delivers similar or better performance with slower RAM.

I wonder if you remember everyone criticizing AMD back in the dozer days when they were anywhere between 1-2(1.5?) nodes behind?

You probably don't because you were enjoying quad cores for 10 years at the same price! Another fun fact: Intel didn't reduce their MSRPs for probably a decade, till Zen showed up :rolleyes:

1) Just because gaming performance is comparable doesn't mean their IMC is good lol. My guess is that it's because of the much larger amount of cache. The more cache, the less dependent on RAM. Intel's IMC is miles better. 13th gen can do 7200 MHz XMP, and if you have a high-end board you can even hit 8000 MHz++

3) I'm not an Intel fanboy, actually I'm not a fanboy at all. My previous CPU was an R5 2600. I just buy whatever is better for the money and in this case the 13600K was the obvious choice. When I bought it (300$) the 7700X was 400$. I got a better CPU for less money.

#57

R0H1T

Od1sseas said: Because gaming performance is comparable it doesn't mean their IMC is good lol.

Didn't say anything just constricted to gaming ~

Od1sseas said: I'm not an Intel fanboy, actually I'm not a fanboy at all.

Yeah nothing to do with being a fanboy, the fact is Intel's in this position in large part due to their own effin greed. They completely deserve what comes their way IMO.

#58

rbgc

Denver said: It seems to me that the simplification of the design has the weakness of not reaching clocks as high as Zen4. But this is not a problem on CPUs intended for servers...

"The only thing that's changed is that the effective L3 cache per core has been reduced to 2 MB, from 4 MB on the 8-core "Zen 4" CCD."

If that's true, why does the slide say 35% smaller when comparing just core+L2?

"It seems to me that the simplification of the design has the weakness of not reaching clocks as high as Zen4."

It is not a weakness. They created another two server architectures, one oriented on core density (blue) and another one on cache per core for specific server usage (orange). It is an add-on to the existing universal server CPU architecture (grey).

"But this is not a problem on CPUs intended for servers..."

Not only for servers.

AMD’s Chief Technical Officer had this to say at their Ryzen 7000 Keynote:

Our Zen 4c, it's our compact density that's an addition, it's a new swimlane to our cores roadmap, and it delivers the identical functionality of Zen 4 at about half of the core area.

AMD has taken the same Zen 4 architecture and pulled several "tricks" in physical design (already described briefly in this forum, and in detail on SemiAnalysis) to save a huge amount of area. This means an identical IPC and ISA feature level, which also simplifies integration on the client side. In fact, AMD is also silently swapping some Zen 4 cores with Zen 4c cores in its lower-end 4nm Ryzen 7000U “Phoenix” mobile processors.

#59

R0H1T

What do you mean swapping silently? It'd have to be an entirely new chip.

#60

evolucion8

NeuralNexus said: It's kind of funny because people have bought into the marketing fluff when it comes to their desktop product stack. Efficiency cores are just as bloated as the Performance cores, because they are using the Skylake architecture for those cores.

E-cores are derived from the Goldmont and Sunnycove cores used on Atoms; their IPC matches Skylake, but they aren't based on Skylake at all.

#61

AusWolf

chrcoluk said: With my experimenting (albeit on Windows 10, which doesn't have Intel's pre-configured scheduler):

By default in Windows 10, E-cores are heavily favoured: pretty much all single-threaded tasks are loaded onto them and P-cores are parked. This happens even if parking is disabled in the power profile (Ultimate Performance). ParkControl also can't override this behaviour.

If I adjust the heterogeneous thread scheduling policy, I can manipulate this behaviour; it's a hidden setting in Windows. Setting it to either "all processors" or "performant" starts letting P-cores be used. The latter, however, almost blocks use of E-cores, so it's not ideal if you still want them to be used. But it would be a quick and dirty fix: e.g. if you want to fire up a single-threaded game, it would give you near certainty that it would use a P-core, without having to worry about affinity settings. It could be used with something like 'AutoPowerOptionsOk' to automate the solution. Setting it to "all processors" would likely require using something like Process Hacker to get things working in an optimal way with automation, e.g. affinity for svchost and browsers to E-cores and affinity for games to P-cores (good for security as well, as E-cores don't have HTT). Both of these scheduling options still automatically favour the fastest two P-cores for single-threaded Cinebench, which is nice; on my Ryzen CPUs this doesn't happen. It also doesn't happen on my 9900K, a reason why I went to an all-core clock speed on the 9900K. But my testing on Ryzen and the 9900K was done on 1809, whilst the 13700K was on 21H2, so it's possible 1809 has no programming for "favoured cores", as that was introduced later, I think.

AMD of course have this problem as well, with some of their processors for different reasons.

I assume the improvements in Windows 11 are just a better default behaviour when specific cpu's are recognised. For better OOB experience.

That sounds pretty counter-intuitive to me. I've never had any issues with CPUs that have core favoring on Windows 10. Let's just say that I'm not keen on heterogeneous architectures at all.

#62

chrcoluk

AnotherReader said: As with the P cores, Intel has clocked the E cores too high. Clocking them closer to 3 GHz would make them true E cores: more efficient than P cores. Chips and Cheese found Gracemont to be more efficient than Golden Cove at a variety of tasks if clock speeds were kept in check. Notably, these more efficient clock speeds were lower than Intel's default for the 12900K.

Well yeah, I'm not saying they are efficient overall, just efficient for Intel in terms of silicon size and manufacturing cost.

Right now we have a benchmark war. Notice how close Intel and AMD are on the benchmarks? I don't think that's a coincidence.

They ship at the point on the V/f curve that hits the bench performance they want.

#63

dyonoctis

R0H1T said: Yeah nothing to do with being a fanboy, the fact is Intel's in this position in large part due to their own effin greed. They completely deserve what comes their way IMO.

And their foundry not performing as well as it should... I'm not exactly happy about Intel failing to deliver a new process on time since 2015. It just gives TSMC more leeway to increase their prices as much as they want, since everyone else is struggling to keep up, and at launch AMD was the one asking for a premium.

#64

Wirko

Daven said: To be fair, I think the Thread Director will schedule E-cores for background tasks first regardless of how many threads are needed. But yeah, foreground tasks will be scheduled first to P-cores until more than 8 cores (16 threads?) are needed. I am not sure if the Thread Director is smart enough to know when a foreground task doesn’t need the computing might of P-cores and therefore falls back to E-cores to save power.

If set for maximum performance, the scheduler should avoid HT as long as it can, putting E-cores to work instead. Two threads running on one P-core are much slower than one on a P-core (without HT) and the other on an E-core.
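A toy model of that fill order, purely illustrative and not how Windows or Thread Director is actually implemented; the function name and the thread labels are made up for this sketch:

```python
def placement_order(p_cores: int, e_cores: int) -> list[str]:
    """Order in which hardware threads are filled under the toy policy:
    one thread per P-core first, then E-cores, then the P-cores' HT siblings."""
    order = [f"P{i}.t0" for i in range(p_cores)]    # one thread per P-core
    order += [f"E{i}" for i in range(e_cores)]      # then the E-cores
    order += [f"P{i}.t1" for i in range(p_cores)]   # HT siblings only as a last resort
    return order


print(placement_order(2, 4))
```

With 2 P-cores and 4 E-cores, the second hardware thread of a P-core is only used once all six physical cores are already busy.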

KellyNyanbinary said: “c” for “cloud”. Gotta get the buzzwords in.

Sure, Oracle Corporation knew that full well in 2013 when they released their database version 12c, and c officially stood for cloud. The c was in italics by the way, so it looked faster.

chrcoluk said: With my experimenting (albeit on Windows 10, which doesn't have Intel's pre-configured scheduler):

By default in Windows 10, E-cores are heavily favoured: pretty much all single-threaded tasks are loaded onto them and P-cores are parked. This happens even if parking is disabled in the power profile (Ultimate Performance). ParkControl also can't override this behaviour.

I've read that Win 10 scheduler is able to manage cores with different performance (but it isn't aware that not all cores are equally efficient). Your findings seem to confirm that. And it's not surprising; HT isn't new, and wherever you have HT, not all "virtual processors" are equal. For best performance, it's best to prefer one thread per core whenever possible.

#65

Tek-Check

Od1sseas said: 3) I'm not an Intel fanboy, actually I'm not a fanboy at all. My previous CPU was an R5 2600. I just buy whatever is better for the money and in this case the 13600K was the obvious choice. When I bought it (300$) the 7700X was 400$. I got a better CPU for less money.

Nobody cares about this. This article is not about your CPU. Learn something about Zen4 c cores.

#66

Wirko

tabascosauz said: At least it's nice for them to finally have a real name. These Zen 4c cores are literally just APU grade Zen 4, jammed into a chiplet. Better than having to call them "reduced-cache Zen" every time to distinguish them.

APU grade when the real APU (7940HS) is able to hit 5.2 GHz? No.
The 4c cores are simplified in ways that will prevent them from reaching high frequencies. That's by design and that's fine.

#67

Tek-Check

R0H1T said: Yeah nothing to do with being a fanboy, the fact is Intel's in this position in large part due to their own effin greed. They completely deserve what comes their way IMO.

Guys, the article is about Zen4 c core. Why so much spam?

rbgc said: In fact, AMD is also silently swapping some Zen 4 cores with Zen 4c cores in its lower-end 4nm Ryzen 7000U “Phoenix” mobile processors.

What is this about?

#68

R-T-B

Oberon said: Cache actually doesn't consume much energy, so it doesn't have a large effect on temps. The bigger contributor to lower temps will be the reduced clockspeed.

Cache is actually pretty energy heavy and can be one of the hotter parts of the chip.

#69

tabascosauz

Wirko said: APU grade when the real APU (7940HS) is able to hit 5.2 GHz? No.
The 4c cores are simplified in ways that will prevent them from reaching high frequencies. That's by design and that's fine.

Ah yes...because Ryzen cores are never restricted intentionally by Fmax, and must by law always boost to the absolute limit of their SP :laugh: they may very well be simplified in other ways, but that's neither what the press presentation is conveying nor the focus. The point is no loss of performance, reshuffling or not.

Where did I say that APU-grade connoted shittier quality, or that Phoenix is using 4c cores?

@R-T-B just going off of observable Ryzen behaviour in the past 3 generations and its internal temp monitoring, L3 doesn't usually get hot or draw a lot of power. Whether in 16MB, 32MB or Vcache form.

#70

Wirko

AnotherReader said: Last week, TechPowerUp reported on an analysis by SemiAnalysis that went over how AMD made Zen 4c smaller. While it's behind a paywall, the first part covering the physical design is free to read. The core sans the L2 cache is 44% smaller, i.e. nearly half the size of a Zen 4 core. It's an impressive feat of physical design.
TLDR:
- reducing the number of timing-critical regions to just 4 from well over 10 in Zen 4: this sacrifices clock speed for density
- a new SRAM bitcell developed by TSMC for memories outside L2. As a 6T design, it saves area compared to the usual 8T designs
- lower clock speed target allows denser circuits
The L3 also lacks the arrays of Through-Silicon Vias (TSV) for 3D V-Cache, giving a small area saving. This means that there's no possibility of a stacked L3 cache for Zen 4c.

To add to that, slower transistors = smaller transistors, in general. When a transistor needs to feed many other transistors in the circuit, and/or send signals over (relatively) large distances on the chip, it must overcome a large capacitance. The current that the transistor has to source/sink increases proportionally to both the capacitance and the frequency. So a small transistor can operate up to a certain frequency, and above that, it has to be replaced by a larger one. In chip design, it basically means two or more transistors in parallel.
David Kanter covered that topic (and related stuff) in one of his articles here:
www.realworldtech.com/transistor-count-flawed-metric/
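The proportionality described above can be written as a first-order relation (a textbook approximation, not something taken from the linked article). Assuming a node with load capacitance $C$ that swings through the supply voltage $V_{dd}$ once per cycle at frequency $f$, the average current the driving transistor must deliver is roughly:

$$ I_{\mathrm{avg}} \approx C \, V_{dd} \, f $$

Doubling either the load or the target frequency doubles the required drive current, which is why a higher clock target pushes a design towards wider (or paralleled) transistors.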

#71

A&P211

Mussels said: It's all good, I just loathe Intel's E-cores because they used a name that is the exact opposite of the product to mislead people about them

They're more efficient at single threaded tasks, and then intel uses them exclusively for multi threaded tasks.
Just... Ugh.

In the mobile arena the E-cores aren't very good. Intel processors still get pretty bad battery life.

#72

Wirko

dyonoctis said: Looking at how Sapphire Rapids struggles against Zen 3 TR at equal core count while using more power, I'm really not surprised that they are being used in that manner. If RPL is already disgusting when it comes to power draw, a 16 P-core i9 might have been uglier to witness on consumer platforms. A 65 W locked 7950X is still faster than Golden Cove going at 200 watts. (Note that Puget is enforcing PL1 125 W and PL2 253 W on the Core i9 since those are the reference values set by Intel, and it's still faster than the Xeon)

While I agree, AnandTech of late isn't the same as AnandTech of old. The power consumption in this graph is what was self-reported by the CPU (or a set limit). They didn't measure real consumption, either AC at the mains plug or DC to the motherboard. But they noted that Intel's watts are too optimistic, and AMD's watts even more so.

#73

doc7000

Od1sseas said: Intel can pack 4 E-Cores in the same size as 1 P-Core. What about AMD? How many Zen4c cores for one Zen 4 core?

This isn't the same thing as Intel's E-core, as has been pointed out a number of times, including in the headline. Zen 4c beats Intel's E-core in performance.

AMD's Zen 4 is roughly half the size of Intel's P-core and double the size of Intel's E-core, which is why going with a big/little layout just doesn't make that much sense for AMD (except maybe in laptops). Zen 4c was likely in development for maybe 3-5 years.

#74

sLowEnd

Little 'c' = little cache :p

I am pretty surprised at how much space they managed to save by reducing cache though.

#75

Minus Infinity

As someone who does more productivity than gaming, it is IMO a huge mistake for AMD to be sticking to its lower core counts than Intel on desktop, especially now that they have a great compact core in the 4c. I would have much preferred a 7850X with 8 Zen 4 and 4 Zen 4c cores any day over a V-Cache 7800X3D, and it would take the fight to the 13700K. 4 Zen 4c cores would be easily competitive with 8 Gracemont cores. And they would cause fewer scheduler issues as they'll look like regular cores. That Zen 5 is also sticking to the same pattern as Zen 4 is even worse. IMO the performance gap between Zen 5 and Arrow Lake will be even larger for MT.
