Tuesday, August 30th 2022

AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed

Aug 30th, 2022 04:17

As we await technical documents from AMD detailing its new "Zen 4" microarchitecture, particularly the all-important CPU core Front-End and Branch Prediction units that have contributed two-thirds of the 13% IPC gain over the previous-generation "Zen 3" core, the tech enthusiast community is already decoding images from the Ryzen 7000 series launch presentation. "Skyjuice" presented the first annotation of the "Zen 4" core, revealing its large branch-prediction unit, enlarged micro-op cache, TLB, load/store unit, and dual-pumped 256-bit FPU that enables AVX-512 support. A quarter of the core's die-area is also taken up by the 1 MB dedicated L2 cache.
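To make "dual-pumped" concrete, here is a minimal conceptual sketch in Python. It assumes a 512-bit operation is simply split into two 256-bit halves that pass through the same 256-bit unit one after the other; the function names and the 32-bit lane width are illustrative assumptions, and the article does not describe how AMD actually schedules the two halves.

```python
LANE_BITS = 32            # assumed lane width (one 32-bit element per lane)
NATIVE_WIDTH_BITS = 256   # width of the execution unit, per the article
VECTOR_WIDTH_BITS = 512   # width of an AVX-512 register


def add_256(a, b):
    """Model one pass through a 256-bit unit: element-wise add of 8 lanes."""
    lanes = NATIVE_WIDTH_BITS // LANE_BITS
    if len(a) != lanes or len(b) != lanes:
        raise ValueError("a 256-bit pass takes exactly 8 lanes per operand")
    return [x + y for x, y in zip(a, b)]


def add_512_double_pumped(a, b):
    """Model a 512-bit add as consecutive passes through the 256-bit unit.

    Returns the 16-lane result and the number of passes that were needed.
    """
    lanes = VECTOR_WIDTH_BITS // LANE_BITS
    half = NATIVE_WIDTH_BITS // LANE_BITS
    if len(a) != lanes or len(b) != lanes:
        raise ValueError("a 512-bit operand has exactly 16 lanes")
    result = []
    passes = 0
    for start in range(0, lanes, half):
        result.extend(add_256(a[start:start + half], b[start:start + half]))
        passes += 1
    return result, passes


if __name__ == "__main__":
    total, passes = add_512_double_pumped(list(range(16)), [10] * 16)
    print(total, "in", passes, "passes")
```

The point of the sketch is only that the full 512-bit result is produced without a 512-bit-wide datapath, at the cost of a second pass.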

Chiakokhua (aka Retired Engineer) posted a table detailing the various caches and their latencies, comparing them with those of the "Zen 3" core. As AMD's Mark Papermaster revealed in the Ryzen 7000 launch event, the company has enlarged the micro-op cache of the core from 4 K entries to 6.75 K entries. The L1I and L1D caches remain 32 KB in size each, while the L2 cache has doubled in size to 1 MB. The enlargement of the L2 cache has slightly increased its latency, from 12 cycles to 14. Latency of the shared L3 cache is also up, from 46 cycles to 50 cycles. The reorder buffer (ROB) in the dispatch stage has been enlarged from 256 entries to 320 entries. The L1 branch target buffer (BTB) has increased in size from 1 K entries to 1.5 K entries.
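The relative size of these changes is easy to work out from the quoted figures. The short Python sketch below does so; the helper names are made up for illustration, and the clock speed used for the cycles-to-nanoseconds conversion is an arbitrary round number, not a figure from the table.

```python
# (Zen 3 value, Zen 4 value), taken from the paragraph above.
CHANGES = {
    "micro-op cache (K entries)": (4.0, 6.75),
    "L2 latency (cycles)": (12, 14),
    "L3 latency (cycles)": (46, 50),
    "reorder buffer (entries)": (256, 320),
}

# Assumption for illustration only: a round 5.0 GHz core clock.
ASSUMED_CLOCK_GHZ = 5.0


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz


if __name__ == "__main__":
    for name, (zen3, zen4) in CHANGES.items():
        print(f"{name}: {zen3} -> {zen4} ({percent_change(zen3, zen4):+.1f}%)")
    l2_ns = cycles_to_ns(CHANGES["L2 latency (cycles)"][1], ASSUMED_CLOCK_GHZ)
    print(f"14-cycle L2 at the assumed {ASSUMED_CLOCK_GHZ} GHz: {l2_ns:.2f} ns")
```

Because latency is counted in cycles, a higher clock can offset part of the added cycles in wall-clock terms, which is why the conversion takes the clock as a parameter.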

The Zen 4 CCD is slightly smaller than the Zen 3 CCD despite the higher transistor-counts, thanks to the switch to 5 nm (TSMC N5 process). The CCD measures 70 mm², in comparison to the 83 mm² "Zen 3" CCD. The transistor-count of the "Zen 4" CCD is 6.57 billion, a whopping 58 percent increase from that of the "Zen 3" CCD and its 4.15 billion transistor-count.
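The figures above also imply a large jump in transistor density. A quick Python check, using only the numbers quoted in this paragraph (the function names are illustrative):

```python
def density_mtr_per_mm2(transistors_billion, area_mm2):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_billion * 1000.0 / area_mm2


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    zen3 = density_mtr_per_mm2(4.15, 83.0)   # "Zen 3" CCD
    zen4 = density_mtr_per_mm2(6.57, 70.0)   # "Zen 4" CCD
    print(f"Zen 3 CCD: {zen3:.1f} MTr/mm^2")
    print(f"Zen 4 CCD: {zen4:.1f} MTr/mm^2")
    print(f"Transistor count: {percent_change(4.15, 6.57):+.1f}%")
    print(f"Die area:         {percent_change(83.0, 70.0):+.1f}%")
    print(f"Density:          {percent_change(zen3, zen4):+.1f}%")
```

These are whole-die averages; they say nothing about how density differs between logic and cache areas.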

The cIOD (client I/O die) sees a big chunk of innovation. It's built on the 6 nm (TSMC N6) node, which is a big leap from the GlobalFoundries 12 nm node that the cIOD of Ryzen 5000 series processors was made on. It also incorporates certain power-management features from the Ryzen 6000 "Rembrandt" processors. This cIOD packs an iGPU based on the RDNA2 graphics architecture, besides the DDR5 memory controllers, and a PCI-Express Gen 5 root complex. The new 6 nm cIOD measures 124.7 mm², compared to the slightly larger 124.9 mm² cIOD of the Ryzen 5000 series.

The "Raphael" multi-chip module has one CCD for the 6-core and 8-core SKUs, and two CCDs for the 12-core and 16-core SKUs. "Raphael" is built in the Socket AM5 package. AMD is rumored to be readying a thin BGA package of "Raphael" for high-performance notebook platforms, which it has codenamed "Dragon Range." These processors will come in various 45 W, 55 W, and 65 W TDP points, powering high-end gaming notebooks.

Sources: Chiakokhua (Twitter), Skyjuice (Twitter), Skyjuice (Angstronomics)


41 Comments on AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed

Gungar

58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC.

ModEl4

Didn't Mark Papermaster also say that the Zen4c core is around half the area of the Zen4 core? (4nm vs 5nm comparison) Also that the architecture is optimised for lower frequency and higher efficiency vs Zen4.
I'm really curious what gaming performance difference Zen4c will have vs Zen3+.
Although AMD slides are showing Strix Point with Zen5 cores, my first thought was Zen5c due to the mobile segment.

Wirko

btarunr wrote: the company has enlarged the micro-op cache of the core from 4 KB to 6.75 KB

It's probably the number of entries, not bytes. Same goes for BTB.
It would be interesting to know the sizes of various internal data structures in bytes/bits, though.

Daven

ModEl4 wrote: Didn't Mark Papermaster also say that the Zen4c core is around half the area of the Zen4 core? (4nm vs 5nm comparison) Also that the architecture is optimised for lower frequency and higher efficiency vs Zen4.
I'm really curious what gaming performance difference Zen4c will have vs Zen3+.
Although AMD slides are showing Strix Point with Zen5 cores, my first thought was Zen5c due to the mobile segment.

Zen 4c is Epyc only. Not meant for gaming at all. It's for cloud instances. Not everything made by humans is for gaming.

Oberon

Gungar wrote: 58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC.

A significant chunk of that went into implementing support for AVX-512. In workloads that actually make use of those transistors, the performance increase will average many times that 13% number.

HD64G

Oberon wrote: A significant chunk of that went into implementing support for AVX-512. In workloads that actually make use of those transistors, the performance increase will average many times that 13% number.

Indeed, they showed 2.5x performance vs Zen3 in such a case.

LuxZg

Well, too bad that AVX-512 won't get much use. IMHO spending 58% area for 13% IPC in average across 99% workloads, and justifying it with AVX-512 that will be used in <1% of workloads doesn't seem sensible. I'd rather if we got 58% more cores at same price points and no AVX. Imagine - 10 core for 299$, 12 core at 399$, 20 cores at 549, and 24 cores at 699$. Would open space for 199$ 6-core and 259$ 8-core. Nah, AVX-512 isn't worth 58% die

Oberon

LuxZg wrote: Well, too bad that AVX-512 won't get much use. IMHO spending 58% area for 13% IPC in average across 99% workloads, and justifying it with AVX-512 that will be used in <1% of workloads doesn't seem sensible. I'd rather if we got 58% more cores at same price points and no AVX. Imagine - 10 core for 299$, 12 core at 399$, 20 cores at 549, and 24 cores at 699$. Would open space for 199$ 6-core and 259$ 8-core. Nah, AVX-512 isn't worth 58% die

At least you can use those transistors when necessary, unlike with Alder Lake...

You also have to remember that part of AMD's product strategy is to reuse the same CCD across the majority of their desktop and server SKUs, so there will be some tradeoffs for value in one segment or another (and server/HPC gets priority since it brings in more money.)

ncrs

LuxZg wrote: Well, too bad that AVX-512 won't get much use. IMHO spending 58% area for 13% IPC in average across 99% workloads, and justifying it with AVX-512 that will be used in <1% of workloads doesn't seem sensible. I'd rather if we got 58% more cores at same price points and no AVX. Imagine - 10 core for 299$, 12 core at 399$, 20 cores at 549, and 24 cores at 699$. Would open space for 199$ 6-core and 259$ 8-core. Nah, AVX-512 isn't worth 58% die

Don't forget that the primary focus of AMD is not the desktop, but server markets. Sharing one chiplet design between many markets was always a strength of Zen.
For servers/workstations AVX-512 is a welcome addition. It will be interesting to see if laptop Zen4 will keep it as well.
That silicon most likely is also usable for non-AVX-512 tasks due to register renaming/reuse and similar modern CPU optimizations.

#10

Niarod

So integrated graphics is confirmed? I didn't hear anyone from AMD mention it during the presentation..

#11

The_Enigma

LuxZg wrote: Well, too bad that AVX-512 won't get much use. IMHO spending 58% area for 13% IPC in average across 99% workloads, and justifying it with AVX-512 that will be used in <1% of workloads doesn't seem sensible. I'd rather if we got 58% more cores at same price points and no AVX. Imagine - 10 core for 299$, 12 core at 399$, 20 cores at 549, and 24 cores at 699$. Would open space for 199$ 6-core and 259$ 8-core. Nah, AVX-512 isn't worth 58% die

Avx512 is more than just increasing register width for a new instruction though. It brings tons of improvements to the entire AVX lineup of instructions and can accelerate AVX and AVX2 even more with no clock speed penalty to those older instructions using the newer features.

Additionally, it greatly speeds up emulation. Not just PS2 emulator, but also ARM emulation. Someone posted on another site that the new features in Avx512 allow them to cut the instructions needed to do certain parts of the ARM emulation down anywhere from 5-10x

So it does have good uses even today, uses you probably use some of without realizing it, but it also sets this gen as a baseline for support going into the future. As time goes on more software will make use of them and that has to have hardware support at some point getting to the masses to drive the software adoption.

#12

LuxZg

Yeah, you're both right, I disregarded server market... Still seems sad for us consumers :-( Now I wonder if Zen4c is actually that lart without AVX-512..

#13

ncrs

LuxZg wrote: Yeah, you're both right, I disregarded server market... Still seems sad for us consumers :-( Now I wonder if Zen4c is actually that lart without AVX-512..

Rumor has it that it's just limited cache sizes. I don't think it would make sense to cut AVX-512 from a part specifically tailored to cloud vendors.

#14

Punkenjoy

Gungar wrote: 58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC.

AVX does indeed increase the number of transistors, but there are many other needs. There is also the law of diminishing returns that every CPU vendor has to fight. You have to throw more and more transistors at a problem to increase performance. But anyway, there was never a 1 to 1 match.

Also, AMD probably added a bunch of stages to increase the clock frequency.

It also maybe looks worse on paper because of the CCD. On a monolithic CPU, the uncore parts grow way slower, so in the end that hides the real transistor growth of the cores and caches.

Like people said, at least AMD can use AVX-512. It's quite stupid that Intel ships it but it has to be disabled because of the damn E-cores. If at least the E-cores could run AVX-512 code, but slower, that would make more sense. I wonder how those CPUs would perform if they were able to use the area used by AVX-512 for something else.

#15

Oberon

I would bet my life that Zen 4c is extension-compatible with Zen 4 (to avoid ADL-like issues.)

#16

TheLostSwede

News Editor

Niarod wrote: So integrated graphics is confirmed? I didn't hear anyone from AMD mention it during the presentation..

Yes it is, it's on the AMD spec pages.

#17

AnotherReader

Gungar wrote: 58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC.

Performance isn't solely about IPC. They have increased clocks by 13% too. That is a 27% performance increase. Pollack's rule states that performance for a single core increases by the square root of its proportional area. Square root of 1.58 is 1.26, which is very close to the performance increase from the 5950X to the 7950X. To summarize, the extra transistors probably went to:

AVX-512
increased clock speeds
larger front-end
larger L2

I wouldn't rule out future IPC increases; Zen 4 is an incremental update from Zen 3, reusing the same microarchitecture. If they run out of steam for Zen 5, then we can say that they are running out of tricks.
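The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch: it takes the 58% transistor increase as the "proportional area" in Pollack's rule, as the comment does, and the function names are illustrative.

```python
import math


def combined_speedup(ipc_gain, clock_gain):
    """Compound an IPC gain and a clock gain (both given as fractions)."""
    return (1.0 + ipc_gain) * (1.0 + clock_gain)


def pollack_speedup(resource_ratio):
    """Pollack's rule: single-core performance scales with the square root
    of the resources (area or transistors) spent on the core."""
    return math.sqrt(resource_ratio)


if __name__ == "__main__":
    observed = combined_speedup(0.13, 0.13)   # 13% IPC and 13% clocks
    predicted = pollack_speedup(1.58)         # 58% more transistors
    print(f"IPC x clock gain:       {observed:.3f}x")
    print(f"Pollack's rule predicts {predicted:.3f}x")
```

Compounding the two 13% gains gives about 1.277x, and the square root of 1.58 is about 1.257x, so the two figures land within a couple of percentage points of each other.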

#18

DemonicRyzen666

Slow AVX-512, dual load 256 AVX, again AMD didn't learn their lesson with SSE4.1 and AVX on Bulldozer. There are a ton of problems with that implementation.

#19

Eternalightwithin

Would it have been better to implement a full size AVX512 and combine 2 AVX256 instructions? Or separate modules for 512 and 256. That would increase die size more though

#20

defaultluser

Eternalightwithin wrote: Would it have been better to implement a full size AVX512 and combine 2 AVX256 instructions? Or separate modules for 512 and 256. That would increase die size more though

This will seed software devs to continue supporting Intel's now dead vector arch; it will have to wait for a future die rev before AMD doubles execution width again (remember zen 2?)

#21

ModEl4

Daven wrote: Zen 4c is Epyc only.

So we watched the same event?

Daven wrote: Not meant for gaming at all. It's for cloud instances. Not everything made by humans is for gaming.

wtf is this?
Did I say it was designed for gaming or that everything made by humans is for gaming?
The Goldmont core (Gemini Lake) or Tremont core (Jasper Lake), for example, were also not designed for gaming, and many people who were interested in low-cost platforms were curious about their lower core performance vs regular Skylake.

#22

thegnome

Hopefully for Zen 5 L3 will get a boost again without having to rely on V-Cache (they are still separate for Zen 5), and having 3 chiplets. 1 chiplet models for budget like the current 6 and 8 core, but also a second Zen 5c chiplet for 4/8/16 e-cores (like on Intel) at the midrange, with the full dual CCD models also having the Zen 5c chiplet. Would make AMD much more competitive in terms of core count again without having to raise the regular CCD's core count.

#23

defaultluser

thegnome wrote: Hopefully for Zen 5 L3 will get a boost again without having to rely on V-Cache (they are still separate for Zen 5), and having 3 chiplets. 1 chiplet models for budget like the current 6 and 8 core, but also a second Zen 5c chiplet for 4/8/16 e-cores (like on Intel) at the midrange, with the full dual CCD models also having the Zen 5c chiplet. Would make AMD much more competitive in terms of core count again without having to raise the regular CCD's core count.

they need to bump core count if they want to compete with whatever succeeds raptor lake - maybe they can handle 12 cores per chiplet?

#24

thegnome

defaultluser wrote: they need to bump core count if they want to compete with whatever succeeds raptor lake - maybe they can handle 12 cores per chiplet?

True, but I wouldn't mind seeing it either way. I suppose they could always just chonk a third regular ccd on there.

#25

Dirt Chip

Gungar wrote: 58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC.

IGP and AVX take their toll.
It will get better in Zen4+ and Zen5 for sure.
