Tuesday, August 30th 2022
AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed
As we await technical documents from AMD detailing its new "Zen 4" microarchitecture, particularly the all-important CPU core Front-End and Branch Prediction units that have contributed two-thirds of the 13% IPC gain over the previous-generation "Zen 3" core, the tech enthusiast community is already decoding images from the Ryzen 7000 series launch presentation. "Skyjuice" presented the first annotation of the "Zen 4" core, revealing its large branch-prediction unit, enlarged micro-op cache, TLB, load/store unit, and dual-pumped 256-bit FPU that enables AVX-512 support. A quarter of the core's die-area is also taken up by the 1 MB dedicated L2 cache.
Chiakokhua (aka Retired Engineer) posted a table detailing the various caches and their latencies, comparing them with those of the "Zen 3" core. As AMD's Mark Papermaster revealed at the Ryzen 7000 launch event, the company has enlarged the micro-op cache of the core from 4 K entries to 6.75 K entries. The L1I and L1D caches remain 32 KB in size, each; while the L2 cache has doubled in size. The enlargement of the L2 cache has slightly increased latency, from 12 cycles to 14. Latency of the shared L3 cache is also up, from 46 cycles to 50 cycles. The reorder buffer (ROB) in the dispatch stage has been enlarged from 256 entries to 320 entries. The L1 branch target buffer (BTB) has increased in size from 1 KB to 1.5 KB.
The "Zen 4" CCD is slightly smaller than the "Zen 3" CCD despite the higher transistor-count, thanks to the switch to 5 nm (TSMC N5 process). The CCD measures 70 mm², in comparison to the 83 mm² "Zen 3" CCD. The transistor-count of the "Zen 4" CCD is 6.57 billion, a whopping 58 percent increase from that of the "Zen 3" CCD and its 4.15 billion transistor-count.
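The percentages behind these figures are easy to check. The short Python sketch below uses only the numbers quoted above; the helper names (`pct_change`, `density_mtr_per_mm2`) are made up for illustration, and the density values are simple transistors-per-area averages over the whole CCD, not official AMD or TSMC density figures.

```python
# Figures as quoted in the article (transistors in billions, area in mm²).
ZEN3_CCD = {"transistors_bn": 4.15, "area_mm2": 83.0}
ZEN4_CCD = {"transistors_bn": 6.57, "area_mm2": 70.0}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def density_mtr_per_mm2(ccd: dict) -> float:
    """Average density in millions of transistors per mm² (whole-die average)."""
    return ccd["transistors_bn"] * 1000.0 / ccd["area_mm2"]


transistor_gain = pct_change(ZEN3_CCD["transistors_bn"], ZEN4_CCD["transistors_bn"])
area_change = pct_change(ZEN3_CCD["area_mm2"], ZEN4_CCD["area_mm2"])
zen3_density = density_mtr_per_mm2(ZEN3_CCD)
zen4_density = density_mtr_per_mm2(ZEN4_CCD)

print(f"Transistor count: {transistor_gain:+.1f}%")   # about +58%
print(f"CCD area:         {area_change:+.1f}%")       # about -16%
print(f"Density: {zen3_density:.1f} -> {zen4_density:.1f} MTr/mm²")

# Per-structure changes from the cache/latency table (old value, new value).
changes = {
    "micro-op cache (K entries)": (4.0, 6.75),
    "L2 latency (cycles)": (12, 14),
    "L3 latency (cycles)": (46, 50),
    "ROB (entries)": (256, 320),
}
for name, (old, new) in changes.items():
    print(f"{name}: {pct_change(old, new):+.1f}%")
```

On these numbers, the average transistor density of the CCD nearly doubles between the two generations, which is how the die shrinks even as the transistor-count grows.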
The cIOD (client I/O die) sees a big chunk of innovation. It's built on the 6 nm (TSMC N6) node, which is a big leap from the GlobalFoundries 12 nm node that the cIOD of Ryzen 5000 series processors was made on. It also incorporates certain power-management features from the Ryzen 6000 "Rembrandt" processors. This cIOD packs an iGPU based on the RDNA2 graphics architecture, besides the DDR5 memory controllers, and a PCI-Express Gen 5 root complex. The new 6 nm cIOD measures 124.7 mm², compared to the slightly larger 124.9 mm² cIOD of the Ryzen 5000 series.
The "Raphael" multi-chip module has one CCD for the 6-core and 8-core SKUs, and two CCDs for the 12-core and 16-core SKUs. "Raphael" is built in the Socket AM5 package. AMD is rumored to be readying a thin BGA package of "Raphael" for high-performance notebook platforms, which it has codenamed "Dragon Range." These processors will come in various 45 W, 55 W, and 65 W TDP points, powering high-end gaming notebooks.
Sources:
Chiakokhua (Twitter), Skyjuice (Twitter), Skyjuice (Angstronomics)
41 Comments on AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed
I'm really curious what gaming performance difference Zen4c will have vs Zen3+
Although AMD slides are showing Strix Point with Zen5 cores my first thought was Zen5c due to mobile segment.
It would be interesting to know the sizes of various internal data structures in bytes/bits, though.
You also have to remember that part of AMD's product strategy is to reuse the same CCD across the majority of their desktop and server SKUs, so there will be some tradeoffs for value in one segment or another (and server/HPC gets priority since it brings in more money.)
For servers/workstations AVX-512 is a welcome addition. It will be interesting to see if laptop Zen4 will keep it as well.
That silicon most likely is also usable for non-AVX-512 tasks due to register renaming/reuse and similar modern CPU optimizations.
Additionally, it greatly speeds up emulation. Not just the PS2 emulator, but also ARM emulation. Someone posted on another site that the new features in AVX-512 allow them to cut the instructions needed for certain parts of the ARM emulation by anywhere from 5-10x.
So it does have good uses even today, some of which you probably rely on without realizing it, but it also sets this gen as a baseline for support going into the future. As time goes on more software will make use of them, and hardware support has to reach the masses at some point to drive that software adoption.
Also, AMD probably added a bunch of stages to increase the clock frequency.
It may also look worse on paper because of the CCD. On a monolithic CPU, the uncore parts grow far more slowly, so in the end they hide the real transistor growth of the cores and caches.
Like people said, at least AMD can use AVX-512. It's quite stupid that Intel ships it but it has to be disabled because of the damn E-cores. If at least the E-cores could run AVX-512 code, only slower, that would make more sense. I wonder how those CPUs would perform if they were able to use the area used by AVX-512 for something else.
- AVX-512
- increased clock speeds
- larger front-end
- larger L2
I wouldn't rule out future IPC increases; Zen 4 is an incremental update from Zen 3, reusing the same microarchitecture. If they run out of steam for Zen 5, then we can say that they are running out of tricks.
Did I say it was designed for gaming or that everything made by humans is for gaming?
The Goldmont core (Gemini Lake) or Tremont core (Jasper Lake), for example, was also not designed for gaming, and many people who were interested in low-cost platforms were curious about this lower core performance vs regular Skylake.
It will get better in zen4+ and zen5 for sure