Wednesday, January 5th 2022
AMD Readying 16-core "Zen 4" CCDs Exclusively for the Client Segment with an Answer to Intel E-cores?
AMD already declared the CPU core counts of its EPYC "Genoa" and "Bergamo" processors to top out at 96 and 128, respectively, a core-count believed to have been facilitated by the larger fiberglass substrate of the next-gen SP5 CPU socket, letting AMD add more 8-core "Zen 4" chiplets, dubbed CPU complex dies (CCDs). Until now, AMD has used the chiplet as a common component between its EPYC enterprise and Ryzen desktop processors, to differentiate CPU core counts.
A fascinating theory that hit the rumor-mill indicates that the company might leverage 5 nm (TSMC N5) to carve out larger CCDs with up to 16 "Zen 4" CPU cores. Half of these cores are capped at a much lower power budget, essentially making them efficient-cores. This is a concept AMD appears to be carrying over from its 15-Watt class mobile processors, which see the CPU cores operate under aggressive power-management. These cores still turn out a reasonable amount of performance, and are functionally identical to the ones on 105 W desktop processors with a relaxed power budget. Since the "fat" and "slim" cores are functionally identical to each other, AMD need not develop complex middleware like the Intel Thread Director, and can make do with OS scheduler-level optimizations that it can co-develop with Microsoft or the Linux community, much like it did for older versions of the "Zen" microarchitecture that featured multiple CCXs.
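To illustrate why functionally identical cores could keep scheduling simple, here is a minimal sketch. It is not AMD's or any operating system's actual scheduler. It assumes a hypothetical CCD with two "fat" and two "slim" cores; the wattages, the `Core` type, and the `build_ccd` and `assign_threads` helpers are all invented for illustration. Because any thread can run on any core, the policy reduces to ranking cores by power budget and placing the most demanding threads first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    core_id: int
    power_budget_w: float  # hypothetical per-core cap, not an AMD figure


def build_ccd(fat: int = 8, slim: int = 8) -> list[Core]:
    """Model a hypothetical CCD: 'fat' cores first, then 'slim' cores."""
    cores = [Core(i, 12.0) for i in range(fat)]         # assumed relaxed budget
    cores += [Core(fat + i, 3.0) for i in range(slim)]  # assumed capped budget
    return cores


def assign_threads(demands: dict[str, float], cores: list[Core]) -> dict[str, int]:
    """Place the most demanding threads on the highest-budget cores.

    `demands` maps a thread name to a relative load estimate (0.0 to 1.0).
    Returns a mapping of thread name to core_id. Threads beyond the core
    count are left unassigned in this sketch.
    """
    ranked_cores = sorted(cores, key=lambda c: (-c.power_budget_w, c.core_id))
    ranked_threads = sorted(demands, key=lambda name: (-demands[name], name))
    return {name: core.core_id for name, core in zip(ranked_threads, ranked_cores)}


# Small example: 2 fat cores (ids 0-1) and 2 slim cores (ids 2-3).
placement = assign_threads(
    {"game_render": 0.9, "compile": 0.7, "browser_tab": 0.2, "indexer": 0.1},
    build_ccd(fat=2, slim=2),
)
print(placement)
```

In this toy run the two heaviest threads land on the fat cores and the two lightest on the slim ones. A heterogeneous design with different core microarchitectures would need extra information about which instructions each core supports; here a single ranking is enough.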
The theory also predicts that AMD might build on the 3D Vertical Cache technology. The next-gen CCD might feature two layers: a bottom layer with the CPU cores and their dedicated L2 caches, and a top layer exclusively for a 64 MB 3D Vertical Cache serving as a shared L3 cache. In the "Zen 3" 3DV Cache CCD, the 64 MB SRAM is located above the region of the CCD that typically has its 32 MB L3 cache, a relatively cooler component than the CPU cores. On the new CCD, this SRAM could be located over the region that has the low-TDP cores, pushing the high-TDP "performance" cores to the periphery of the die, with structural silicon conducting heat from these cores to the surface.
This theory is way out there, but it's plausible because AMD doesn't have a formidable low-power CPU core architecture to rival "Gracemont," and because Intel's next-gen "Raptor Lake" chips are rumored to see the addition of more E-core clusters, making the "i9-13900K" a 24-core processor, beating AMD in the core-count game. If we were to nitpick, we'd point out that the low-TDP cores take as much valuable die real-estate and transistor-count as the high-TDP cores, and die-size (i.e. wafer volume) is a rather scarce resource these days. We'll find out in the second half of 2022.
Many thanks to TheoneandonlyMrK for the tip
Source:
Wccftech
41 Comments on AMD Readying 16-core "Zen 4" CCDs Exclusively for the Client Segment with an Answer to Intel E-cores?
Presumably the only things stopping AMD from making a 16P+16E product into a 32P product will be the power limits and cooling of such a small socket, as well as the desire not to cannibalise their Threadripper sales.
It was meant to be paired with the Zen 5 big core in the next generation design that borrows the efficiency core idea.
It may be that it will be ready far sooner than the Zen 5 core, and can be used in a high-end Zen 4 CCD utilising 3D cache (perhaps not at Zen 4 launch) in an 8+8 configuration.
Power optimised (lower max clock) cores can use smaller transistors, less L1 cache, perhaps even thinner vector units, to save die space.
Additionally, TSMC N5 is 1.8x denser for logic than N7/N6, but only 1.2x denser for SRAM (cache), so moving that onto a stacked N7 3D cache die makes a huge amount of sense, freeing up a lot of die space for slightly space/power optimised efficiency cores.
When you consider existing CPUs, we already see that the max all-core clock is far lower (primarily for power reasons) than the single/dual core turbo. So why have every core in the CCD able to reach that single core turbo speed? Once you realise this, AMD's plan makes a lot of sense. Intel's less so, as their efficiency cores are a different design, and AVX-512 has been disabled in the big cores as a result.
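The density figures quoted in this comment can be turned into a rough area estimate. The sketch below is back-of-the-envelope arithmetic only: the 1.8x and 1.2x scaling factors come from the comment, while the 40 mm² logic and 30 mm² SRAM split of an N7 die is a made-up input chosen to show the shape of the argument, not a measured "Zen 3" figure. The `n5_area` helper is likewise hypothetical.

```python
LOGIC_SCALING = 1.8  # N5 vs N7/N6 logic density, as quoted in the comment
SRAM_SCALING = 1.2   # N5 vs N7/N6 SRAM density, as quoted in the comment


def n5_area(logic_mm2: float, sram_mm2: float, keep_sram_on_die: bool = True) -> float:
    """Estimate N5 die area for blocks that occupy the given areas on N7."""
    area = logic_mm2 / LOGIC_SCALING
    if keep_sram_on_die:
        area += sram_mm2 / SRAM_SCALING
    return area


# Hypothetical N7 split, for illustration only.
logic_n7, sram_n7 = 40.0, 30.0

with_l3 = n5_area(logic_n7, sram_n7)
without_l3 = n5_area(logic_n7, sram_n7, keep_sram_on_die=False)

print(f"N5 die with L3 on die:      {with_l3:.1f} mm^2")
print(f"N5 die with L3 stacked:     {without_l3:.1f} mm^2")
print(f"Area freed for extra cores: {with_l3 - without_l3:.1f} mm^2")
```

With these assumed inputs, the SRAM would end up larger than the logic once both are shrunk to N5, which is the commenter's argument for moving the cache onto a stacked N7 die.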
I am still not sold on big.LITTLE though; better power gating and frequency control should, to me, logically make them pointless.
We will see though.
@btarunr one point though: it's a schematic, it doesn't show die sizes, so it's possible there's still a difference in size and features. I'm sure they shaved some excess off, but then again those low-power cores allegedly beat a 5800X, so who knows.
Good times though, as someone else said. Competition, fantastic.
And the Zen4c in Bergamo exists only because of the smaller area and a target market whose workloads don't need big caches.
Efficiency-wise, the "normal" Zen4 can go to very low TDPs AND to high frequency, so there's no need for Zen4c cores here, because there are virtually no area restrictions for those platforms in their segments.
I think AMD could use Zen4c cores in SoCs like the 5G stations Intel targets with its 24-core Tremont and Gracemont offerings.
If you sell an enterprise part, and all these were in enterprise parts, you're obliged to support that part for sometimes 10+ years.
Plus I think they're up to at least 6 concurrent core designs in production now, so.
Seems weird to me though. They can't pull off full-speed cores with balanced power, so they wrap their CPUs in a nice big.LITTLE scheme and go with it.
I don't know but that is my impression.
Everything was behind, b/c no money thanks to a big blue criminal.
www.newegg.com/intel-core-i7-12700f-core-i7-12th-gen/p/N82E16819118359
Intel Core i7-12700F $329.99
www.newegg.com/amd-ryzen-7-5800x/p/N82E16819113665
AMD Ryzen 7 5800X $378.98
These work because software development must adjust. With a monopoly on a product, everyone has to adjust to the change, whether good or bad. You think with a duopoly it's different?
The cores will perform worse to save power, but there will be more of them. I'm just pointing it out, not saying this will not work. It will work, but the point still stands. They will offer less for the same price, because I doubt the prices will go down for those chips.
The idea is that you don't need a Ferrari engine to do your grocery shopping. A simple one-litre, 3-cylinder engine could do the same task far more efficiently.
But I prefer all fast cores instead of that half mixed-up stuff. If power is an issue, hit the power-saving feature and done.
So with regard to the diversity of dies actually in concurrent production, of course you are right, too.
Nevertheless, the ~10-year support/supply of these dies makes a decision on which design to produce necessary, and thus these will take a percentage of the available production lines at TSMC or elsewhere.
So, yes, I think 1 die of each would be sufficient: Zen4, Zen4c, some monolithic big APU, and one small one like the Athlon 300, maybe.
The picture you show, where the whole L3 (maybe including L3-Control, L3-Tags and L2-ShadowTags?) is only in the V-Cache layer, I find problematic.
And the area the Zen4c cores take in your picture again seems unnecessary to me in the layout you chose,
because in the Zen3 layout below, full, equally big cores could easily be placed where the L3 normally sits.
So for me it makes nearly no sense area-wise or efficiency-wise.
But obviously I could be totally wrong, of course.
Additionally, I'm thinking of a possible approach for a 5950X successor containing 1 CCD housing one Zen4 8-core complex, PLUS 1 Bergamo-style CCD housing 16 Zen4c cores,
making it a dual-CCD AM5 CPU with 24 cores and 48 threads.
The best thing about that approach would be that no "special hybrid CCD" is needed.
I just passed a link on.
The L3 cache is not on the chip; it's a V-Cache slice placed on top, apparently, but it's all rumours. I get your points though; in all honesty I expected what your suggestions state before this leak, but I think it's possible this could work well.
I do still prefer just big guns (cores) and no knives (E-cores), personally.
With V-Cache as we know it now, AMD can choose between two variants: a "normal" one-layer package and a "super-cached" two-layer package, considerably more expensive and with a thermal tradeoff. This affords them quite a lot of flexibility. I'm sure each of them will find its place in servers and HPC clusters.
Now with this new proposed configuration, only the latter can exist. That's unless AMD also puts a part of L3 cache, certainly slower, on the I/O die.