Thursday, January 5th 2023
AMD Confirms Ryzen 9 7950X3D and 7900X3D Feature 3DV Cache on Only One of the Two Chiplets
AMD today announced its new Ryzen 7000X3D high-end desktop processors to much fanfare, with availability slated for February 2023; you can read all about them in our older article. In our coverage, we noticed something odd about the cache sizes of the 12-core 7900X3D and 16-core 7950X3D. The 8-core, single-CCD 7800X3D comes with 104 MB of total cache (L2+L3), which works out to 1 MB of L2 cache per core (8 MB in all) plus 96 MB of L3 cache (32 MB on-die + 64 MB stacked 3DV cache). By the same math, the dual-CCD 7900X3D and 7950X3D should have had 204 MB and 208 MB, respectively, yet they were shown with total caches of just 140 MB and 144 MB.
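The arithmetic behind those totals can be checked with a short sketch. It uses only the per-core and per-CCD figures quoted above; the function name and its parameters are our own, chosen for illustration.

```python
# Cache sizes in MB, taken from the figures quoted in the article.
L2_PER_CORE_MB = 1
L3_ON_DIE_PER_CCD_MB = 32
L3_STACKED_PER_CCD_MB = 64


def total_cache_mb(cores: int, ccds: int, stacked_ccds: int) -> int:
    """Return total L2+L3 cache in MB for a given chiplet configuration."""
    if not 0 <= stacked_ccds <= ccds:
        raise ValueError("stacked_ccds must be between 0 and ccds")
    l2 = cores * L2_PER_CORE_MB
    l3 = ccds * L3_ON_DIE_PER_CCD_MB + stacked_ccds * L3_STACKED_PER_CCD_MB
    return l2 + l3


if __name__ == "__main__":
    # (name, cores, CCDs) for the three announced parts.
    parts = [("7800X3D", 8, 1), ("7900X3D", 12, 2), ("7950X3D", 16, 2)]
    for name, cores, ccds in parts:
        one = total_cache_mb(cores, ccds, stacked_ccds=1)
        every = total_cache_mb(cores, ccds, stacked_ccds=ccds)
        print(f"{name}: one stacked CCD = {one} MB, all CCDs stacked = {every} MB")
```

With a single stacked CCD, the 12-core and 16-core parts come to 140 MB and 144 MB, matching what AMD showed; stacking both CCDs would give 204 MB and 208 MB.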
In our older article, we explored two possibilities: one, that the 3DV cache is available on both CCDs but halved in size for whatever reason; and the second, more outlandish possibility that only one of the two CCDs has stacked 3DV cache, while the other is a normal planar CCD with just the on-die 32 MB L3 cache. As it turns out, the latter theory is right! AMD put out high-resolution renders of the dual-CCD 7000X3D processors, where only one of the two CCDs is shown having the L3D (L3 cache die) stacked on top. Real-world pictures of the older "Zen 3" 3DV cache CCDs from the 5800X3D or EPYC "Milan-X" processors show that CCDs with 3DV cache have a distinct appearance, with dividing lines between the L3D and the structural substrates that sit over the regions of the CCD containing the CPU cores. In these renders, we see these lines drawn on only one of the two CCDs.
It shouldn't be hard for such an asymmetric cache setup to work in the real world from a software perspective, given that we are now firmly in the era of hybrid-core processors thanks to Intel and Arm. Well before "Alder Lake," when AMD started shipping dual-CCD client processors with the Ryzen 3000 "Matisse" based on "Zen 2," the company closely collaborated with Microsoft to optimize OS scheduling such that high-performance, less-parallelized workloads such as games are localized to just one of the two CCDs, to minimize DDR4 memory roundtrips.
Even before "Matisse," AMD and Microsoft confronted multi-threaded workload optimization challenges with dual-CCX architectures such as "Zen" and "Zen 2," where the OS scheduler would ideally want to localize a gaming workload to a single CCX before saturating both CCXs on a single CCD, and only then move onward to the next CCD. This is achieved using methods such as CPPC2 preferred-core flagging, which is why AMD highly recommends using the "Ryzen Balanced" Windows power plan included with its chipset drivers.
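The fill order described above (one CCX first, then the second CCX on the same CCD, then the next CCD) can be illustrated with a toy model. This is only a sketch of the ordering idea; it is not how the Windows scheduler or CPPC2 is implemented, and the `Core`, `fill_order`, and `place_threads` names are hypothetical.

```python
from typing import List, NamedTuple


class Core(NamedTuple):
    ccd: int    # chiplet index
    ccx: int    # CCX index within the chiplet
    index: int  # core index within its CCX


def fill_order(ccds: int, ccxs_per_ccd: int, cores_per_ccx: int) -> List[Core]:
    """List cores in the order a locality-first policy would use them."""
    return [
        Core(ccd, ccx, i)
        for ccd in range(ccds)
        for ccx in range(ccxs_per_ccd)
        for i in range(cores_per_ccx)
    ]


def place_threads(n_threads: int, order: List[Core]) -> List[Core]:
    """Assign one thread per core, following the fill order."""
    if n_threads > len(order):
        raise ValueError("more threads than cores in this toy model")
    return order[:n_threads]


if __name__ == "__main__":
    # ASSUMPTION: a "Zen 2"-style layout of 2 CCDs x 2 CCXs x 4 cores.
    order = fill_order(ccds=2, ccxs_per_ccd=2, cores_per_ccx=4)
    for core in place_threads(6, order):
        print(core)
```

In this model a six-thread workload fills the first CCX and spills two threads onto the second CCX of the same CCD, never touching the second CCD.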
We predict that something similar is happening with the 12-core and 16-core 7000X3D processors, where gaming workloads can benefit from being localized to the 3DV cache-enabled CCD, and any spillover workloads (such as the audio stack, network stack, background services, etc.) are handled by the second CCD. In non-gaming workloads that scale across all 16 cores, the processor works like any other multi-core chip; it's just that the cores in the 3DV-enabled CCD have better performance from the larger victim cache. There shouldn't be any runtime errors arising from ISA mismatch, as the CPU core types on both CCDs are the same "Zen 4."
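For readers who want to experiment with this kind of localization by hand, here is a minimal sketch of restricting a process to one CCD's logical CPUs. It is not AMD's or Microsoft's mechanism. The CPU numbering (logical CPUs 0-7 belonging to the cache-equipped CCD) is an assumption that must be checked on a real system, and `os.sched_setaffinity` is only available on Linux in Python's standard library, so the helper reports `False` elsewhere.

```python
import os
from typing import Iterable


def affinity_mask(cpus: Iterable[int]) -> int:
    """Build a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def pin_to_cpus(pid: int, cpus: Iterable[int]) -> bool:
    """Restrict a process to the given logical CPUs where the OS allows it.

    Returns True if the affinity was applied, False if unsupported here.
    """
    if not hasattr(os, "sched_setaffinity"):  # Linux-only in the stdlib
        return False
    os.sched_setaffinity(pid, set(cpus))
    return True


# ASSUMPTION: logical CPUs 0-7 are the cores of the cache-equipped CCD.
# The real mapping depends on the OS and on whether SMT is enabled.
CACHE_CCD_CPUS = range(8)

if __name__ == "__main__":
    print(hex(affinity_mask(CACHE_CCD_CPUS)))
```

Calling `pin_to_cpus(0, CACHE_CCD_CPUS)` would confine the current process to those CPUs on Linux; the bitmask form is shown because Windows affinity APIs take a mask of this shape.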
AMD Ryzen 7000X3D processors go on sale in February 2023.
164 Comments on AMD Confirms Ryzen 9 7950X3D and 7900X3D Feature 3DV Cache on Only One of the Two Chiplets
First of all, in 2008 most games were NOT "multithreaded". Most games primarily used 1 core, and offloaded some stuff onto a second core. Dual core Athlons and core 2 duos regularly outperformed quad core phenoms and core 2 quads. Go look back at some benchmarks and refresh your memory.
Today, again, "everything is multithreaded" is wrong. Not all games regularly push multiple cores, and of the (admittedly many) that do, they do not push past 6 cores. Usually only 1-2 cores are heavily hit, the others not so much. They're used, but not to the extent that they are heavily utilized like cores 0-1. Games hardly take advantage of 8 cores, let alone 12-16 cores.
There is no reason to have a 3D cache on both chiplets; games do not use that many cores and likely will not for the foreseeable future. Games are not inherently multithreaded, and there is a limit to how much you can split up.
Second CCD looks like it will also have some silicon spacer on top to bring it to the same height as the cache die. Temps will be interesting.
Edit: First die with 3D cache gets thinned down to original height, so a spacer shouldn't be needed.
More than 8 threads are useless in gaming due to dependency chains
Gaming performance should be relatively equal between these 3 SKUs then. The 7900/7950 3D variants will retain some of their better productivity performance with 1 CCD still being able to hit the non 3D variant boost clocks. So expect lower productivity performance unless any programs take advantage of the large cache.
At first glance it looks like there’s still a similar limitation to the Gen 1 3D design limiting clock speeds.
Unless you have stock in either AMD or Intel or nVidia, you want all of their products to be within 20% of each other or you get an Intel and nVidia situation.
www.techpowerup.com/review/rtx-4090-53-games-core-i9-13900k-vs-ryzen-7-5800x3d/2.html
I know for a fact that my 13900k at stock with 7600c34 ram is 15% faster than a maxed out 12900k running at 5.4ghz all core. I also know that said maxed out 12900k is faster than the 5800x 3d. Therefore it's obvious that the difference between the 13900k and the 5800x 3d cannot be 6% when fully cpu bound.
Edit: IMO, it would be much simpler if we had xx00 numbers for normal CPUs, and xx50 for chips with the 3D cache. x600 could be 6-core, x700 8-core, x800 10 or 12-core and x900 16-core, and the xx50 would be the X3D.
I can't wait for TPU and HU to go through their detailed breakdown of this CPU. 7950X3D is what I've been waiting for since the 5800X3D announcement. It's for the work/play setups. Do some work...and run some games on breaks. The job won't even know.