Thursday, February 23rd 2023
AMD AGESA 1.0.0.5C AM5 Enables Fine-grained Control Over Ryzen 7000X3D CCD Priority
ASUS began rolling out Beta UEFI firmware updates for its Socket AM5 motherboards encapsulating AGESA 1.0.0.5 patch-C microcode. This exposes several new options to end-users through the UEFI Setup Program, giving them greater control over how the processor prioritizes workloads across the two CCDs (CPU complex dies) on 12-core and 16-core Ryzen 7000 series processors, including the upcoming 7000X3D processors.
AMD is working to release Chipset Software updates that include a "3D V-cache Optimization driver" component, which introduces OS-level awareness of the asymmetric 3D V-cache implementation on the 7900X3D and 7950X3D, where only one of the two CCDs has the additional cache. Meanwhile, these firmware-level options let users prioritize one CCD over the other for workloads. The firmware-level optimization is OS-agnostic, so pretty much any OS should benefit from 3D V-cache the way it was intended, with less parallelized workloads such as games prioritized on the CCD that has the 3D V-cache.
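To make "CCD priority" concrete, here is a hypothetical model; the article does not describe how AGESA or the driver actually implements it. The assumption is simply that cores on the prioritized CCD are handed out first and the other CCD only takes the overflow. The function names and the 8-cores-per-CCD default are illustrative, and SMT is ignored.

```python
def core_fill_order(preferred_ccd: int, ccds: int = 2, cores_per_ccd: int = 8) -> list[int]:
    """Hypothetical: physical core ids, prioritized CCD first, remaining CCDs after."""
    start = preferred_ccd * cores_per_ccd
    order = list(range(start, start + cores_per_ccd))
    for ccd in range(ccds):
        if ccd != preferred_ccd:
            order.extend(range(ccd * cores_per_ccd, (ccd + 1) * cores_per_ccd))
    return order


def place_threads(n_threads: int, preferred_ccd: int, cores_per_ccd: int = 8) -> list[int]:
    """Assign one core per thread, filling the prioritized CCD before spilling over.

    Threads beyond the number of physical cores are not placed in this toy model.
    """
    return core_fill_order(preferred_ccd, cores_per_ccd=cores_per_ccd)[:n_threads]


if __name__ == "__main__":
    print(place_threads(6, preferred_ccd=1))   # lightly threaded: stays on CCD 1
    print(place_threads(12, preferred_ccd=1))  # heavily threaded: spills onto CCD 0
```

In this model a lightly threaded game stays entirely on the prioritized die, and only heavily parallel work reaches the second CCD. The `cores_per_ccd` parameter covers the 7900X3D's 6-core CCDs.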
Source:
HotHardware
24 Comments on AMD AGESA 1.0.0.5C AM5 Enables Fine-grained Control Over Ryzen 7000X3D CCD Priority
Intel needed OS-level awareness because Alder Lake and Raptor Lake have two core types with different microarchitectures, distinguishable via CPUID, and thread scheduling is hardware-assisted: the processor (via Thread Director) reports which core type suits each thread, but the OS kernel cannot act on that unless it knows how to read those hints. This is the change introduced in Windows 11.
Meanwhile, the 3D V-Cache cores on Zen 3 and Zen 4 are transparent to the OS: they look no different from the other cores, and the processor has no hardware thread-scheduling component like Intel's design. So it needs a driver that is aware of its inefficient topology to optimize scheduling and minimize the cross-CCD latency penalties the design incurs. (Since Matisse, a CCX/CCD can access data held by another, but this incurs a severe cycle penalty; this remains true in Vermeer and Raphael.)
But as I'm getting older, I should say "more room to mess with".
Do you work for AMD or Microsoft?
There's going to be a lot more information about how the CCDs juggle threads, along with some guides on how to optimize them, in next week's review.
It's true that DDR5 will bring some gains, but at the same time, the extra cache will reduce the memory bottleneck for workloads that frequently reuse more than 32 MB of data (but less than, let's say, 150 MB).
The goal of the cache is to feed the cores so they stay busy as much as possible. In real workloads, the CPU's execution units are never fully used for very long, because the CPU has to wait for data.
Zen 4's IPC gains come from various sources, but the cores themselves also got beefier. They can run out of data faster, so faster memory and more cache will help speed them up.
If you look at the competition, they seem to have topped out L3 cache at around 36 MB, yet they still manage gen-to-gen gains even with the same memory.
So the logic is that the extra cache should add the same performance gain going from the 7700X to the 7800X3D as it did from the 5800X to the 5800X3D, as long as the clock difference between the two stays the same. If the X3D part takes a smaller clock penalty, the gain should be higher, and vice versa if the penalty is larger.
Because in the end, even on dual-CCD parts, the goal will be to run cache-sensitive games with datasets larger than 32 MB on the X3D CCD and isolate them there.
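On Linux, that kind of isolation can be approximated from user space. The sketch below assumes a Linux system exposing cache topology under `/sys/devices/system/cpu/cpuN/cache/indexM/` (`level`, `size`, `shared_cpu_list`), and assumes the V-Cache CCD is simply the L3 domain with the largest reported size. It groups logical CPUs by shared L3 and pins the current process to the biggest group with `os.sched_setaffinity`. This is an illustration, not AMD's driver or the new firmware option, and the helper names are hypothetical.

```python
import os
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux CPU list such as '0-7,16-23' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_size_kib(text: str) -> int:
    """Convert a sysfs cache size such as '98304K' or '32M' to KiB."""
    text = text.strip()
    if text.endswith("K"):
        return int(text[:-1])
    if text.endswith("M"):
        return int(text[:-1]) * 1024
    return int(text) // 1024  # assume plain bytes otherwise


def l3_domains() -> dict[frozenset[int], int]:
    """Map each group of logical CPUs sharing one L3 to that L3's size in KiB."""
    domains: dict[frozenset[int], int] = {}
    for cpu_dir in SYSFS_CPU.glob("cpu[0-9]*"):
        for index in (cpu_dir / "cache").glob("index*"):
            try:
                if (index / "level").read_text().strip() != "3":
                    continue
                cpus = frozenset(parse_cpu_list((index / "shared_cpu_list").read_text()))
                domains[cpus] = parse_size_kib((index / "size").read_text())
            except OSError:
                continue  # offline CPU or missing attribute: skip it
    return domains


def largest_l3_domain(domains: dict[frozenset[int], int]) -> frozenset[int]:
    """Assumption: the CPU group with the biggest L3 is the V-Cache CCD."""
    return max(domains, key=domains.get)


if __name__ == "__main__":
    domains = l3_domains()
    for cpus, size in sorted(domains.items(), key=lambda kv: min(kv[0])):
        print(sorted(cpus), f"{size // 1024} MB L3")
    if domains and hasattr(os, "sched_setaffinity"):
        # Only pin to CPUs this process is already allowed to run on.
        target = largest_l3_domain(domains) & os.sched_getaffinity(0)
        if target:
            os.sched_setaffinity(0, target)
            print("pinned to", sorted(target))
```

On Linux, child processes inherit the affinity mask, so a game launched from a process pinned this way would start on the same CCD. Background work would still need to be kept off that CCD separately.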
The real-world market is gamers and, unless I'm wrong, the 7800X3D is going to match the 7950X3D for gaming at real-world resolutions and settings. Given the additional complexity of juggling threads over two heterogeneous CCDs, the 7950X3D may actually be slightly worse at gaming.
These are my guesses and speculation, obviously, so I'm ready to stand corrected once reviews are out.
I suspect it could still be slightly better (but probably not worth the extra cost) for two reasons:
1) Games that do not benefit from the extra cache (like really old, light games) can be set to run on the CCD that simply clocks way higher than the 7800X3D.
2) By isolating the game on the X3D CCD (plus maybe the GPU driver and API stuff), you prevent the cache from being populated by non-gaming data. The L3 cache on Zen is a victim cache and will contain all the data evicted from L2.
I don't think we will see a lot of data showing this behavior in benchmarks, because by nature they will be run on a clean computer with nothing else open, and that is how they should be run. In the real world, it may give you a slight edge when, for example, you have multiple monitors with a browser, YouTube, audio playing in the background, etc. But the gain should be fairly small and not at all worth the extra cost.
It will only make sense in scenarios where you want the best of the best, and/or where you want very good gaming performance but still need all those cores for productivity.
Otherwise, just get a 7800X3D.
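The victim-cache point in reason 2 can be illustrated with a toy model. This is a deliberately simplified sketch, not Zen's actual replacement policy: a 4-line LRU "L2" backed by a 16-line, fully exclusive LRU "L3" that is filled only by L2 evictions. A "game" cycles over 12 lines, and optional background work streams never-reused lines in between. All sizes and names are made up for illustration.

```python
from collections import OrderedDict
from typing import Optional


class LRU:
    """Fixed-capacity set of cache lines with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: "OrderedDict[str, None]" = OrderedDict()

    def __contains__(self, line: str) -> bool:
        return line in self.lines

    def touch(self, line: str) -> None:
        self.lines.move_to_end(line)  # mark as most recently used

    def remove(self, line: str) -> None:
        del self.lines[line]

    def insert(self, line: str) -> Optional[str]:
        """Add a line as most recently used; return the evicted line, if any."""
        self.lines[line] = None
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None


class VictimHierarchy:
    """Toy L2 plus an exclusive victim L3 that is filled only by L2 evictions."""

    def __init__(self, l2_lines: int, l3_lines: int) -> None:
        self.l2 = LRU(l2_lines)
        self.l3 = LRU(l3_lines)

    def access(self, line: str) -> str:
        """Access a line and report where it was found: 'L2', 'L3' or 'DRAM'."""
        if line in self.l2:
            self.l2.touch(line)
            return "L2"
        where = "DRAM"
        if line in self.l3:
            self.l3.remove(line)  # exclusive: the line moves back up into L2
            where = "L3"
        victim = self.l2.insert(line)
        if victim is not None:
            self.l3.insert(victim)  # whatever L2 evicts lands in L3
        return where


def game_dram_misses(game_lines: int, noise_per_access: int, rounds: int = 5) -> int:
    """Count DRAM misses on the game's lines while background lines are interleaved."""
    cache = VictimHierarchy(l2_lines=4, l3_lines=16)
    misses = 0
    noise_id = 0
    for _round in range(rounds):
        for i in range(game_lines):
            if cache.access(f"game{i}") == "DRAM":
                misses += 1
            for _noise in range(noise_per_access):
                cache.access(f"bg{noise_id}")  # streaming data that is never reused
                noise_id += 1
    return misses


if __name__ == "__main__":
    print(game_dram_misses(12, 0), game_dram_misses(12, 1))
```

With no background traffic, the game's 12 lines fit in the combined 20 lines, so only the first pass misses (12 DRAM misses over 5 passes). With one background line after each game access, every game line is pushed out before it is reused (60 misses). Real caches are far larger and use smarter policies, so this only shows the direction of the effect, not its size.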
Almost nothing needs the cache the way games do. Productivity software that benefits from more cache benefits even more from extra cores, and if you can justify spending more for economic reasons, you can probably justify EPYC, at least until Zen 4 Threadrippers arrive, at which point the 7950X will have near-zero chance of justifying its existence outside of wealthy gamers.
You see, in the original Matisse design used in the Ryzen 9 3950X, we had two CCDs, each containing two CCXs, for a total of four CCXs and, as a result, four 16 MB cache slices: a 16+16/16+16 configuration. One of the biggest improvements in Vermeer/Zen 3, used in the 5950X, was the consolidation of the CCXs into 8-core parts, meaning each CCD contained one CCX with full contiguous access to its L3 cache. That resulted in a 32 + 32 MB configuration, which persists on Raphael/Zen 4 in the 7950X. Now the 7950X3D will have a 32 + 96 MB configuration; I think you can see where I am going here and why the 3D V-Cache Optimizer driver is required.
Presumably, AMD did this to keep costs down and/or to prevent these processors from threatening their EPYC business, as the AM5 platform should technically be capable of taking a 192 GB memory kit, bringing it dangerously close to expensive server components for some applications.
Early rumors point to the 7950X3D being at a stalemate with the i9-13900K and somewhat behind the i9-13900KS in absolute gaming performance, with an "up to 6% improvement over the 13900K", which sounds about right to me. This probably won't be the miracle CPU many are hoping for due to topology inefficiencies, but it should still be a production powerhouse. Gamers should still buy the 5800X3D for their existing AM4 platform, or buy the 7800X3D if building on AM5; IMHO they will be better CPUs overall.
But I like the market segmentation excuse better. After all, it's not like the performance regressions of the 5800X3D vs. the 5800X, where they existed, caused anyone to lose any sleep over it...
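The L3 layouts described in the thread (16+16/16+16 on Matisse, 32+32 on Vermeer and Raphael, 32+96 on the 7950X3D) can be tabulated to show why only the last one needs a scheduler that knows which CCD is which. This is a sketch under a simplifying assumption: a working set counts as "fitting" only if a single CCX's L3 slice is at least as large as it, ignoring cross-CCX access and replacement effects. The helper names are hypothetical.

```python
# Per-CCX L3 slices in MB, one tuple per CCD, as listed in the thread.
L3_LAYOUTS = {
    "Ryzen 9 3950X (Matisse)": ((16, 16), (16, 16)),
    "Ryzen 9 5950X (Vermeer)": ((32,), (32,)),
    "Ryzen 9 7950X (Raphael)": ((32,), (32,)),
    "Ryzen 9 7950X3D": ((32,), (96,)),
}


def slices(layout: tuple[tuple[int, ...], ...]) -> list[int]:
    """Flatten a layout into its individual per-CCX L3 slices."""
    return [size for ccd in layout for size in ccd]


def largest_local_l3(layout: tuple[tuple[int, ...], ...]) -> int:
    """Biggest L3 a thread can use without reaching into another CCX."""
    return max(slices(layout))


def is_asymmetric(layout: tuple[tuple[int, ...], ...]) -> bool:
    """True when slices differ in size, so core placement starts to matter."""
    return len(set(slices(layout))) > 1


def slices_that_fit(layout: tuple[tuple[int, ...], ...], working_set_mb: float) -> int:
    """How many slices could hold the whole working set on their own."""
    return sum(1 for size in slices(layout) if size >= working_set_mb)


if __name__ == "__main__":
    for name, layout in L3_LAYOUTS.items():
        print(f"{name}: total {sum(slices(layout))} MB, "
              f"largest local {largest_local_l3(layout)} MB, "
              f"asymmetric={is_asymmetric(layout)}, "
              f"slices fitting 64 MB={slices_that_fit(layout, 64)}")
```

For a hypothetical 64 MB working set, only the 7950X3D has a slice that can hold it, and only one, so performance depends on the game landing on that particular CCD.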