Thursday, December 26th 2024

AMD Ryzen 9 9950X3D Carries 3D V-Cache on a Single CCD, 5.6 GHz Clock Speed, and 170 Watt TDP
Recent engineering samples of AMD's upcoming Ryzen 9 9950X3D reveal what appear to be the finalized specifications of the top-tier AM5 chip. The 16-core, 32-thread processor builds upon the gaming success of the Ryzen 7 9800X3D while addressing its core count limitations. The flagship processor features AMD's refined cache design, pairing 96 MB of L3 on the 3D V-Cache CCD with 32 MB of standard L3 on the second CCD. Unlike its predecessor, the 7950X3D, the new Zen 5 chip uses a redesigned stacking method. The CCD now sits above the cache die, directly interfacing with the STIM and IHS, eliminating thermal constraints that previously required frequency limitations. The cache distribution across the two CCDs is asymmetric: one die combines 32 MB of base L3 cache with a 64 MB stacked V-Cache layer, while its companion die uses a standard 32 MB L3 cache configuration. In total, there is 128 MB of L3 cache and 16 MB of L2.
This architectural advancement enables the 9950X3D to achieve a 5.65 GHz boost clock across both CCDs, matching non-X3D variants. The processor maintains a 170 W TDP, suggesting improved thermal efficiency despite the additional cache. AMD's software-based OS scheduler will continue to optimize gaming workloads by directing them to the CCD with 3D V-Cache. Early leaks indicate the 9950X3D matches the base 9950X in Cinebench R23 scores, both in single and multi-threaded tests—a significant improvement over the 7950X3D, which lagged behind its non-X3D counterpart due to frequency limitations. AMD plans to expand the Zen 5 X3D lineup in Q1-2025 with both the 9950X3D and 9900X3D models. Full performance benchmarks and pricing details are expected at CES 2025, where AMD will officially unveil these processors alongside their RDNA 4 GPUs.
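To illustrate what "directing a workload to the CCD with 3D V-Cache" means in practice, here is a minimal Python sketch that restricts a process to the logical CPUs of one CCD on Linux. It is not AMD's scheduler or driver logic. The helper names `ccd_logical_cpus` and `pin_to_ccd` are hypothetical, and the CPU numbering (cores 0-15 first, SMT siblings 16-31 after) and the choice of CCD 0 as the V-Cache die are assumptions that should be checked on real hardware, for example via `/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list`.

```python
import os


def ccd_logical_cpus(ccd, cores_per_ccd=8, ccd_count=2, smt=True):
    """Return the logical CPU ids assumed to belong to one CCD.

    ASSUMPTION: logical CPUs are numbered with all physical cores first
    (0..15 on a 16-core part) followed by their SMT siblings (16..31).
    Real systems may enumerate differently; verify before relying on it.
    """
    if not 0 <= ccd < ccd_count:
        raise ValueError(f"ccd must be in range 0..{ccd_count - 1}")
    total_cores = cores_per_ccd * ccd_count
    first = ccd * cores_per_ccd
    cpus = set(range(first, first + cores_per_ccd))
    if smt:
        # Add the second hardware thread of each core on this CCD.
        cpus |= {cpu + total_cores for cpu in cpus}
    return cpus


def pin_to_ccd(pid, ccd=0):
    """Restrict process `pid` (0 = current process) to one CCD.

    ASSUMPTION: CCD 0 is the die carrying the 3D V-Cache.
    Uses os.sched_setaffinity, which is only available on some
    Unix platforms such as Linux.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not supported on this platform")
    os.sched_setaffinity(pid, ccd_logical_cpus(ccd))


# Example (not executed here): pin the current process to the assumed V-Cache CCD.
# pin_to_ccd(0, ccd=0)
```

Manual pinning like this is a blunt tool compared with a scheduler that decides per workload, but it shows the underlying idea: cache-sensitive threads stay on the eight cores that share the larger L3.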
Sources:
@94G8LA, via VideoCardz
109 Comments on AMD Ryzen 9 9950X3D Carries 3D V-Cache on a Single CCD, 5.6 GHz Clock Speed, and 170 Watt TDP
still will be a kick asz cpu a few years later. I believe in jumping AT LEAST 2 gens to get the best bang for buck
www.techpowerup.com/326709/amd-agesa-1-2-0-2-update-fixes-ryzen-9000-series-inter-core-latency-issues?cp=2
I'm finalizing my 5-year-upgrade build, and am planning to go to Microcenter for the Ryzen 9 9900x build soon. So getting 100% proof of this latency issue being fixed was a big priority for my 9900x vs 9800x3d decision.
80ns is within the realm of P-core to P-core on the Intel Ultra 7 265k. I do think the Intel Ultra 7 is underrated but I'm too much of an AVX512 fanboy so Zen5 wins me over.
Chips-and-cheese core-to-core latency graphs of Arrow Lake: chipsandcheese.com/p/examining-intels-arrow-lake-at-the
I linked TechPowerup earlier, but here's Chips and Cheese's tests as well:
You were correct at launch. The issue is that AMD released new microcode recently that fixed the 200-to-400 nanosecond latencies and pushed it all the way down to 80 or less.
Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.
For Zen 5 it is most likely improved further:
It's not like AMD hasn't thought about doing X3D on more than one chiplet; in fact, they are selling EPYCs like the 9684X with 12 CCDs and 1152 MB of L3.
Edit: Zen 5 X3D latency penalty is about the same:
Databases etc. can have completely separate threads that do not interact with the master except at the point of creation and completion, so they miss all the "regular" penalties of inter-CCD communication.
That doesn't necessarily say that they spend the same time at that speed under all conditions.
Anyway, I want one.
Die-to-die latencies do exist of course, but on a scale far smaller than you might imagine. 200ns+ is a server-grade latency, not something I'd expect to see on a desktop system. And that's because server-grade systems have more RAM, more RAM controllers, more dies and more caches that need to communicate. So everything slows down.
---------
Anyway, 200ns latencies for an on-package SRAM make no sense. That's slower than DRAM (!!!!) such as DDR5. SRAM has always had much smaller latencies than that, and I expect that the X3D caches are made out of the faster SRAM and not the slower DRAM. (Also: logic companies like AMD/TSMC can make SRAM more easily than DRAM. DRAM is actually very difficult to make on these processes.)
Knowing that frequency and thermals are similar, the probability that boosting characteristics will be different enough to make a notable difference in games is near zero.
People, learn how to context.
What is more, your 550ns figure is over twice the time that one EPYC Turin core takes to communicate with a core in another socket. Please explain how you arrived at this figure, and what the "context" is here.
Edit: seems that the context here is trolling, but it's OK - I've refreshed my knowledge a bit by researching this.
For completeness here's the L3 latency plot for an EPYC Milan-X with 8 X3D CCDs where going to another X3D slice has a penalty but overall keeps below DRAM latency:
chipsandcheese.com/p/pushing-amds-infinity-fabric-to-its
Zen 4 has a hardware limitation that would have made a dual X3D setup absolutely HORRENDOUS in performance, as accessing the 2nd CCD's cache would have been only as fast as accessing DRAM in certain worst-case scenarios, and could very easily see 2-3 times the latency penalty, rising to nearly 10 times in the worst case. I suspect Zen 3/5xxx series parts would have seen similar issues due to the design of the IO die, etc.
Zen 5 has seemingly fixed this issue, as well as having the high clock speeds due to the relocated X3D. I wonder if AMD are holding back dual X3D parts in case Intel pulls something out of the bag, a la Nvidia's original Ti/Super variants of a few years ago? I mean, the single-CCD parts are completely handing Intel the L in gaming by quite a margin currently.
Also, are they trying to prevent confusion, as the dual X3D parts would segment the market even further, since you would now have 3 different SKUs for each core count, with desktop parts probably pushing up towards the $/£1k mark again for the top-end non-HEDT part? And how much would it cut into their lower-end HEDT/workstation sales?
For Blender, the 9800X3D beats the 9700X, and nearly rivals the 16-core 5950X and 12-core 7900. The 9800X3D's SMT is strong relative to Zen 3's SMT.
Also, this isn’t second-gen X3D, but third. And AMD had a fully production-ready sample of a dual-3D-CCD 5950X3D back in the day, when their 3D V-Cache first emerged. There were other reasons.
The Zen4 was perfectly scalable, at any wattage/power/thermal envelope. Zen5 X3D, seems to be as good. There's no frequency limits for it, and it works as fast as non-X3D parts.
At this point, non-3D parts have become the "dietetic", budget-oriented/cut-down version. And AMD themselves have created this image.
And there's absolutely no excuse for the 9950X3D to not be dual 3D-CCD. The technology allows this, the cost is already high, and the 3D dies are now limited neither by frequency nor by power.
Just my thoughts!