Friday, September 27th 2024

AMD Ryzen 9 9950X3D and 9900X3D to Feature 3D V-cache on Both CCD Chiplets

Earlier this week, we got rumors that AMD is rushing in the Ryzen 7 9800X3D 8-core/16-thread "Zen 5" processor with 3D V-cache for a late-October debut. The 9800X3D succeeds the popular 7800X3D, and AMD probably hopes to have a competitive gaming processor in time for the launch of Intel's Core Ultra 2-series "Arrow Lake-S". In the previous article, we reported that the higher core-count 9000X3D models, the Ryzen 9 9950X3D and Ryzen 9 9900X3D, would arrive some time in Q1 2025, reportedly because these chips have certain "new features" compared to their predecessors, the 7950X3D and 7900X3D. At the time, we even explored the possibility of AMD giving both CCDs on these processors 3D V-cache. It turns out this is where things are headed.

A new report by Benchlife.info claims that the 9950X3D and 9900X3D will implement 3D V-cache on both CCD chiplets, giving these processors an impressive 192 MB of L3 cache (96 MB per CCD), and 208 MB (9950X3D) or 204 MB (9900X3D) of "total cache" (L2+L3). The report also says that AMD is planning a Ryzen 5 9600X3D, its second attempt at taking on Intel's Core i5 lineup following the very recent release of the Ryzen 5 7600X3D, which ended up 1-3% short of the Core i5-14600K in gaming workloads. There's no word on whether the 9600X3D will launch in October alongside the 9800X3D, or in Q1 2025 with the Ryzen 9 9000X3D series.
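The cache totals in the report follow from simple arithmetic. Below is a minimal Python sketch, assuming (these figures are not from the report) 1 MB of L2 per Zen 5 core, two CCDs per chip with 8 cores each on the 9950X3D and 6 each on the 9900X3D, and each CCD's 96 MB of L3 being 32 MB on-die plus 64 MB of stacked V-cache. The helper name `cache_totals` is hypothetical, used only for illustration.

```python
# Rough cache arithmetic for the rumored dual-V-cache Ryzen 9 parts.
# Assumptions (not from the report): 1 MB of L2 per Zen 5 core, and each
# CCD's 96 MB of L3 split as 32 MB on-die plus 64 MB of stacked V-cache.
L2_PER_CORE_MB = 1
L3_PER_CCD_MB = 32 + 64


def cache_totals(ccds: int, cores_per_ccd: int) -> tuple[int, int]:
    """Return (L3 in MB, L2+L3 in MB) for a chip with V-cache on every CCD."""
    l3 = ccds * L3_PER_CCD_MB
    l2 = ccds * cores_per_ccd * L2_PER_CORE_MB
    return l3, l2 + l3


if __name__ == "__main__":
    # Hypothetical configurations: 2 CCDs of 8 cores, and 2 CCDs of 6 cores.
    for name, ccds, cores in [("Ryzen 9 9950X3D", 2, 8), ("Ryzen 9 9900X3D", 2, 6)]:
        l3, total = cache_totals(ccds, cores)
        print(f"{name}: {l3} MB L3, {total} MB L2+L3")
```

Run as a script, it prints 192 MB of L3 for both chips, with 208 MB of L2+L3 for the 9950X3D and 204 MB for the 9900X3D, matching the figures above. The only difference between the two totals is the L2, since the 9900X3D has four fewer cores.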
Documentation indicates a maximum L3 cache of 96 MB, although the reference to SOC configuration leaves it unclear whether this comes from 3D V-cache stacked on top of the CCD or from something else.
The introduction of 3D V-cache on both CCDs of the 9950X3D and 9900X3D could be interesting, as both chiplets will be capable of handling gaming workloads at a uniform performance level. On the 7950X3D and 7900X3D, OS scheduler-level QoS logic ensures that gaming workloads are scheduled to the CCD with the 3D V-cache, while multithreaded productivity workloads are allowed to spread across both CCDs.
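One simple way to picture thread placement on a chip with V-cache on both CCDs: each CCD has its own L3, so a workload that fits within one CCD can be kept there, while a workload needing more cores is spread across both. The Python sketch below illustrates that idea only; it is not how the Windows QoS logic described above works, and the CPU numbering (cores numbered CCD by CCD, SMT siblings offset by 16) as well as the names `ccd_cpus` and `place` are assumptions for illustration.

```python
# Hypothetical CCD placement policy, for illustration only. This is NOT how
# the OS scheduler or AMD's software places threads. Assumed CPU numbering:
# physical cores 0..15 are numbered CCD by CCD, and each core's SMT sibling
# is the core number plus 16 (common on Linux, but not guaranteed).
CORES_PER_CCD = 8
CCDS = 2
TOTAL_CORES = CORES_PER_CCD * CCDS


def ccd_cpus(ccd: int) -> set[int]:
    """Logical CPUs (both SMT threads of each core) belonging to one CCD."""
    first = ccd * CORES_PER_CCD
    cores = range(first, first + CORES_PER_CCD)
    return set(cores) | {c + TOTAL_CORES for c in cores}


def place(threads: int) -> set[int]:
    """Keep a workload on one CCD if it fits; otherwise use every CCD."""
    if threads <= CORES_PER_CCD:
        # Fits on one CCD: all its threads share that CCD's L3,
        # so no cached data has to be fetched from the other die.
        return ccd_cpus(0)
    # Needs more cores than one CCD has: spread across all CCDs.
    return set().union(*(ccd_cpus(i) for i in range(CCDS)))
```

On Linux, the returned set could be passed to `os.sched_setaffinity(0, cpus)` to pin the current process, though real code should read the actual core-to-CCD mapping from the system rather than assume it.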
Source: Benchlife.info

76 Comments on AMD Ryzen 9 9950X3D and 9900X3D to Feature 3D V-cache on Both CCD Chiplets

#26
JWNoctis
What's the reliability of this info? Is this a last-minute change? Somehow I'm imagining their testing and validation teams working three shifts. :twitch:

None of it does anything for cross-CCD scheduling problems - Ninja'd @RogueSix
#27
wNotyarD
persondb wrote: "Did they get around the clock speed regressions from 3D cache? That was essentially why they haven't done dual 3D cache.

A lot of applications suffered more from the clock speed regression than they gained from the extra cache."
I think AMD did indeed correct it, if they're able to promise overclocking on the new X3D SKUs.
#28
AusWolf
RogueSix wrote: "It is quite the contrary to what you say. Scheduling will now be even more important to the point it becomes SUPER-DUPER-MEGA-EXTRA-important with cache on both CCDs. For this dual cache setup to work correctly, games/apps always need to request the cached data from the "correct" cache on the "correct" CCD or else you will suffer latencies from hell if/when data needs to be fetched from the cache across the CCDs because e.g. Core 3 requests data that was previously stored to the cache by Core 14 on the other CCD. Can't have a scenario like that. Ever.

So, both the scheduler and the CPU always need to "know" exactly "who" (which core) cached something (what) and where it was cached to avoid the dreaded inter-CCD and inter-cache latencies. This is definitely going to be a challenge and very complex on the level of correct scheduling and correct CCD assignment etc.

AMD does not exactly have the best track record when it comes to these scheduling and core assignment shenanigans so I would be quite surprised if they get this to work flawlessly out of the gate.
Personally, I have avoided multi-CCD CPUs like the plague due to the Xbox GameBar and 'GameMode On' requirements (I have a PC and not a console, you muppets). It will be interesting to see if the GameBar requirement will be dropped now(?) since core parking will no longer be required.

We'll have to wait and see how well this is gonna work in practice. I would expect some growing pains, to say the least..."
Why is it a challenge, though? Shouldn't it be as simple as assigning CCD 1 to any foreground program that needs 8 cores and 96 MB (edited) of cache or less, while background tasks get CCD 2, and anything that needs more than that is spread across both CCDs?
#29
usiname
RogueSix wrote: "[...] or else you will suffer latencies from hell if/when data needs to be fetched from the cache across the CCDs because e.g. Core 3 requests data that was previously stored to the cache by Core 14 on the other CCD. [...]"
If the cores can get data from the wrong L3 cache, shouldn't this problem exist already? The L3 is split across two CCDs even now.
#30
Makaveli
So I guess they worked around the issue of dual-CCD traffic killing the gains, as happened with the 5900X prototype that had V-cache on both CCDs.

So the question that remains: is clock speed affected, or did they solve that problem too?
#31
RogueSix
usiname wrote: "If the cores can get data from the wrong L3 cache, shouldn't this problem exist already? The L3 is split across two CCDs even now."
No, because the current solution is outright core parking: one CCD (the one without the 3D cache) gets put to sleep when you are gaming on a multi-CCD X3D CPU.
#32
oxrufiioxo
phanbuey wrote: "I actually think with the inter-CCD latency at ~70 ns this isn't far from the truth."
I'd still prefer this setup over the 7950X3D, though you can always tie games to the stronger CCD either way, and my guess is that even when threads do jump between CCDs, it won't be a large hit. Oddly, in games that don't behave, I see higher average framerates when I tie them to all 16 cores than when I tie them to just the non-cache CCD, so this should still be a bit better than that.

I'd still like both options to be available, just because I want to see the actual difference, not in games that already do well on the current 7950X3D but in games that don't behave without user intervention.

So I hope AMD releases both options, with the single-CCD option being slightly cheaper. For academic reasons, of course, lol.
#33
AnotherReader
Makaveli wrote: "So I guess they worked around the issue of dual-CCD traffic killing the gains, as happened with the 5900X prototype that had V-cache on both CCDs.

So the question that remains: is clock speed affected, or did they solve that problem too?"
I'm not sure inter-die requests were ever a factor. If they were, then EPYC X would suffer much more than a hypothetical 5950X3D with 192 MB of L3 cache.
#34
SIGSEGV
Daven wrote: "My work is assigning me CAD and rendering work, so I guess I'm upgrading my 7700X to a 9950X3D on my gaming rig. Thread assignment won't be an issue if both CCDs have 3D cache.

Adobe Dimension and SolidWorks need some fast PC specs. I have a 7900 XT, so the GPU side is already covered."
Indeed, 3D V-Cache was initially developed for server/workstation environments, with cache on every CCD.
I don't know IF I will sell my 9950X.
Let's wait for it in January 2025 :laugh:
#35
Darmok N Jalad
Old enough to remember when this amount of L3 was a LOT of system RAM to have; it's more than the PC I had in college about 25 years ago. These CPUs could theoretically run NT4 and all associated programs without any system RAM, if running RAM-less were even possible. Just think of the FPS and load times I'd get in Quake II. :D
#36
phanbuey
Darmok N Jalad wrote: "[...] Just think of the FPS and load times I'd get in Quake II. :D"
500 Hz true gaming.
#37
Darmok N Jalad
phanbuey wrote: "500 Hz true gaming."
How'd you know? It was a great day when I went from a P166MMX to the K6-2 500!
#38
AusWolf
AnotherReader wrote: "I'm not sure inter-die requests were ever a factor. If they were, then EPYC X would suffer much more than a hypothetical 5950X3D with 192 MB of L3 cache."
I always said that if inter-CCD communication is an issue, then you don't need more than 8 cores. And if you do, then the benefits of having more cores outweigh the detriment of inter-CCD latency anyway. :)
#39
A Computer Guy
Yeah, but a glaring disappointment might ensue: are we still getting one good CCD and one meh CCD? I suppose core uniformity is now a big plus, as is the lower TDP required by the X3D cache.
#40
Octavean
Since I went with the Ryzen 3950X and later the 7950X (when it was released) because I wanted/needed more cores, I was willing to forgo the benefits of X3D. I should point out that I also didn't want to deal with core parking. The 9950X3D seems like a no-compromise part and will likely be priced aggressively. I suspect the clock speeds will be slightly reduced, and the stacking usually means slightly less heat tolerance, but this is minor. It still probably won't be worth it for me to upgrade this generation, but AMD is setting a precedent that I can get behind. Next generation, I'll likely wait for the 16-core/32-thread X3D offering, the successor to the 9950X3D...
#41
TumbleGeorge
More cache, more misses. Not only more hits. :D
#42
A Computer Guy
TumbleGeorge wrote: "More cache, more misses. Not only more hits. :D"
Either way, 100% of the hits are still being taken by the wallet or the mattress, depending on where you are from. My prediction: prepare to fork out the cash for dual X3D.
#43
ir_cow
I know what I'm buying next :)
#44
rv8000
It’s probably just me, but I see this as a loss if X3D parts are frequency-limited again. It would’ve been more beneficial if they had implemented some sort of hardware scheduler as opposed to just dropping in another 3D cache chiplet.
#45
dgianstefani
TPU Proofreader
rv8000 wrote: "It’s probably just me, but I see this as a loss if X3D parts are frequency-limited again. It would’ve been more beneficial if they had implemented some sort of hardware scheduler as opposed to just dropping in another 3D cache chiplet."
Zen 6.
#46
Carillon
Consoles have similar latency issues with their dual-CCX design, so I'm sure ways to mitigate this latency penalty already exist there.
If we are lucky, we could see them come to PC in 5 to 10 years.
#48
AusWolf
rv8000 wrote: "It’s probably just me, but I see this as a loss if X3D parts are frequency-limited again. It would’ve been more beneficial if they had implemented some sort of hardware scheduler as opposed to just dropping in another 3D cache chiplet."
It's the usual "X3D if you game, normal if you don't" narrative again. Personally, I don't mind. Those higher clocks don't give you that much more performance anyway, only more power consumption and heat.
#49
Hodor
Darmok N Jalad wrote: "[...] Just think of the FPS and load times I'd get in Quake II. :D"
My 7950X3D gets 940 FPS on crusher.dm2.
#50
igormp
The only benefit I see in this for me is if 9950X pricing drops even more.
That extra cache is basically useless for most of what I do, and not worth the extra price and (likely) reduced clock speeds.
Chaitanya wrote: "There are plenty of tasks besides gaming that will take advantage of that victim cache. Also, now that both dies are getting stacked cache, it should get rid of problems arising from asymmetric cores.
(www.phoronix.com/review/amd-5800x3d-linux/3)"
The benefits are mostly for HPC and CFD workloads. And for those workloads, why would one even be using a Ryzen CPU? The dual-channel memory setup is already going to kill your performance anyway, since those workloads are really memory-bound.
For all others, it's actually a regression in performance due to the lower clocks. Here's the updated graph for Zen 4:

www.phoronix.com/review/amd-ryzen-7-7800x3d-linux/9
openbenchmarking.org/result/2304049-PTS-APRIL20258&sgm=1&hgv=Ryzen+7+7800X3D&sor
RogueSix wrote: "It is quite the contrary to what you say. Scheduling will now be even more important to the point it becomes SUPER-DUPER-MEGA-EXTRA-important with cache on both CCDs. For this dual cache setup to work correctly, games/apps (via the scheduler) always need to request the cached data from the "correct" cache on the "correct" CCD or else you will suffer latencies from hell if/when data needs to be fetched from the cache across the CCDs because e.g. Core 3 requests data that was previously stored to the cache by Core 14 on the other CCD. Can't have a scenario like that. Ever.

So, both the scheduler and the CPU always need to "know" exactly "who" (which core) cached something (what) and where it was cached to avoid the dreaded inter-CCD and inter-cache latencies. This is definitely going to be a challenge and very complex on the level of correct scheduling and correct CCD assignment etc.

AMD does not exactly have the best track record when it comes to these scheduling and core assignment shenanigans so I would be quite surprised if they get this to work flawlessly out of the gate.
Personally, I have avoided multi-CCD CPUs like the plague due to the Xbox GameBar and 'GameMode On' requirements (I have a PC and not a console, you muppets). It will be interesting to see if the GameBar requirement will be dropped now(?) since core parking will no longer be required.

We'll have to wait and see how well this is gonna work in practice. I would expect some growing pains, to say the least..."
I mean, that's mostly a Windows scheduler issue. Even Intel had to develop their own HW scheduler to work around that (which is pretty useless on Linux, as an example).
AnotherReader wrote: "I'm not sure inter-die requests were ever a factor. If they were, then EPYC X would suffer much more than a hypothetical 5950X3D with 192 MB of L3 cache."
EPYC X is not really used for games, and embarrassingly parallel workloads (like the ones EPYC X is used for) shouldn't really do much cross-core communication to begin with.
rv8000 wrote: "It’s probably just me, but I see this as a loss if X3D parts are frequency-limited again. It would’ve been more beneficial if they had implemented some sort of hardware scheduler as opposed to just dropping in another 3D cache chiplet."
I don't think AMD would put in so much effort just to work around Windows' limitations. They haven't even updated Ryzen's desktop I/O die in a long time.
Maybe on Zen 6, as said before, but who knows.
ThomasK wrote: "There are plenty of benefits to be had from the increased L3 cache on all CCDs on the consumer side, let alone in the data center, where it's already common.

Cloudflare switches to EPYC 9684X Genoa-X CPUs with 3D V-Cache — 145% faster than previous-gen Milan servers | Tom's Hardware (tomshardware.com)"
As I said above, it's mostly only for CFD and HPC stuff (apart from games). Most consumers will get no benefit from such extra cache.