Thursday, December 26th 2024

AMD Ryzen 9 9950X3D Carries 3D V-Cache on a Single CCD, 5.6 GHz Clock Speed, and 170 Watt TDP

Recent engineering samples of AMD's upcoming Ryzen 9 9950X3D reveal what appear to be the finalized specifications of the top-tier AM5 chip. The 16-core, 32-thread processor builds upon the gaming success of the Ryzen 7 9800X3D while addressing its core count limitations. The flagship processor features AMD's refined cache design, pairing one CCD that carries 96 MB of L3 cache (3D V-Cache included) with a second CCD that carries the standard 32 MB. Unlike its predecessor, the 7950X3D, the new Zen 5 chip uses a redesigned CCD stacking method. The CCD now sits above the cache, directly interfacing with the STIM and IHS, eliminating the thermal constraints that previously required frequency limitations. The cache distribution across the two CCDs is asymmetric: one die combines 32 MB of base L3 cache with a 64 MB stacked V-Cache layer, while its companion die uses a standard 32 MB L3 cache configuration. In total, there is 128 MB of L3 cache and 16 MB of L2.
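The cache figures in the paragraph above can be tallied in a few lines of Python. This is only a bookkeeping sketch: the variable names are made up for illustration, and the 1 MB of L2 per core is inferred from the article's 16 MB total across 16 cores.

```python
# Per-CCD L3 figures as reported in the article (MB).
BASE_L3_PER_CCD = 32
STACKED_VCACHE = 64        # present on only one of the two CCDs
CORES = 16
L2_PER_CORE = 1            # assumption: 16 MB total L2 / 16 cores

vcache_ccd_l3 = BASE_L3_PER_CCD + STACKED_VCACHE   # CCD with 3D V-Cache
plain_ccd_l3 = BASE_L3_PER_CCD                     # companion CCD
total_l3 = vcache_ccd_l3 + plain_ccd_l3
total_l2 = CORES * L2_PER_CORE

print(f"V-Cache CCD L3: {vcache_ccd_l3} MB")
print(f"Standard CCD L3: {plain_ccd_l3} MB")
print(f"Total L3: {total_l3} MB, total L2: {total_l2} MB")
```

The 96 MB figure therefore describes the L3 visible to the V-Cache CCD alone, while 128 MB is the sum over both dies.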

This architectural advancement enables the 9950X3D to achieve a 5.65 GHz boost clock across both CCDs, matching non-X3D variants. The processor maintains a 170 W TDP, suggesting improved thermal efficiency despite the additional cache. AMD's software-based OS scheduler will continue to optimize gaming workloads by directing them to the CCD with 3D V-Cache. Early leaks indicate the 9950X3D matches the base 9950X in Cinebench R23 scores, both in single and multi-threaded tests—a significant improvement over the 7950X3D, which lagged behind its non-X3D counterpart due to frequency limitations. AMD plans to expand the Zen 5 X3D lineup in Q1-2025 with both the 9950X3D and 9900X3D models. Full performance benchmarks and pricing details are expected at CES 2025, where AMD will officially unveil these processors alongside their RDNA 4 GPUs.
Sources: @94G8LA, via VideoCardz

91 Comments on AMD Ryzen 9 9950X3D Carries 3D V-Cache on a Single CCD, 5.6 GHz Clock Speed, and 170 Watt TDP

#51
JustBenching
AusWolf wrote: "Personally, I wouldn't bother with dual CCD X3D CPUs at all. Single CCD X3D is great for gamers, dual CCD non-X3D is good for professionals (and for a bit of gaming, too), but the overlap between these two types of people is way too thin."
It's not about the overlap, it's the fact that a dual 3D cache chip would be the best. At everything. A lot of enthusiasts (myself included) would just buy it, no reason not to. How much impact would the dual cache have on the price, $50-100? The problem right now is Windows; these chips really need a hardware thread director like Intel's. That would be more interesting than a dual 3D cache chip, to be fair.
#52
Wirko
Legacy-ZA wrote: "I agree, but I would like it if they can give us 12 cores on a single CCD already. Perhaps with the new node shrink coming up for the AMD Ryzen 10 000? That would definitely make me upgrade again. :D"
Write a nice letter to Santa (95054 Santa Clara, CA). It's not too late, it will take two years to come true anyway.

I mean, expecting a 10950HS chip in late 2025 and 10900G in late 2026 would be somewhat realistic. Not earlier. APUs are always for notebooks and mini PCs primarily, AMD struggles to rent enough capacity at TSMC, so desktop versions come much later.
#53
AusWolf
Legacy-ZA wrote: "I agree, but I would like it if they can give us 12 cores on a single CCD already. Perhaps with the new node shrink coming up for the AMD Ryzen 10 000? That would definitely make me upgrade again. :D"
Oh yeah, that would make my juices flow! :)

With an awesome 7800X3D in my system, I can't be asked to spend a penny on yet another 8-core CPU.
#54
Panther_Seraphin
Legacy-ZA wrote: "I agree, but I would like it if they can give us 12 cores on a single CCD already. Perhaps with the new node shrink coming up for the AMD Ryzen 10 000? That would definitely make me upgrade again. :D"
According to early leaks from HeWhoShallNotBeNamed, Zen 6 is supposedly a 12-core CCD design. However, I think it will require a big rework of the IO die and a big increase in IF bandwidth to keep the CCDs fed with data, as that is up to 24 threads per CCD.

I wonder if we will see AMD use the stacking technology and be able to overlay parts of the CCDs onto the IO Die to save physical space on the Substrate AND increase the IF bandwidth and speed?
#55
dismuter
Big disappointment that they still don't give 3D cache to both dies.
#56
BlaezaLite
Surprised the V-Cache isn't on both CCDs, but let's wait and see those numbers!
#57
mb194dc
It's a super niche use case chip anyway. There are the older 16-core and 3D chips, which are going to be just as good for pretty much anything.

Nearly everyone who's invested thousands in a 4090 and a top AM5 rig will be using 1440p at minimum. In that case, gaming gains will be minimal, as on the 9800X3D.

Similarly for productivity: few use cases do enough intensive crunching that they'll even notice the difference between the last couple of generations. In the course of your day, saving 20 seconds on a compile or encode doesn't mean jack.
#58
A Computer Guy
JustBenching wrote: "It's not about the overlap, it's the fact that a dual 3D cache chip would be the best. At everything. A lot of enthusiasts (myself included) would just buy it, no reason not to. How much impact would the dual cache have on the price, $50-100? The problem right now is Windows; these chips really need a hardware thread director like Intel's. That would be more interesting than a dual 3D cache chip, to be fair."
What they really need is OS support so programmers can manually assign threads to the CCD of choice. Each CCD in the non-X3D version already has a massive cache, and dividing workloads among CCDs can be a significant boost at the application level when the application doesn't need all cores.
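For what it's worth, on Linux something close to this can already be sketched with plain CPU affinity. The snippet below is a minimal illustration, not a recommendation: it assumes a Linux system, assumes that cache `index3` in sysfs is the L3 (usually true, not guaranteed), and the helper names `parse_cpu_list`, `cpus_sharing_l3` and `pin_to_cpus` are made up for this example.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as '0-7,16-23' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_sharing_l3(cpu=0):
    """Return the logical CPUs sharing an L3 with `cpu`, or None if unknown."""
    # Assumption: index3 is the L3 cache on this machine.
    path = f"/sys/devices/system/cpu/cpu{cpu}/cache/index3/shared_cpu_list"
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except (OSError, ValueError):
        return None


def pin_to_cpus(cpus):
    """Restrict the caller (pid 0) to `cpus`; returns True on success."""
    if not cpus or not hasattr(os, "sched_setaffinity"):
        return False          # nothing to pin to, or not a Linux-like OS
    allowed = cpus & os.sched_getaffinity(0)
    if not allowed:
        return False          # requested CPUs are not available to us
    os.sched_setaffinity(0, allowed)
    return True


if __name__ == "__main__":
    ccd = cpus_sharing_l3(0)
    print("CPUs sharing L3 with cpu0:", sorted(ccd) if ccd else "unknown")
    print("pinned:", pin_to_cpus(ccd))
```

On a dual-CCD part, the set returned for cpu0 would be expected to cover one CCD only, so pinning to it keeps the process on a single L3 domain. Which CCD carries the V-Cache is not something this sketch can tell; comparing the `size` files next to `shared_cpu_list` would be one way to guess.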
mb194dc wrote: "Similarly for productivity: few use cases do enough intensive crunching that they'll even notice the difference between the last couple of generations. In the course of your day, saving 20 seconds on a compile or encode doesn't mean jack."
Saving 20 seconds on compile is another 20 seconds to sip coffee. (No joke.)
#59
Wirko
A Computer Guy wrote: "What they really need is OS support so programmers can manually assign threads to the CCD of choice."
I agree, this might be one of the better among many possible solutions. But isn't it already implemented in some games?
A Computer Guy wrote: "Saving 20 seconds on compile is another 20 seconds to sip coffee. (No joke.)"
Ugh! You should improve your multitasking skills if you can't drink coffee and wait for something on the PC to finish at the same time.
Panther_Seraphin wrote: "I wonder if we will see AMD use the stacking technology and be able to overlay parts of the CCDs onto the IO Die to save physical space on the Substrate AND increase the IF bandwidth and speed?"
No, there's no need to save more space this way, and the IOD is a hot component too. Also, partial overlap would be complex as hell to realise because you'd have to bond the top chip on the bottom chip AFTER the bottom chip has been bonded to the substrate.

For this purpose, I think it would be best to put the IOD and CCDs next to each other, with no space between, and connect them with LSI (which is TSMC's take on EMIB).
#60
freeagent
This chip is for gamers that do stuff with their computers outside of games and forums lol.

8 cores vs 12 or 16 is pretty weak. My 5900X looks down at my 58X3D in everything except some games.
#61
Panther_Seraphin
And having two X3D-equipped dies would perform on par with or worse than a single X3D CCD in games, due to the penalty of having to traverse the IF to get data out of the other CCD's cache.

IMHO Multi X3D dies only benefit very niche workloads such as databases and productivity (I would suspect video encoding etc may get a slight uplift) but for the majority of people the single X3D die when utilised properly will be the best solution till IF gets a major rework/upgrade or move to a higher density CCD design.
#62
A Computer Guy
Panther_Seraphin wrote: "And having two X3D-equipped dies would perform on par with or worse than a single X3D CCD in games, due to the penalty of having to traverse the IF to get data out of the other CCD's cache."
Not if games were optimized for dual CCD operation. I suspect that will never happen even though AMD became a software company.
#63
freeagent
Just curious if this was a money thing or what. If they bury the cache, heat and inter ccd latency should be cut down a lot, if not become nearly a thing of the past, unless I misunderstood.
#64
dragontamer5788
dgianstefani wrote: "FYI, Zen 5 performs better in games with SMT off, so I wouldn't be so quick to criticise Intel for being ahead of the curve."
SMT isn't magic. SMT works by splitting resources between the two threads.

Primarily, the caches. As we all know, games love cache so I'm not surprised that splitting the cache has a detrimental effect.

But even if a thread isn't loaded, the register files, reorder buffers, decoders and branch predictors remain shared. So SMT will have slightly worse single threaded performance.

SMT is ideal when a 5%ish drop in single threaded is an acceptable tradeoff for +40% multi threaded performance. Games do not work like this.

Intel has changed their design to P-Cores which specialize in singlethread, and E-cores which specialize in multi thread. But this seems like a poor strategy to me for other reasons....
Panther_Seraphin wrote: "And having two X3D-equipped dies would perform on par with or worse than a single X3D CCD in games, due to the penalty of having to traverse the IF to get data out of the other CCD's cache.

IMHO Multi X3D dies only benefit very niche workloads such as databases and productivity (I would suspect video encoding etc may get a slight uplift) but for the majority of people the single X3D die when utilised properly will be the best solution till IF gets a major rework/upgrade or move to a higher density CCD design."
I expect that video is bad for x3d.

The name of the game is fitting in the cache. Video games have lots of stuff that is larger than 32MB but less than 96MB, and the CPU automatically discovers the hot data to share.

Video is not like that. You watch (or encode) one frame and then move into the next one. Nothing will fit in cache. Or at least, nothing extra really fits in the 33rd MB that's worthwhile.

Video and 3D modeling (Blender) usually prefer more cores... While dealing with so much data that the caches are blown over and useless.
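The "fits in cache or doesn't" argument can be illustrated with a toy simulation. The sketch below models a fully associative LRU cache, which is far simpler than a real L3, and the trace sizes are invented for illustration: a 64 MB hot set that is revisited repeatedly stands in for a game, and a long stream that is never reused stands in for video.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)


# One "block" stands for 1 MB; all sizes are illustrative.
game_like = list(range(64)) * 10   # 64 MB hot set, revisited 10 times
video_like = list(range(640))      # 640 MB streamed once, never reused

for name, trace in (("game-like", game_like), ("video-like", video_like)):
    for capacity in (32, 96):
        rate = lru_hit_rate(trace, capacity)
        print(f"{name:10s} {capacity:3d} MB cache: hit rate {rate:.2f}")
```

In this toy model the game-like trace thrashes a 32 MB cache completely but hits 90% of the time at 96 MB, while the streaming trace gets nothing from either size. Real workloads sit somewhere between these extremes.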
#65
evernessince
Visible Noise wrote: "Serious question, why should even “enthusiasts” care? Do you buy your CPUs by the mm?"
I mean, you kind of do. AMD and Intel pay per wafer, and that cost is only going up and gets passed on to customers. You are paying for the space used by the cores and the rest of the CPU. Space efficiency has always been a key metric of semiconductor design, and high efficiency in that regard points to advancements in how much you can cram into a given area, efficiency, etc.
#66
cristi_io
A CCD made of 12 Zen 5c cores with X3D cache would be interesting.
A 12-core Zen 5c CCD would be the same size as an 8-core Zen 5 one, and even with less level 3 cache, the 64 MB X3D cache would make up for it, so no real disadvantage.
#67
dgianstefani
TPU Proofreader
cristi_io wrote: "A CCD made of 12 Zen 5c cores with X3D cache would be interesting.
A 12-core Zen 5c CCD would be the same size as an 8-core Zen 5 one, and even with less level 3 cache, the 64 MB X3D cache would make up for it, so no real disadvantage."
Can't happen because the through vias were removed. You think the Zen 5c cores are smaller without anything removed?
#68
dragontamer5788
cristi_io wrote: "A CCD made of 12 Zen 5c cores with X3D cache would be interesting.
A 12-core Zen 5c CCD would be the same size as an 8-core Zen 5 one, and even with less level 3 cache, the 64 MB X3D cache would make up for it, so no real disadvantage."
That was basically called Skylake-X and Xeon Platinum 10 years ago.

AMD explicitly has been going chiplets with all of its associated downsides. Intel is actually going chiplets today and I'd expect more chiplets in the future.

3nm and 2nm are too expensive to run monolithic designs anymore. You need to cut up the dies and split them off to increase the yields and efficiency of manufacturing.

Expect more 'split computer' designs out of AMD, Intel, and even NVidia, moving forward.
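The yield argument can be made concrete with the textbook Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. The sketch below is illustrative only: the defect density and die areas are invented, not figures for any real process or product, and the model ignores salvaging partially defective dies.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Expected fraction of defect-free dies under a simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


DEFECT_DENSITY = 0.2   # defects per cm^2, purely illustrative

for area in (75, 150, 300, 600):   # hypothetical die sizes in mm^2
    share = poisson_yield(area, DEFECT_DENSITY)
    print(f"{area:4d} mm^2 die: {share:.1%} defect-free")
```

Because yield falls off exponentially with area in this model, splitting one large die into several small ones raises the share of usable silicon per wafer, at the cost of the packaging and interconnect overhead mentioned above.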
#69
Wirko
dragontamer5788 wrote: "Expect more 'split computer' designs out of AMD, Intel, and even NVidia, moving forward."
More, sure, but not all. I believe AMD will still make monolithic APUs on N3. A defect or two doesn't render the chip useless if it's located in one of the "redundant" blocks (that is, CPU or GPU cores), which make up at least 2/3 of the chip's area. Yields are also expected to improve over time.
#70
lexluthermiester
AusWolf wrote: "Who is the 9950X3D made for exactly?"
As made, no one. The 3D cache needs to be evenly divided between CCXs or equally accessible so that performance doesn't suffer. This divided distribution thing they have going on is unacceptable, and the 9900X/9950X are the better choice for balanced performance.
#71
kapone32
lexluthermiester wrote: "As made, no one. The 3D cache needs to be evenly divided between CCXs or equally accessible so that performance doesn't suffer. This divided distribution thing they have going on is unacceptable, and the 9900X/9950X are the better choice for balanced performance."
You mean like the 7900X3D? I guess 20 nanoseconds is tantamount to failure.
#72
lexluthermiester
kapone32 wrote: "You mean like the 7900X3D?"
Any of the multi-CCX X3D models. By contrast, the single-CCX 5600X3D/5800X3D/7600X3D/7800X3D/9600X3D/9800X3D do not suffer from cache latency issues.
kapone32 wrote: "I guess 20 nanoseconds is tantamount to failure."
That is just a wee bit off. The latency is significant enough that inter-CCX exchanges are a problem for optimal performance.
#73
kapone32
lexluthermiester wrote: "Any of the multi-CCX X3D models. By contrast, the single-CCX 5600X3D/5800X3D/7600X3D/7800X3D/9600X3D/9800X3D do not suffer from cache latency issues.

That is just a wee bit off. The latency is significant enough that inter-CCX exchanges are a problem for optimal performance."
So what is it 40 nanoseconds?
#74
lexluthermiester
kapone32 wrote: "So what is it 40 nanoseconds?"
Add a zero behind that and you'd be getting closer. The latency when one CCX tries to access the 3D cache on the other CCX is at least 550 ns(ish). The CCX with the cache has much faster access, but that is because it's directly connected and doesn't need to go through the I/OD.
#75
AnotherReader
lexluthermiester wrote: "Add a zero behind that and you'd be getting closer. The latency when one CCX tries to access the 3D cache on the other CCX is at least 550 ns(ish). The CCX with the cache has much faster access, but that is because it's directly connected and doesn't need to go through the I/OD."
If I recall correctly, inter-CCD latencies were corrected after the Zen 5 release to be in the same range as Zen 4. At the time of release, worst-case latency was just over 210 ns. Now, it should be about the same as Zen 4: 80 ns.
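To put the numbers being argued over into perspective, the sketch below converts them into core clock cycles at the 5.65 GHz boost clock quoted in the article. The latency values are the claims made in this thread, not measurements, and `ns_to_cycles` is a made-up helper.

```python
BOOST_CLOCK_MHZ = 5650   # 5.65 GHz boost clock from the article


def ns_to_cycles(latency_ns, clock_mhz=BOOST_CLOCK_MHZ):
    """Whole core clock cycles that elapse during `latency_ns` nanoseconds."""
    # cycles = ns * (cycles per second) / 1e9 = ns * MHz / 1000
    return latency_ns * clock_mhz // 1000


# Latency figures as claimed in the comments (not measured here).
claims = {"20 ns": 20, "80 ns": 80, "210 ns": 210, "550 ns": 550}
for label, ns in claims.items():
    print(f"{label:>7}: about {ns_to_cycles(ns)} cycles")
```

Even the 80 ns figure works out to several hundred core cycles, which is why a cross-CCD cache access is costly compared with a hit in the local L3.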
