AMD Ryzen 9 9950X3D Carries 3D V-Cache on a Single CCD, 5.6 GHz Clock Speed, and 170 Watt TDP

AnotherReader · 2024-12-28T04:55:17+0000

lexluthermiester said:
Add a zero behind that and you'd be getting closer. The latency difference between one CCX trying to access the 3D cache on the other CCX is at least 550ns(ish). The CCX with the cache has much faster access but that is because it's directly connected and doesn't need to access through the I/OD.

If I recall correctly, inter CCD latencies were corrected after Zen 5 release to be in the same range as Zen 4. At the time of release, worst case latency was just over 210 ns. Now, it should be about the same as Zen 4: 80 ns.

uplink777 · 2024-12-28T05:51:46+0000

Fun fact: I just spent nearly a year on a 7950X3D, and now I'm on a 9800X3D. Can't help it, but for general work (Office 365, Teams, Skype, JIRA in Chrome, Adobe Creative Cloud PS, Ai, Audition, After Effects, Media Encoder, Figma) and gaming (Drova, Space Marine II and Cyberbug 2077) I have a better experience on the lower-core-count 9800X3D than on the higher-core-count previous-gen SKU.

Buddha666 · 2024-12-28T06:19:58+0000

AusWolf said:
I'm not saying that these CPUs aren't great, just that they're kind of pointless. Gamers have the 9800X3D. Professionals have the 9950X. Who is the 9950X3D made for exactly? Professionals who also need the last drop of FPS while they're gaming? C'mon...

Exactly, I'm one of them.

AusWolf · 2024-12-28T06:36:02+0000

uplink777 said:
Fun fact: I just spent nearly a year on a 7950X3D, and now I'm on a 9800X3D. Can't help it, but for general work (Office 365, Teams, Skype, JIRA in Chrome, Adobe Creative Cloud PS, Ai, Audition, After Effects, Media Encoder, Figma) and gaming (Drova, Space Marine II and Cyberbug 2077) I have a better experience on the lower-core-count 9800X3D than on the higher-core-count previous-gen SKU.

That's because the 9800 has faster cores, which is exactly what you need for games and your type of work. Some work needs more cores, but apparently not the one you're doing, which is fine.

inquisitor1 · 2024-12-28T07:00:16+0000

Cheeseball said:
@AleksandarK

I probably won't upgrade to this one until after 1 or 2 years. If this was like a 30% performance jump (it's not, comparing 7800X3D to 9800X3D), then sure, but not $700 sure.

That's what I always do: wait for prices to drop. I just got the 5950X, so I will probably get this 9950X/9950X3D in 3-4 years for $315.

It will still be a kick-ass CPU a few years later. I believe in jumping AT LEAST 2 gens to get the best bang for the buck.

dragontamer5788 · 2024-12-28T07:00:22+0000

AnotherReader said:
If I recall correctly, inter CCD latencies were corrected after Zen 5 release to be in the same range as Zen 4. At the time of release, worst case latency was just over 210 ns. Now, it should be about the same as Zen 4: 80 ns.

View attachment 377444

I believed you, but it took me a while to verify it with a link.

AMD AGESA 1.2.0.2 Update Fixes Ryzen 9000 Series Inter-Core Latency Issues

According to new latest testing, the latest AGESA (AMD Generic Encapsulated Software Architecture) update, version 1.2.0.2, promises a significant boost in performance for AMD Ryzen 9000 "Zen 5" processors. This update is targeting one of the most crucial aspects of multi-core processing...

www.techpowerup.com

I'm finalizing my 5-year-upgrade build, and am planning to go to Microcenter for the Ryzen 9 9900x build soon. So getting 100% proof of this latency issue being fixed was a big priority for my 9900x vs 9800x3d decision.

80ns is within the realm of P-core to P-core latency on the Intel Core Ultra 7 265K. I do think the Core Ultra 7 is underrated, but I'm too much of an AVX512 fanboy, so Zen 5 wins me over.

Chips-and-cheese core-to-core latency graphs of Arrow Lake: https://chipsandcheese.com/p/examining-intels-arrow-lake-at-the

kapone32 · 2024-12-28T07:12:14+0000

AnotherReader said:
If I recall correctly, inter CCD latencies were corrected after Zen 5 release to be in the same range as Zen 4. At the time of release, worst case latency was just over 210 ns. Now, it should be about the same as Zen 4: 80 ns.

View attachment 377444

Even if it was 200 nanoseconds, how much would that be in real time?

lexluthermiester · 2024-12-28T07:12:46+0000

AnotherReader said:
If I recall correctly, inter CCD latencies were corrected after Zen 5 release to be in the same range as Zen 4. At the time of release, worst case latency was just over 210 ns. Now, it should be about the same as Zen 4: 80 ns.

View attachment 377444

Those are core to core latency numbers on that graph. There are processing latency penalties for core to inter-ccx-cache transfers and requests.

dragontamer5788 · 2024-12-28T07:14:51+0000

lexluthermiester said:
Those are core to core latency numbers on that graph. There are processing latency penalties for core to inter-ccx-cache transfers and requests.

You missed the update. It's under 80ns now, as widely reported by many reliable tech discussion sites.

I linked TechPowerup earlier, but here's Chips and Cheese's tests as well:

You were correct at launch. The issue is that AMD released new microcode recently that fixed the 200-to-400 nanosecond latencies and pushed it all the way down to 80 or less.

lexluthermiester · 2024-12-28T07:55:19+0000

dragontamer5788 said:
You missed the update. It's under 80ns now, as widely reported by many reliable tech discussion sites.

I linked TechPowerup earlier, but here's Chips and Cheese's tests as well:

View attachment 377454

You were correct at launch. The issue is that AMD released new microcode recently that fixed the 200-to-400 nanosecond latencies and pushed it all the way down to 80 or less.

Context is important. Again those are CORE-TO-CORE latencies for the non-X3D model 9900X. Core-to-Core and Core-to-interCCX cache is different with the X3D versions and comes with additional latency that can not be avoided. Now they may have improved it, I'll concede to that, but it is VERY unlikely they have cracked below 350ns no matter what refinements and optimizations they've made. The 3D cache has an additional die boundary to cross for any process, regardless of type. Now for the actual physical die the 3D cache is mounted on, the latency is as was stated above, 210ns-ish, but for the non-attached CCX, there is additional latency involved depending on the transaction request.

Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.

ncrs · 2024-12-28T09:30:45+0000

lexluthermiester said:
Context is important. Again those are CORE-TO-CORE latencies for the non-X3D model 9900X. Core-to-Core and Core-to-interCCX cache is different with the X3D versions and comes with additional latency that can not be avoided. Now they may have improved it, I'll concede to that, but it is VERY unlikely they have cracked below 350ns no matter what refinements and optimizations they've made. The 3D cache has an additional die boundary to cross for any process, regardless of type. Now for the actual physical die the 3D cache is mounted on, the latency is as was stated above, 210ns-ish, but for the non-attached CCX, there is additional latency involved depending on the transaction request.

Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.

The additional L3 latency of X3D for 7950X3D vs 7950X is 1.61 ns as tested by Chips and Cheese which for the capacity increase is a terrific achievement. ~~For Zen 5 it is most likely improved further:~~

It's not like AMD hasn't thought about doing X3D on more than one chiplet, in fact they are selling EPYCs like 9684X with 12 CCDs and 1152MB L3.

Edit: Zen 5 X3D latency penalty is about the same:

SunMaster · 2024-12-28T10:23:25+0000

evernessince said:
It's not relevant to consumers, it's relevant to enthusiasts. I assume you are not the latter. If you are not interested in the topic, you can kindly avoid butting in. Thank you

I think you should contact Intel and, judging by your enthusiast-grade expertise, tell them to only manufacture little cores from now on.

Panther_Seraphin · 2024-12-28T15:21:21+0000

A Computer Guy said:
Not if games were optimized for dual CCD operation. I suspect that will never happen even though AMD became a software company.

The problem with most if not all games is that there is a "master" thread that basically everything interacts with, so no matter what the game engine is doing, it will always interact with this master thread on a regular basis to keep everything timed correctly. This is half the problem with scaling games out to utilise more cores effectively: no matter what, you are still dependent on the main thread to tie everything back together.

Databases etc. can have completely separate threads that do not interact with the master except at the point of creation and completion, so they miss all the "regular" penalties of inter-CCD communication.
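
To put a rough number on the "master thread" argument: the sketch below uses Amdahl's law, which is my own choice of illustration and not something measured in this thread. The serial fractions are hypothetical stand-ins for the share of frame time stuck on the main thread, and the model ignores inter-CCD latency entirely, so real dual-CCD scaling would likely be somewhat worse than this.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work
    must run on a single (main) thread and the rest scales perfectly."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Hypothetical serial fractions: a game with a heavy main thread
    # versus a database-like workload with almost no shared coordination.
    for label, serial in (("game-like", 0.5), ("database-like", 0.02)):
        for cores in (8, 16):
            speedup = amdahl_speedup(serial, cores)
            print(f"{label:14s} {cores:2d} cores: {speedup:5.2f}x")
```

With half the work serialised, going from 8 to 16 cores barely moves the result, which is the scaling wall described above; the database-like case keeps gaining from the extra cores.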

unwind-protect · 2024-12-28T15:27:10+0000

evernessince said:
Read the article, the CPU boosts to top clocks on both CCDs this time around. X3D no longer limits frequency or heat.

We only know that the max turbo frequency is the same for both CCDs.

That doesn't necessarily say that they spend the same time at that speed under all conditions.

Anyway, I want one.

kapone32 · 2024-12-28T15:29:25+0000

lexluthermiester said:
Context is important. Again those are CORE-TO-CORE latencies for the non-X3D model 9900X. Core-to-Core and Core-to-interCCX cache is different with the X3D versions and comes with additional latency that can not be avoided. Now they may have improved it, I'll concede to that, but it is VERY unlikely they have cracked below 350ns no matter what refinements and optimizations they've made. The 3D cache has an additional die boundary to cross for any process, regardless of type. Now for the actual physical die the 3D cache is mounted on, the latency is as was stated above, 210ns-ish, but for the non-attached CCX, there is additional latency involved depending on the transaction request.

Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.

Show me the numbers to support your argument.

Panther_Seraphin said:
The problem with most if not all games is that there is a "master" thread that basically everything interacts with, so no matter what the game engine is doing, it will always interact with this master thread on a regular basis to keep everything timed correctly. This is half the problem with scaling games out to utilise more cores effectively: no matter what, you are still dependent on the main thread to tie everything back together.

Databases etc. can have completely separate threads that do not interact with the master except at the point of creation and completion, so they miss all the "regular" penalties of inter-CCD communication.

See Cities: Skylines II and Space Marine 2.

dragontamer5788 · 2024-12-28T15:34:50+0000

lexluthermiester said:
Context is important. Again those are CORE-TO-CORE latencies for the non-X3D model 9900X. Core-to-Core and Core-to-interCCX cache is different with the X3D versions and comes with additional latency that can not be avoided. Now they may have improved it, I'll concede to that, but it is VERY unlikely they have cracked below 350ns no matter what refinements and optimizations they've made. The 3D cache has an additional die boundary to cross for any process, regardless of type. Now for the actual physical die the 3D cache is mounted on, the latency is as was stated above, 210ns-ish, but for the non-attached CCX, there is additional latency involved depending on the transaction request.

Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.

200ns is 5MHz and slower latencies than that I can literally measure with my hobby-grade oscilloscope and Arduino-like AVR microcontrollers. Such latencies are possible on server-class systems where more chips and dies are in play but I'd be surprised to see it on a simpler desktop.

Die-to-die latencies do exist of course but on a scale far smaller than you might imagine. 200ns+ is server-grade equipment latencies, not something I'd expect to see on a desktop system. And that's because server-grade systems have more RAM, more RAM Controllers, more dies and more caches that need to communicate. So everything slows down.

---------

Anyway, 200ns latencies for an on-package SRAM makes no sense. That's slower than DRAM (!!!!) like DDR5 technologies. SRAM always had much smaller latencies than that, and I expect that the x3d caches are made out of the faster SRAM and not the slower DRAM. (also: logic companies like AMD/TSMC can make SRAM more easily than DRAM. DRAM is actually very difficult to make on these processes)
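
For anyone who wants to check the arithmetic behind "200ns is 5MHz", here is a minimal sketch that converts the latency figures quoted in this thread into an equivalent frequency and into CPU clock cycles. The 5.6 GHz clock is an assumption taken from the thread title, and the helper names are made up for illustration.

```python
def latency_to_frequency_mhz(latency_ns: float) -> float:
    """Frequency (MHz) of an event that completes once every `latency_ns`."""
    # 1 / (1 ns) = 1 GHz = 1000 MHz, so the conversion is 1000 / ns.
    return 1_000.0 / latency_ns


def latency_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Number of core clock cycles that elapse during `latency_ns`."""
    # A clock of N GHz ticks N times per nanosecond.
    return latency_ns * clock_ghz


if __name__ == "__main__":
    CLOCK_GHZ = 5.6  # assumption: boost clock from the thread title
    # Figures quoted in the thread: X3D L3 penalty, post-fix inter-CCD,
    # the round 200 ns number, launch worst case, and the disputed claim.
    for ns in (1.61, 80.0, 200.0, 210.0, 550.0):
        mhz = latency_to_frequency_mhz(ns)
        cycles = latency_to_cycles(ns, CLOCK_GHZ)
        print(f"{ns:7.2f} ns = {mhz:8.2f} MHz equivalent, ~{cycles:7.1f} cycles")
```

At the assumed clock, 200 ns works out to roughly 1,120 cycles of a core waiting, which is why it would be so surprising for an on-package SRAM access.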

evernessince · 2024-12-28T16:40:19+0000

unwind-protect said:
We only know that the max turbo frequency is the same for both CCDs.

That doesn't necessarily say that they spend the same time at that speed under all conditions.

Anyway, I want one.

Non-X3D and X3Ds now perform about the same thermally

View attachment 1735404025885.webp

Knowing that frequency and thermals are similar, the probability that boosting characteristics will be different enough to make a notable difference in games is near zero.

Wirko · 2024-12-28T17:21:17+0000

lexluthermiester said:
Add a zero behind that and you'd be getting closer. The latency difference between one CCX trying to access the 3D cache on the other CCX is at least 550ns(ish). The CCX with the cache has much faster access but that is because it's directly connected and doesn't need to access through the I/OD.

Which review arrived at these results?

lexluthermiester · 2024-12-28T17:49:53+0000

ncrs said:
The additional L3 latency of X3D for 7950X3D vs 7950X is 1.61 ns as tested by Chips and Cheese which for the capacity increase is a terrific achievement. ~~For Zen 5 it is most likely improved further:~~

It's not like AMD hasn't thought about doing X3D on more than one chiplet, in fact they are selling EPYCs like 9684X with 12 CCDs and 1152MB L3.

Edit: Zen 5 X3D latency penalty is about the same:

Once again, context is important. That data is for the 9800X3D...

People, learn how to context.

ncrs · 2024-12-28T18:02:17+0000

lexluthermiester said:
Once again, context is important. That data is for the 9800X3D...

People, learn how to context.

Still no sources for your claims? I've provided professional measurements for two generations of X3D CPUs with both topologies and neither comes even close to what you suggested the latency impact is.
What is more your 550ns figure is over twice the time that one EPYC Turin core takes to communicate with a core in another socket. Please explain how you arrived at this figure, and what is the "context" here.

Edit: seems that the context here is trolling, but it's OK - I've refreshed my knowledge a bit by researching this.
For completeness here's the L3 latency plot for an EPYC Milan-X with 8 X3D CCDs where going to another X3D slice has a penalty but overall keeps below DRAM latency:

ValenOne · 2024-12-28T22:07:52+0000

Vincero said:
By that same token, even the single die CCD X3D parts are actually not great - outside of gaming and cache limited / memory bandwidth / thread limited scenarios the higher boosting higher TDP normal CCD CPUs win in average productivity.

Anyway, as for the dual X3D part.... Someone will always pay for it even if the benefit is only 1% (or even less) over the next competitive product in the lineup, witness the 14900KS...

9800X3D is faster than 9700X in Blender.

Panther_Seraphin · 2024-12-28T22:20:02+0000

For the people who keep quoting latency graphs to argue there is no penalty, consider this: the tests you are quoting are one core accessing one core on the second CCD. I have linked a test that goes into further detail, where they load the CCDs from a single thread up to fully loaded and measure latency when threads are split across the two CCDs. With Zen 4 it is bad!

Pushing AMD’s Infinity Fabric to its Limits

I recently wrote code to test memory latency under load, seeking to reproduce data in various presentations with bandwidth on the X axis and latency on the Y axis.

chipsandcheese.com

Zen 4 has a hardware limitation that would have made a dual X3D setup absolutely HORRENDOUS in performance: accessing the second CCD's cache would have been only as fast as accessing DRAM in certain worst-case scenarios, and it can very easily see 2-3 times the latency penalty, rising to nearly 10 times in the worst case. I suspect Zen 3/5xxx series parts would have seen similar issues due to the design of the I/O die, etc.

Zen 5 has seemingly fixed this issue, as well as having high clock speeds thanks to the relocated X3D cache. I wonder if AMD is holding back dual X3D parts in case Intel pulls something out of the bag, à la Nvidia's original Ti/Super variants of a few years ago? I mean, the single-CCD parts are currently handing Intel the L in gaming by quite a margin.

Also, are they trying to prevent confusion? Dual X3D parts would segment the market even further, as you would then have 3 different SKUs for each core count, with desktop parts probably pushing up towards the $/£1k mark again for the top-end non-HEDT part. How much would it cut into their lower-end HEDT/workstation sales?

ValenOne · 2024-12-28T22:20:26+0000

dragontamer5788 said:
SMT isn't magic. SMT works by splitting resources between the two threads.

Primarily, the caches. As we all know, games love cache so I'm not surprised that splitting the cache has a detrimental effect.

But even if a thread isn't loaded, the register files, reorder buffers, decoders and branch predictors remain shared. So SMT will have slightly worse single threaded performance.

SMT is ideal when a 5%ish drop in single threaded is an acceptable tradeoff for +40% multi threaded performance. Games do not work like this.

Intel has changed their design to P-Cores which specialize in singlethread, and E-cores which specialize in multi thread. But this seems like a poor strategy to me for other reasons....

I expect that video is bad for x3d.

The name of the game is fitting in the cache. Video games have lots of stuff that is larger than 32MB but less than 96MB, and the CPU automatically discovers the hot data to share.

Video is not like that. You watch (or encode) one frame and then move into the next one. Nothing will fit in cache. Or at least, nothing extra really fits in the 33rd MB that's worthwhile.

Video and 3D modeling (Blender) usually prefer more cores... While dealing with so much data that the caches are blown over and useless.

Zen 5's 8 decoders are bottlenecked by Zen 4 era I/O die.

For Blender, the 9800X3D beats the 9700X and nearly rivals the 16-core 5950X and the 12-core 7900. The 9800X3D's SMT is strong relative to Zen 3's SMT.

Random_User · 2024-12-29T06:20:18+0000

sephiroth117 said:
It's not as simple, there are limitations still on 3D CCD, even if the 2nd gen X3D are much better, heat less etc, they still want one CCD for fast Ghz and the other for gaming/3D applications

There's a reason, also with dual CCD, would having 32+32MB be as good as one CCD with 64MB 3D extra L3 cache ? if no, wouldn't 64+64 be too expensive ?

I think there are genuine cost and technological obstacles for dual CCD, it's not just them wanting to add a software director and more complexity, maybe further down the line

It is that simple. These are binned and fused EPYC dies anyway. The waste.
Also, this isn't second-gen X3D, but third. And AMD had a fully production-ready sample of a dual 3D-CCD 5950X3D back in the day, when their 3D V-Cache first emerged. There were other reasons.

Zen 4 was perfectly scalable at any wattage/power/thermal envelope. Zen 5 X3D seems to be as good. There are no frequency limits for it, and it works as fast as the non-X3D parts.
At this point, non-3D parts have become the "dietetic", budget-oriented/cut-down version. And AMD themselves have created this image.
And there's absolutely no excuse for the 9950X3D not to be dual 3D-CCD. The technology allows this, the cost is already high, and the 3D dies are no longer limited by either frequency or power.

Just my thoughts!

