
AMD Ryzen 9 9950X3D Carries 3D V-Cache on a Single CCD, 5.6 GHz Clock Speed, and 170 Watt TDP

Joined
Nov 26, 2021
Messages
1,707 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Add a zero behind that and you'd be getting closer. The latency difference between one CCX trying to access the 3D cache on the other CCX is at least 550ns(ish). The CCX with the cache has much faster access but that is because it's directly connected and doesn't need to access through the I/OD.
If I recall correctly, inter CCD latencies were corrected after Zen 5 release to be in the same range as Zen 4. At the time of release, worst case latency was just over 210 ns. Now, it should be about the same as Zen 4: 80 ns.

[Attached image: 1735361737302.png]
 
Joined
Aug 25, 2015
Messages
26 (0.01/day)
Fun fact. Just spent nearly a year on 7950X3D, now I'm on 9800X3D. Can't help it, but for general work (Office 365, Teams, Skype, Jira in Chrome, Adobe Creative Cloud PS, Ai, Audition, After Effects, Media Encoder, Figma) and gaming (Drova, Space Marine II and Cyberbug 2077) I have a better experience on the lower-core-count 9800X3D compared to the higher-core-count previous-gen SKU.
 
Joined
Jan 14, 2019
Messages
12,658 (5.82/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
Fun fact. Just spent nearly a year on 7950X3D, now I'm on 9800X3D. Can't help it, but for general work (Office 365, Teams, Skype, Jira in Chrome, Adobe Creative Cloud PS, Ai, Audition, After Effects, Media Encoder, Figma) and gaming (Drova, Space Marine II and Cyberbug 2077) I have a better experience on the lower-core-count 9800X3D compared to the higher-core-count previous-gen SKU.
That's because the 9800 has faster cores, which is exactly what you need for games and your type of work. Some work needs more cores, but apparently not the one you're doing, which is fine. :)
 
Joined
Sep 23, 2023
Messages
558 (1.21/day)
@AleksandarK


I probably won't upgrade to this one until after 1 or 2 years. If this was like a 30% performance jump (it's not, comparing 7800X3D to 9800X3D), then sure, but not $700 sure. :laugh:
That's what I always do: wait for prices to drop. I just got the 5950X, so I will probably get this 9950X/X3D in 3-4 years for $315.

It will still be a kick-ass CPU a few years later. I believe in jumping AT LEAST 2 gens to get the best bang for the buck.
 
Joined
Apr 24, 2020
Messages
2,738 (1.60/day)
If I recall correctly, inter CCD latencies were corrected after Zen 5 release to be in the same range as Zen 4. At the time of release, worst case latency was just over 210 ns. Now, it should be about the same as Zen 4: 80 ns.

View attachment 377444

I believed you, but it took me a while to verify it with a link.


I'm finalizing my 5-year-upgrade build, and am planning to go to Micro Center for the Ryzen 9 9900X build soon. So getting 100% proof of this latency issue being fixed was a big priority for my 9900X vs 9800X3D decision.

80 ns is within the realm of P-core to P-core latency on the Intel Core Ultra 7 265K. I do think the Core Ultra 7 is underrated, but I'm too much of an AVX-512 fanboy, so Zen 5 wins me over.

Chips-and-cheese core-to-core latency graphs of Arrow Lake: https://chipsandcheese.com/p/examining-intels-arrow-lake-at-the

[Attached image: 1735369628854.jpeg]
 
Joined
Jun 2, 2017
Messages
9,411 (3.40/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
Joined
Jul 5, 2013
Messages
28,343 (6.76/day)
Joined
Apr 24, 2020
Messages
2,738 (1.60/day)
Those are core to core latency numbers on that graph. There are processing latency penalties for core to inter-ccx-cache transfers and requests.

You missed the update. It's under 80 ns now, as widely reported by many reliable tech discussion sites.

I linked TechPowerUp earlier, but here are Chips and Cheese's tests as well:

[Attached image: 1735370050584.jpeg]



You were correct at launch. The issue is that AMD released new microcode recently that fixed the 200-to-400 nanosecond latencies and pushed it all the way down to 80 or less.
 
Joined
Jul 5, 2013
Messages
28,343 (6.76/day)
You missed the update. It's under 80 ns now, as widely reported by many reliable tech discussion sites.

I linked TechPowerUp earlier, but here are Chips and Cheese's tests as well:

View attachment 377454


You were correct at launch. The issue is that AMD released new microcode recently that fixed the 200-to-400 nanosecond latencies and pushed it all the way down to 80 or less.
Context is important. Again those are CORE-TO-CORE latencies for the non-X3D model 9900X. Core-to-Core and Core-to-interCCX cache is different with the X3D versions and comes with additional latency that can not be avoided. Now they may have improved it, I'll concede to that, but it is VERY unlikely they have cracked below 350ns no matter what refinements and optimizations they've made. The 3D cache has an additional die boundary to cross for any process, regardless of type. Now for the actual physical die the 3D cache is mounted on, the latency is as was stated above, 210ns-ish, but for the non-attached CCX, there is additional latency involved depending on the transaction request.

Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.
 
Joined
Jun 29, 2018
Messages
546 (0.23/day)
Context is important. Again those are CORE-TO-CORE latencies for the non-X3D model 9900X. Core-to-Core and Core-to-interCCX cache is different with the X3D versions and comes with additional latency that can not be avoided. Now they may have improved it, I'll concede to that, but it is VERY unlikely they have cracked below 350ns no matter what refinements and optimizations they've made. The 3D cache has an additional die boundary to cross for any process, regardless of type. Now for the actual physical die the 3D cache is mounted on, the latency is as was stated above, 210ns-ish, but for the non-attached CCX, there is additional latency involved depending on the transaction request.

Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.
The additional L3 latency of X3D for 7950X3D vs 7950X is 1.61 ns as tested by Chips and Cheese which for the capacity increase is a terrific achievement. For Zen 5 it is most likely improved further:

It's not like AMD hasn't thought about doing X3D on more than one chiplet, in fact they are selling EPYCs like 9684X with 12 CCDs and 1152MB L3.

Edit: Zen 5 X3D latency penalty is about the same:
 
Joined
Dec 23, 2021
Messages
24 (0.02/day)
System Name SunMaster special
Processor 5950x
Motherboard Gigabyte X570 Aorus Master
Cooling Arctic Cooling Liquid Freezer 420 AIO
Memory 4x16GB@3800
Video Card(s) Nvidia 970
Storage WD Black SN850 512GB
Display(s) 2x Philips BDM3270
Case Fractal Meshify 2 XL
Power Supply EVGA Supernova GA 850
Keyboard Corsair K95
Software Windows 11
It's not relevant to consumers, it's relevant to enthusiasts. I assume you are not the latter. If you are not interested in the topic, you can kindly avoid butting in. Thank you :)
I think you should contact Intel and, judging by your enthusiast-grade expertise, tell them to only manufacture little cores from now on.
 
Joined
Mar 13, 2021
Messages
480 (0.35/day)
Processor AMD 7600x
Motherboard Asrock x670e Steel Legend
Cooling Silver Arrow Extreme IBe Rev B with 2x 120 Gentle Typhoons
Memory 4x16Gb Patriot Viper Non RGB @ 6000 30-36-36-36-40
Video Card(s) XFX 6950XT MERC 319
Storage 2x Crucial P5 Plus 1Tb NVME
Display(s) 3x Dell Ultrasharp U2414h
Case Coolermaster Stacker 832
Power Supply Thermaltake Toughpower PF3 850 watt
Mouse Logitech G502 (OG)
Keyboard Logitech G512
Not if games were optimized for dual CCD operation. I suspect that will never happen even though AMD became a software company.
The problem with most, if not all, games is that there is a "master" thread that basically everything interacts with. No matter what the game engine is doing, it will always interact with this master thread on a regular basis to keep everything timed correctly. This is half the problem with scaling games out to utilise more cores effectively: no matter what, you are still dependent on the main thread to tie everything back together.

Databases etc. can have completely separate threads that do not interact with the master except at the point of creation and completion, so they miss all the "regular" penalties of inter-CCD communication.
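
A rough way to picture the two patterns described above is the minimal Python sketch below. The names `game_style` and `database_style` are made up for illustration, and this is not how any particular engine or database is actually written. Python's GIL also means it only shows the synchronization structure, not real multi-core performance.

```python
import threading


def game_style(num_workers: int, frames: int) -> int:
    """Workers rendezvous with a master thread once per frame.

    Returns how many times the master had to synchronize with the workers.
    """
    # One extra party for the master thread itself.
    barrier = threading.Barrier(num_workers + 1)

    def worker() -> None:
        for _ in range(frames):
            # Per-frame work (physics, AI, audio, ...) would go here.
            barrier.wait()  # hand results back to the master every frame

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    syncs = 0
    for _ in range(frames):
        barrier.wait()  # master waits for every worker, every frame
        syncs += 1

    for t in threads:
        t.join()
    return syncs


def database_style(num_workers: int, items_per_worker: int) -> list[int]:
    """Workers run independently and only meet the master at start and join."""
    results = [0] * num_workers

    def worker(index: int) -> None:
        # Each worker only touches its own slot; no per-step synchronization.
        results[index] = sum(range(items_per_worker))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the only point where the master waits on the workers
    return results


if __name__ == "__main__":
    print("game-style master syncs:", game_style(num_workers=4, frames=10))
    print("database-style results:", database_style(num_workers=3, items_per_worker=5))
```

In the first function the number of cross-thread rendezvous grows with the frame count, which is where a cross-CCD hop would be paid again and again. In the second, it stays fixed no matter how much work each thread does.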
 
Joined
Mar 18, 2023
Messages
937 (1.44/day)
System Name Never trust a socket with less than 2000 pins
Read the article, the CPU boosts to top clocks on both CCDs this time around. X3D no longer limits frequency or heat.

We only know that the max turbo frequency is the same for both CCDs.

That doesn't necessarily say that they spend the same time at that speed under all conditions.

Anyway, I want one.
 
Joined
Jun 2, 2017
Messages
9,411 (3.40/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
Context is important. Again those are CORE-TO-CORE latencies for the non-X3D model 9900X. Core-to-Core and Core-to-interCCX cache is different with the X3D versions and comes with additional latency that can not be avoided. Now they may have improved it, I'll concede to that, but it is VERY unlikely they have cracked below 350ns no matter what refinements and optimizations they've made. The 3D cache has an additional die boundary to cross for any process, regardless of type. Now for the actual physical die the 3D cache is mounted on, the latency is as was stated above, 210ns-ish, but for the non-attached CCX, there is additional latency involved depending on the transaction request.

Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.
Show me the numbers to support your argument.

The problem with most, if not all, games is that there is a "master" thread that basically everything interacts with. No matter what the game engine is doing, it will always interact with this master thread on a regular basis to keep everything timed correctly. This is half the problem with scaling games out to utilise more cores effectively: no matter what, you are still dependent on the main thread to tie everything back together.

Databases etc. can have completely separate threads that do not interact with the master except at the point of creation and completion, so they miss all the "regular" penalties of inter-CCD communication.
See Cities: Skylines II and Space Marine 2.
 
Joined
Apr 24, 2020
Messages
2,738 (1.60/day)
Context is important. Again those are CORE-TO-CORE latencies for the non-X3D model 9900X. Core-to-Core and Core-to-interCCX cache is different with the X3D versions and comes with additional latency that can not be avoided. Now they may have improved it, I'll concede to that, but it is VERY unlikely they have cracked below 350ns no matter what refinements and optimizations they've made. The 3D cache has an additional die boundary to cross for any process, regardless of type. Now for the actual physical die the 3D cache is mounted on, the latency is as was stated above, 210ns-ish, but for the non-attached CCX, there is additional latency involved depending on the transaction request.

Does that make sense? This is why the 3D cache being mounted to only one CCX is bad for latency dependent tasks, like games for example. AMD needed to divide the 3D cache between both CCXs or just give both the same cache dies and interlink through the I/OD.

200 ns corresponds to 5 MHz, and latencies slower than that I can literally measure with my hobby-grade oscilloscope and Arduino-like AVR microcontrollers. Such latencies are possible on server-class systems where more chips and dies are in play, but I'd be surprised to see them on a simpler desktop.
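
For anyone who wants to check the 200 ns to 5 MHz conversion, here is a minimal Python sketch. The helper names are hypothetical, and the 5.6 GHz clock is simply the boost figure from the thread title, used as an assumption to express a latency in core cycles.

```python
def latency_to_frequency_mhz(latency_ns: float) -> float:
    """Frequency (in MHz) of a signal whose period equals the given latency."""
    if latency_ns <= 0:
        raise ValueError("latency must be positive")
    # 1 / (latency_ns * 1e-9 s) in Hz, divided by 1e6 to get MHz.
    return 1000.0 / latency_ns


def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Number of core clock cycles that elapse during the given latency."""
    # A clock of N GHz ticks N times per nanosecond.
    return latency_ns * clock_ghz


if __name__ == "__main__":
    # Latency figures quoted in this thread; 5.6 GHz is an assumed core clock.
    for ns in (80, 200, 550):
        mhz = latency_to_frequency_mhz(ns)
        cycles = latency_in_cycles(ns, clock_ghz=5.6)
        print(f"{ns} ns -> {mhz:.2f} MHz equivalent, ~{cycles:.0f} cycles at 5.6 GHz")
```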

Die-to-die latencies do exist of course but on a scale far smaller than you might imagine. 200ns+ is server-grade equipment latencies, not something I'd expect to see on a desktop system. And that's because server-grade systems have more RAM, more RAM Controllers, more dies and more caches that need to communicate. So everything slows down.

---------

Anyway, 200 ns latencies for an on-package SRAM make no sense. That's slower than DRAM (!!!!) such as DDR5. SRAM has always had much lower latencies than that, and I expect that the X3D caches are made of the faster SRAM and not the slower DRAM. (Also: logic companies like AMD/TSMC can make SRAM more easily than DRAM. DRAM is actually very difficult to make on these processes.)
 
Joined
Jul 13, 2016
Messages
3,365 (1.09/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage P5800X 1.6TB 4x 15.36TB Micron 9300 Pro 4x WD Black 8TB M.2
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) JDS Element IV, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse PMM P-305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
We only know that the max turbo frequency is the same for both CCDs.

That doesn't necessarily say that they spend the same time at that speed under all conditions.

Anyway, I want one.

Non-X3D and X3Ds now perform about the same thermally

View attachment 1735404025885.webp


Knowing that frequency and thermals are similar, the probability that boosting characteristics will be different enough to make a notable difference in games is near zero.
 
Joined
Jan 3, 2021
Messages
3,634 (2.50/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Add a zero behind that and you'd be getting closer. The latency difference between one CCX trying to access the 3D cache on the other CCX is at least 550ns(ish). The CCX with the cache has much faster access but that is because it's directly connected and doesn't need to access through the I/OD.
Which review arrived at these results?
 
Joined
Jul 5, 2013
Messages
28,343 (6.76/day)
The additional L3 latency of X3D for 7950X3D vs 7950X is 1.61 ns as tested by Chips and Cheese which for the capacity increase is a terrific achievement. For Zen 5 it is most likely improved further:

It's not like AMD hasn't thought about doing X3D on more than one chiplet, in fact they are selling EPYCs like 9684X with 12 CCDs and 1152MB L3.

Edit: Zen 5 X3D latency penalty is about the same:
Once again, context is important. That data is for the 9800X3D...

People, learn how to context.
 
Joined
Jun 29, 2018
Messages
546 (0.23/day)
Once again, context is important. That data is for the 9800X3D...

People, learn how to context.
Still no sources for your claims? I've provided professional measurements for two generations of X3D CPUs with both topologies and neither comes even close to what you suggested the latency impact is.
What is more, your 550 ns figure is over twice the time that one EPYC Turin core takes to communicate with a core in another socket. Please explain how you arrived at this figure, and what the "context" is here.

Edit: seems that the context here is trolling, but it's OK - I've refreshed my knowledge a bit by researching this.
For completeness here's the L3 latency plot for an EPYC Milan-X with 8 X3D CCDs where going to another X3D slice has a penalty but overall keeps below DRAM latency:
 
Joined
Nov 3, 2011
Messages
697 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H150i Elite LCD XT White
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K FreeSync/GSync DP, LG 32UL950 32in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
By that same token, even the single die CCD X3D parts are actually not great - outside of gaming and cache limited / memory bandwidth / thread limited scenarios the higher boosting higher TDP normal CCD CPUs win in average productivity.

Anyway, as for the dual X3D part.... Someone will always pay for it even if the benefit is only 1% (or even less) over the next competitive product in the lineup, witness the 14900KS...
9800X3D is faster than 9700X in Blender.
 
Joined
Mar 13, 2021
Messages
480 (0.35/day)
Processor AMD 7600x
Motherboard Asrock x670e Steel Legend
Cooling Silver Arrow Extreme IBe Rev B with 2x 120 Gentle Typhoons
Memory 4x16Gb Patriot Viper Non RGB @ 6000 30-36-36-36-40
Video Card(s) XFX 6950XT MERC 319
Storage 2x Crucial P5 Plus 1Tb NVME
Display(s) 3x Dell Ultrasharp U2414h
Case Coolermaster Stacker 832
Power Supply Thermaltake Toughpower PF3 850 watt
Mouse Logitech G502 (OG)
Keyboard Logitech G512
For the people who keep quoting latency graphs to argue there is no penalty, consider this: the tests you are quoting measure one core accessing one core on the second CCD. I have linked a test that goes into further detail, where they load the CCDs from a single thread up to fully loaded, measure the latency, and actually split threads across the two CCDs. With Zen 4 it is bad!!!


Zen 4 has a hardware limitation that means a dual-X3D setup would have been absolutely HORRENDOUS in performance: accessing the second CCD's cache would have been only as fast as accessing DRAM in certain worst-case scenarios, and it can very easily see 2-3 times the latency penalty, rising to nearly 10 times in the worst case. I suspect Zen 3 / 5xxx series parts would have seen similar issues due to the design of the I/O die etc.
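
To show why a slow remote cache stops helping, here is a back-of-the-envelope average-latency model as a minimal Python sketch. The function name and every number in the example are illustrative assumptions, not measurements from the linked test.

```python
def average_access_latency_ns(
    local_hit_rate: float,
    remote_hit_rate: float,
    local_ns: float,
    remote_ns: float,
    dram_ns: float,
) -> float:
    """Weighted average latency over local L3 hits, remote L3 hits and DRAM."""
    miss_rate = 1.0 - local_hit_rate - remote_hit_rate
    if min(local_hit_rate, remote_hit_rate, miss_rate) < 0.0:
        raise ValueError("hit rates must be non-negative and sum to at most 1")
    return (
        local_hit_rate * local_ns
        + remote_hit_rate * remote_ns
        + miss_rate * dram_ns
    )


if __name__ == "__main__":
    # Illustrative numbers only: 10 ns local L3 and 80 ns DRAM.
    # The remote CCD's cache is compared when fast (20 ns) and when DRAM-like (80 ns).
    for remote_ns in (20.0, 80.0):
        avg = average_access_latency_ns(0.5, 0.25, 10.0, remote_ns, 80.0)
        print(f"remote cache at {remote_ns:.0f} ns -> average {avg:.1f} ns")
```

When the remote cache costs as much as DRAM, the remote-hit term is indistinguishable from a miss, so the extra cache on the second CCD buys nothing for threads running on the first one.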

Zen 5 has seemingly fixed this issue, as well as having higher clock speeds due to the relocated X3D cache. I wonder if AMD is holding back dual-X3D parts in case Intel pulls something out of the bag, à la Nvidia's original Ti/Super variants of a few years ago? I mean, the single-CCD parts are completely handing Intel the L in gaming by quite a margin currently.

Also, are they trying to prevent confusion? Dual-X3D parts would segment the market even further, as you would then have 3 different SKUs for each core count, with desktop parts probably pushing up towards the $/£1k mark again for the top-end non-HEDT part. How much would it cut into their lower-end HEDT/workstation sales?
 
Joined
Nov 3, 2011
Messages
697 (0.15/day)
Location
Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H150i Elite LCD XT White
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K FreeSync/GSync DP, LG 32UL950 32in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
SMT isn't magic. SMT works by splitting resources between the two threads.

Primarily, the caches. As we all know, games love cache so I'm not surprised that splitting the cache has a detrimental effect.

But even if a thread isn't loaded, the register files, reorder buffers, decoders and branch predictors remain shared. So SMT will have slightly worse single threaded performance.

SMT is ideal when a 5%ish drop in single threaded is an acceptable tradeoff for +40% multi threaded performance. Games do not work like this.

Intel has changed their design to P-Cores which specialize in singlethread, and E-cores which specialize in multi thread. But this seems like a poor strategy to me for other reasons....



I expect that video is bad for x3d.

The name of the game is fitting in the cache. Video games have lots of stuff that is larger than 32MB but less than 96MB, and the CPU automatically discovers the hot data to share.

Video is not like that. You watch (or encode) one frame and then move into the next one. Nothing will fit in cache. Or at least, nothing extra really fits in the 33rd MB that's worthwhile.

Video and 3D modeling (Blender) usually prefer more cores... While dealing with so much data that the caches are blown over and useless.
Zen 5's 8 decoders are bottlenecked by the Zen 4-era I/O die.

For Blender, the 9800X3D beats the 9700X, and nearly rivals the 16-core 5950X and the 12-core 7900. The 9800X3D's SMT is strong relative to Zen 3's SMT.
 