
AMD Ryzen 7 9800X3D Has the CCD on Top of the 3D V-cache Die, Not Under it

Joined
Jul 13, 2016
Messages
3,296 (1.08/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Most of the people buying high core count chips aren't doing it for gaming

We know this isn't true given Intel has sold high core count chips for 2 generations advertised towards gamers.

You are vastly under-estimating the number of people wanting a chip that can do both gaming and core heavy tasks. I for one would have purchased a 7950X3D if it had matched a 7800X in gaming and I might purchase a 9950X3D if it matches the 9800X3D in gaming performance. For people buying in this price bracket it's a no-brainer to spend a little bit more to get a system that can do it all.

the X3D chips perform worse in most productivity and creative tasks where high core count matters.

You are conflating things. X3D chips perform worse in certain frequency-sensitive applications that don't benefit from cache. In core-heavy workloads they are 100% equal to their non-X3D counterparts.

Mind you, if AMD stacks the CCD above the cache as the article implies they may do, that negative disappears.

X3D makes much more sense for six and eight core chips than 16 core chips.

We know this is false because AMD itself has stated X3D was designed for servers. That it came to consumer products is due to a side experiment by an AMD employee who wanted to see if there was a benefit in everyday workloads.

Yes because of the added cache. The added cache produces gains in some areas but the limits it imposes causes losses in other areas. There's nothing wrong with that. It's great tech it's got a very specific focus and trade offs such as this have always existed. It's a lateral move to focus on a specific area.

Read the article, it specifically states that AMD may be getting rid of these limitations.
 
Joined
Sep 1, 2020
Messages
2,361 (1.52/day)
Location
Bulgaria
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?
 

SL2

Joined
Jan 27, 2006
Messages
2,454 (0.36/day)
You are vastly under-estimating the number of people wanting a chip that can do both gaming and core heavy tasks.
I think we all forget/ignore people telling us that a product is just for one thing from time to time.

As if you're expected to have one 7800X3D for games, and a 7950X for work. Strictly thinking inside the box and turn it into law lol.

Or, people who won't stop bitching about why gaming laptops won't/shouldn't have cameras.. yeah you're supposed to buy another laptop for that, or a separate camera..

/end of rant
 
Joined
Aug 12, 2010
Messages
133 (0.03/day)
Location
Brazil
Processor Ryzen 7 7800X3D
Motherboard ASRock B650M PG Riptide
Cooling Wraith Max + 2x Noctua Redux NF-P12
Memory 2x16GB ADATA XPG Lancer Blade DDR5-6000 CL30
Video Card(s) Powercolor RX 7800 XT Fighter OC
Storage ADATA Legend 970 2TB PCIe 5.0
Display(s) Dell 32" S3222DGM - 1440P 165Hz + P2422H
Case HYTE Y40
Audio Device(s) Microsoft Xbox TLL-00008
Power Supply Cooler Master MWE 750 V2
Mouse Alienware AW320M
Keyboard Alienware AW510K
Software Windows 11 Pro
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
 
Joined
Jun 18, 2021
Messages
2,551 (2.02/day)
Latency? Light travels .3 meters in 1 ns. Latency isn't an issue.

Light does but electricity doesn't. Since AMD hasn't moved to photonic computing your comment is not very relevant.

Though indeed there shouldn't be any difference, it's still all in the same package and whatnot
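For a rough sense of the distances being argued about, here is a minimal sketch of the arithmetic. The 0.5 velocity factor is a made-up placeholder for "electrical signals are slower than light in vacuum", not a measured figure for any real package or die interconnect.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, metres per second


def distance_m(time_s: float, velocity_factor: float = 1.0) -> float:
    """Distance covered in time_s by a signal moving at velocity_factor * c."""
    return C_M_PER_S * velocity_factor * time_s


one_ns = 1e-9
# Light in vacuum: roughly the 0.3 m per nanosecond quoted above.
print(f"light in vacuum, 1 ns: {distance_m(one_ns):.3f} m")
# Hypothetical signal at half the speed of light (assumed value).
print(f"assumed 0.5c signal, 1 ns: {distance_m(one_ns, 0.5):.3f} m")
```

Even with a pessimistic assumed factor, the distance covered in a nanosecond is far larger than the thickness of a stacked die, which is the point both posts end up agreeing on.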
 
Joined
Apr 30, 2020
Messages
992 (0.59/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 32Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
I highly doubt this is even possible, as the substrate/PCB has all the connections for the CPU on its layer. Also, the CPUs are flip chips and have been for a while.
 
Joined
May 10, 2023
Messages
279 (0.49/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?
No, not a thing.
 
Joined
Aug 12, 2022
Messages
248 (0.29/day)
Exactly; and it's going to launch with no competition.
What's funny about that is that since Meteor Lake Intel has put their cores on top of another die. Before Meteor Lake came out, there were rumors that it was going to have an L4 cache in the base tile. It seems like Arrow Lake is pretty close to having the same CPU-stacked-over-cache technology if Intel wanted it to.
Interesting idea, but I'm still cagey about it. If a lot of the improvements on Zen 5 X3D is just due to higher power targets it's losing what I like about the X3D chips in the first place, and that is amazing gaming performance at low-ish power. If it's about the same as the 7800X3D I'm just gonna get the 7800X3D, lest they whoopsie a new IOD on these with more gen 5 lanes and CKD support.
That's an interesting concern. X3D had to be lower power, but now it won't need to be. But the 9000 series is a little more efficient than the 7000 series, and in other chips more cache usually does translate to more power savings even at the same frequency.
Most of the people buying high core count chips aren't doing it for gaming and the X3D chips perform worse in most productivity and creative tasks where high core count matters. X3D makes much more sense for six and eight core chips than 16 core chips.
Theoretically, with the v-cache no longer sitting between the CPU and the cooler, the X3D chips will be the same speed or faster than the regular chips in every use case. And since many people want one CPU both for productivity and gaming, there will still be demand for the higher core count chips.
 
Joined
Jan 14, 2023
Messages
836 (1.21/day)
System Name Asus G16
Processor i9 13980HX
Motherboard Asus motherboard
Cooling 2 fans
Memory 32gb 4800mhz
Video Card(s) 4080 laptop
Storage 16tb, x2 8tb SSD
Display(s) QHD+ 16in 16:10 (2560x1600, WQXGA) 240hz
Power Supply 330w psu
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
Yes
 
Joined
Oct 30, 2020
Messages
253 (0.17/day)
Hard disagree, AMD has X3D cache in chips all the way down to the 5600X3D.

Having 2 cache chiplets on $700 - $750 parts is likewise absolutely possible.

Even if the uplift is a mere 3%, every little bit matters at the high end. Particularly when it could make the 9950X3D reach gaming parity with the 9800X3D, it would upsell a lot of people to the more expensive processor.

Thing is, you're sacrificing productivity by 3% as well since the other CCD won't clock as high. So the overall picture might be slightly different but I agree with the fact that matching 9800X3D while taking a 6% hit in productivity vs 9950X is better than being 3% slower while getting a 3% hit in productivity.
 
Joined
Mar 18, 2024
Messages
185 (0.71/day)
Location
Queensland, Australia
System Name Full Aorus PC that upgrades forever
Processor Ryzen 5 5600
Motherboard Gigabyte Aorus X370 Gaming 5
Cooling Cooler Master MasterLiquid ML240L V2
Memory 32 GB 3200mhz CL16 Silicon Power (2 x 16gb)
Video Card(s) Aorus 5700 XT
Storage 2x Samsung 970 evo plus 500gb (One is on an expansion card)
Display(s) XG2431 (Luv ya Viewsonic for this great monitor)
Case Cooler Master MB TG520
Audio Device(s) HyperX Cloud Alpha
Power Supply AP850GM (Aorus 850 Watt)
Mouse Razer Viper Ultimate
Keyboard Redragon K614
Software Windows 11
Benchmark Scores 4.7GHZ on the CPU at 1.3 Volts
People might be able to get the cpu running at the same speed as the 9700x, nice.
 
Joined
Apr 24, 2020
Messages
2,713 (1.61/day)
Can you imagine when the on-chip cache gets too much and the CPU crashes because it can't index it within the normal time of that function. Is this possible?

This is already becoming true for the "TLB", translation lookaside buffer.

A "Page" has been set at 4096 bytes since the 1980s (even ARM systems are paged at 4k). There's a 4096 entry TLB in Zen5, meaning there is 4096 (entries) x 4096 bytes (per entry with default pages) == 16MB of RAM indexed in the Virtual RAM page table before the CPU Core runs out of entries.

That's smaller than the Zen5 X3D L3 cache. In fact, this curious slowdown has been true for quite a few generations (and is likely a reason why the TLB went from 3072 entries in Zen4 to 4096 entries in Zen5).

--------

Modern computers can theoretically use "HugePages" (2MB or 1GB in size). Servers are configured to use them, but consumer hardware has so many backwards compatibility issues with Windows and Linux that the default page size remains 4k in practice. Still, if you can play with the right settings, using these larger page sizes leads to 10%+ improvements as more data effectively fits in the TLB-cache (a lookup that is necessary before the real cache is hit).
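The reach arithmetic above can be sketched in a few lines. This is only an illustration: the 4096-entry figure is the one quoted in this post, and the sketch assumes the same number of entries is available for every page size, which real TLBs do not necessarily guarantee.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered by a TLB when each entry maps exactly one page."""
    return entries * page_size


# Entry count quoted in the post above (assumed identical for all page sizes).
zen5_l2_dtlb_entries = 4096

reach_4k = tlb_reach_bytes(zen5_l2_dtlb_entries, 4 * KIB)
reach_2m = tlb_reach_bytes(zen5_l2_dtlb_entries, 2 * MIB)

print(reach_4k // MIB, "MiB covered with 4 KiB pages")
print(reach_2m // GIB, "GiB covered with 2 MiB pages")
```

With default pages the reach is 16 MiB, below the size of the stacked L3. With 2 MiB pages the same number of entries would cover 8 GiB under the stated assumption.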
 
Joined
Sep 17, 2014
Messages
22,491 (6.03/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Since when is it allowed to say a single word, as a reply comment on this forum, such as "amazing" and "interesting"?

I remember a couple of months ago, my "ok" comment being deleted, for not adding anything to the conversation here.
It generally isn't, but we turned it into a bit of fun :)
 
Joined
Jul 24, 2024
Messages
245 (1.84/day)
System Name AM4_TimeKiller
Processor AMD Ryzen 5 5600X @ all-core 4.7 GHz
Motherboard ASUS ROG Strix B550-E Gaming
Cooling Arctic Freezer II 420 rev.7 (push-pull)
Memory G.Skill TridentZ RGB, 2x16 GB DDR4, B-Die, 3800 MHz @ CL14-15-14-29-43 1T, 53.2 ns
Video Card(s) ASRock Radeon RX 7800 XT Phantom Gaming
Storage Samsung 990 PRO 1 TB, Kingston KC3000 1 TB, Kingston KC3000 2 TB
Case Corsair 7000D Airflow
Audio Device(s) Creative Sound Blaster X-Fi Titanium
Power Supply Seasonic Prime TX-850
Mouse Logitech wireless mouse
Keyboard Logitech wireless keyboard
I know where AMD is aiming with this ...

Since node shrinking will continue to be a tougher problem (fewer nm = lower process yields, more heat density, etc.), AMD wants to make room for bigger CCDs even with 4nm or 3nm. L3 cache takes up roughly the area of 4 Zen 5 cores. Putting that cache below the cores would allow not only putting more cores into a CCD, but also expanding the L3 cache and other caches, too. This way AMD can easily reach 10-12 cores per CCD with 96+ MB of cache in regular non-X3D processors.

Putting cache below CCD also allows for significant core clocks boost, basically the same clocks as you'd get with non-X3D CPUs.

One may start to wonder whether this is the beginning of the end of X3D processors as we know them.
 
Joined
Jan 11, 2022
Messages
882 (0.83/day)
I don't think that's needed with cache size is this large, and all cores are connected to all cache anyway. I'm talking ONE SINGLE V-cache chip for ALL cores.

I haven't heard about such a thing, sounds like a really bad idea. AMD just moved V-cache in order to cool the CCD properly, that would be one step forward, three steps backwards.
That would be a bad choice as that would make things even slower.
Searching through memory takes time, and the bigger it is the more time it takes.

Giving more cores access to the same memory also racks up penalties.
Each core will only have limited time to read and write to the memory, and coordinating everything becomes even harder.

Also note that L3 isn't something that makes everything faster. If you look at the benchmarks provided here by the people of TPU, you will see that it's only interesting for virtualisation and gaming.
And since gaming doesn't scale with an increasing number of cores, a second CCD with access to a big cache is worthless for gaming.
As for virtualisation, the shared L3 is nothing but a security risk.

It's a joke that refers to https://www.imdb.com/title/tt0105929/
 

SL2

Joined
Jan 27, 2006
Messages
2,454 (0.36/day)
That would be a bad choice as that would make things even slower.
Searching through memory takes time, and the bigger it is the more time it takes.

Giving more cores access to the same memory also racks up penalties.
Each core will only have limited time to read and write to the memory, and coordinating everything becomes even harder.
Well yeah, that's what I meant with "hard or complicated". You're correct in theory, but we have no grasp of where the practical limit currently is for doing this.
Also note that L3 isn't something that makes everything faster. If you look at the benchmarks provided here by the people of TPU, you will see that it's only interesting for virtualisation and gaming.
Not sure why you're telling me this lol, I never said it makes everything faster. You're jumping to conclusions here.
And since gaming doesn't scale with an increasing number of cores, a second CCD with access to a big cache is worthless for gaming.
I've never said that. Also, that's not the only reason for doing it.
 
Joined
Jan 11, 2022
Messages
882 (0.83/day)
Well yeah, that's what I meant with "hard or complicated". You're correct in theory, but we have no grasp of where the practical limit currently is for doing this.

Not sure why you're telling me this lol, I never said it makes everything faster. You're jumping to conclusions here.

I've never said that. Also, that's not the only reason for doing it.
I think I forgot writing down that it probably wasn't worth the extra cost it would involve given the aforementioned which is why i listed them...
 

SL2

Joined
Jan 27, 2006
Messages
2,454 (0.36/day)
I think I forgot writing down that it probably wasn't worth the extra cost it would involve given the aforementioned which is why i listed them...
My point is in the post before.

The 16 cores with all V-cache is not necessarily about thinking you need more than 8 cores for games. It's for people who want 16 cores for work, but don't want a compromise either way at that high price. Moved and doubled V-cache might help there. Unified, shared V-cache would be a possible next step, but maybe not feasible for one reason or another.

Then there's conflicting info about recommended hardware for Space Marine 2 at 4K, for instance. I haven't read into it, but 12 cores is recommended (both AMD and Intel) on Steam.
 
Joined
Dec 25, 2020
Messages
6,827 (4.74/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS Special Edition
Motherboard ASUS ROG MAXIMUS Z790 APEX ENCORE
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) ASUS ROG Strix GeForce RTX™ 4080 16GB GDDR6X White OC Edition
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic Intellimouse
Keyboard Generic PS/2
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores I pulled a Qiqi~
My point is in the post before.

The 16 cores with all V-cache is not necessarily about thinking you need more than 8 cores for games. It's for people who want 16 cores for work, but don't want a compromise either way at that high price. Moved and doubled V-cache might help there. Unified, shared V-cache would be a possible next step, but maybe not feasible for one reason or another.

Then there's conflicting info about recommended hardware for Space Marine 2 at 4K, for instance. I haven't read into it, but 12 cores is recommended (both AMD and Intel) on Steam.

Unified double (or even multiple, in the case of Epyc) V-cache is the future. But to achieve this, they must first overcome the internal fabric bottleneck so accessing data across any chiplet or part of the chip is effectively seamless. This will probably happen when they move from 2.5D packaging (the current chiplet system) to a fully 3D system like Foveros/Intel's 3D tiling system. This physical closeness should allow an ultra-high-bandwidth link that will make such a thing possible.
 
Joined
Jan 3, 2021
Messages
3,518 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
I highly doubt this is even possible, as the substrate/PCB has all the connections for the CPU on its layer. Also, the CPUs are flip chips and have been for a while.
It's possible if the bottom die has contact pads on both sides. TSV makes that possible.
 
Joined
Feb 1, 2013
Messages
1,268 (0.29/day)
System Name Gentoo64 /w Cold Coffee
Processor 9900K 5.2GHz @1.312v
Motherboard MXI APEX
Cooling Raystorm Pro + 1260mm Super Nova
Memory 2x16GB TridentZ 4000-14-14-28-2T @1.6v
Video Card(s) RTX 4090 LiquidX Barrow 3015MHz @1.1v
Storage 660P 1TB, 860 QVO 2TB
Display(s) LG C1 + Predator XB1 QHD
Case Open Benchtable V2
Audio Device(s) SB X-Fi
Power Supply MSI A1000G
Mouse G502
Keyboard G815
Software Gentoo/Windows 10
Benchmark Scores Always only ever very fast
Interesting idea, but I'm still cagey about it. If a lot of the improvements on Zen 5 X3D is just due to higher power targets it's losing what I like about the X3D chips in the first place, and that is amazing gaming performance at low-ish power. If it's about the same as the 7800X3D I'm just gonna get the 7800X3D, lest they whoopsie a new IOD on these with more gen 5 lanes and CKD support.
You do realize you can undervolt and underclock it as you need, in order to hit YOUR power efficiency targets? Why should your goal hamper others' ambition to go fast.
 

SL2

Joined
Jan 27, 2006
Messages
2,454 (0.36/day)
Interesting idea, but I'm still cagey about it. If a lot of the improvements on Zen 5 X3D is just due to higher power targets
No, they're the same, 120 W TDP.

It's just that the 9800X3D actually can make use of it, not really a drawback. Just change it if you're not happy with it.
 
Joined
Jul 13, 2016
Messages
3,296 (1.08/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Thing is, you're sacrificing productivity by 3% as well since the other CCD won't clock as high. So the overall picture might be slightly different but I agree with the fact that matching 9800X3D while taking a 6% hit in productivity vs 9950X is better than being 3% slower while getting a 3% hit in productivity.

Again another person who didn't read the article or simply doesn't understand.

No. If what's stated ends up being correct, in that the thermal issue is solved and clocks are the same between the X3D and non-X3D parts, productivity performance will be equal to or better than non-X3D parts. It would eliminate the downside to X3D chips.
 
Joined
Oct 30, 2020
Messages
253 (0.17/day)
Again another person who didn't read the article or simply doesn't understand.

No. If what's stated ends up being correct, in that the thermal issue is solved and clocks are the same between the X3D and non-X3D parts, productivity performance will be equal to or better than non-X3D parts. It would eliminate the downside to X3D chips.

I read it and it's really not hard to understand the article but the part about not losing clocks is pure speculation. Turns out they were incorrect anyway and looking at the boost clocks between 9700x and 9800X3D, there's still a hit to clocks albeit less than before.

So yeah, adding L3 to both CCDs would reduce productivity for a minor gain in performance. What's worse is that it'll increase performance in unwanted situations which they would want to mitigate through drivers anyway, because ideally you want the gaming threads to be pinned to one CCD. In situations where a thread jumps to the other CCD, it won't match the 9800X3D's performance simply because of the latency incurred by that jump.

So you're looking at a slight benefit for games in edge cases and a slight hit to productivity for a CPU that costs more. Pretty sure AMD said the same during the 7950X3D launch when they did the math. Whether that changes remains to be seen.
 
Joined
Nov 26, 2021
Messages
1,652 (1.50/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
This is already becoming true for the "TLB", translation lookaside buffer.

A "Page" has been set at 4096 bytes since the 1980s (even ARM systems are paged at 4k). There's a 4096 entry TLB in Zen5, meaning there is 4096 (entries) x 4096 bytes (per entry with default pages) == 16MB of RAM indexed in the Virtual RAM page table before the CPU Core runs out of entries.

That's smaller than the Zen5 X3D L3 cache. In fact, this curious slowdown has been true for quite a few generations (and is likely a reason why the TLB went from 3072 entries in Zen4 to 4096 entries in Zen5).

--------

Modern computers can theoretically use "HugePages" (2MB or 1GB in size). Servers are configured to use them, but consumer hardware has so many backwards compatibility issues with Windows and Linux that the default page size remains 4k in practice. Still, if you can play with the right settings, using these larger page sizes leads to 10%+ improvements as more data effectively fits in the TLB-cache (a lookup that is necessary before the real cache is hit).
That's just the TLB for data. In addition, there's a 2048 entry L2 TLB for instructions. Zen CPUs also can coalesce 4 consecutive pages into one TLB entry so one Zen 5 core can cover 64 MB of cache with the L2 data TLB.

Zen 4 also has page coalescing capability. There weren’t specifics on whether this mechanism changed in Zen 4, though performance counter unit mask descriptions indicate it’s still present. Assuming Zen 4 can coalesce up to four consecutive 4K pages like Zen 2 and 3, the 3072 entry L2 DTLB can cover up to 48 MB which is great news. While Zen 2/3’s 2048 entry L2 DTLB already performed reasonably well, more is always better.
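The coalescing figures above are easy to check. A minimal sketch, assuming the best case where every L2 DTLB entry really does coalesce four consecutive 4K pages (in practice that depends on how memory is laid out, so treat these as upper bounds):

```python
PAGE_SIZE = 4096   # default 4K page, in bytes
MIB = 1024 * 1024


def coalesced_reach_mib(entries: int, pages_per_entry: int = 4) -> int:
    """Best-case coverage in MiB if every entry maps pages_per_entry pages."""
    return entries * pages_per_entry * PAGE_SIZE // MIB


# Entry counts taken from the posts above.
for name, entries in [("Zen 2/3", 2048), ("Zen 4", 3072), ("Zen 5", 4096)]:
    print(f"{name}: {entries} entries -> up to {coalesced_reach_mib(entries)} MiB")
```

Under that best-case assumption the results are 32 MiB, 48 MiB and 64 MiB, matching the 48 MB and 64 MB figures quoted here.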
 