Saturday, October 26th 2024

AMD Ryzen 7 9800X3D Has the CCD on Top of the 3D V-cache Die, Not Under it

Much of AMD's teaser material for the Ryzen 7 9800X3D carried the recurring buzzwords "X3D Reimagined," which led us to speculate about what they could mean. 9550pro, a reliable source of hardware leaks, says that AMD has redesigned the way the CPU complex die (CCD) and 3D V-cache die (L3D) are stacked together. In past generations of X3D processors, such as the 5800X3D "Vermeer-X" and the 7800X3D "Raphael-X," the L3D was stacked on top of the CCD. It sat above the central region of the CCD, which holds the on-die 32 MB L3 cache. Blocks of structural silicon were placed on top of the edges of the CCD, where the CPU cores are. These structural silicon blocks performed the crucial task of transferring heat from the CPU cores to the IHS above. This is about to change.

If the leaks are right, AMD has inverted the CCD-L3D stack with the 9000X3D series. The "Zen 5" CCD is now on top, and the L3D sits below it, under the central region of the CCD. The CPU cores now dissipate heat to the IHS just as they do on regular 9000 series processors without 3D V-cache. We imagine AMD achieved this by enlarging the L3D to match the size of the CCD, so that it serves as a kind of "base tile." The L3D would have to be peppered with TSVs that connect the CCD to the fiberglass substrate below. We know where AMD is going with this in the future. Right now, the L3D "base tile" contains the 64 MB 3D V-cache that gets appended to the 32 MB on-die L3 cache. In the future (probably with "Zen 6"), AMD could design the CCDs with TSVs even for the per-core L2 caches.
This piece of speculation also neatly explains what "X3D boost" could be. With the CCD making direct contact with the IHS, as it does in non-X3D processors, the X3D processors could have the same overclocking capabilities as the regular chips. There are far fewer thermal hurdles in the way. AMD can therefore give these chips the same TDP and PPT values as the regular chips, as well as higher clock speeds. In the past, the company was conservative with the PPT and clock speeds of its X3D processors.

AMD is expected to launch the Ryzen 7 9800X3D on November 7, 2024.
Source: HXL (Twitter)

110 Comments on AMD Ryzen 7 9800X3D Has the CCD on Top of the 3D V-cache Die, Not Under it

#76
TumbleGeorge
Can you imagine the on-chip cache getting so large that the CPU crashes because it can't index it within the normal time for that operation? Is this possible?
#77
SL2
evernessince wrote: You are vastly underestimating the number of people who want a chip that can handle both gaming and core-heavy tasks.
I think we all forget about, or ignore, the people who tell us from time to time that a product is just for one thing.

As if you're expected to have one 7800X3D for games and a 7950X for work. Strictly thinking inside the box and turning it into law, lol.

Or people who won't stop bitching about why gaming laptops won't/shouldn't have cameras... yeah, you're supposed to buy another laptop for that, or a separate camera...

/end of rant
#78
ThomasK
Since when is it allowed to post a single-word reply on this forum, such as "amazing" or "interesting"?

I remember my "ok" comment being deleted a couple of months ago for not adding anything to the conversation.
#79
trsttte
demirael wrote: Latency? Light travels 0.3 meters in 1 ns. Latency isn't an issue.
Light does, but electricity doesn't. Since AMD hasn't moved to photonic computing, your comment is not very relevant.

Though indeed there shouldn't be any difference; it's still all in the same package and whatnot.
#80
DemonicRyzen666
I highly doubt this is even possible, as the substrate/PCB has all the connections for the CPU on its layer. Also, the CPUs are flip chips and have been for a while.
#81
igormp
TumbleGeorge wrote: Can you imagine the on-chip cache getting so large that the CPU crashes because it can't index it within the normal time for that operation? Is this possible?
No, not a thing.
#82
Squared
claylomax wrote: Exactly; and it's going to launch with no competition.
What's funny about that is that, since Meteor Lake, Intel has put its cores on top of another die. Before Meteor Lake came out, there were rumors that it would have an L4 cache in the base tile. It seems like Arrow Lake is pretty close to having the same CPU-stacked-over-cache technology, if Intel wanted it to.
yfn_ratchet wrote: Interesting idea, but I'm still cagey about it. If a lot of the improvements on Zen 5 X3D are just due to higher power targets, it's losing what I like about the X3D chips in the first place, and that is amazing gaming performance at low-ish power. If it's about the same as the 7800X3D, I'm just gonna get the 7800X3D, lest they whoopsie a new IOD on these with more Gen 5 lanes and CKD support.
That's an interesting concern. X3D had to be lower power, but now it won't need to be. But the 9000 series is a little more efficient than the 7000 series, and in other chips more cache usually does translate into power savings, even at the same frequency.
SOAREVERSOR wrote: Most of the people buying high core count chips aren't doing it for gaming, and the X3D chips perform worse in most productivity and creative tasks where high core count matters. X3D makes much more sense for six- and eight-core chips than for 16-core chips.
Theoretically, with the V-cache no longer sitting between the CPU cores and the cooler, the X3D chips will be the same speed as or faster than the regular chips in every use case. And since many people want one CPU for both productivity and gaming, there will still be demand for the higher core count chips.
#83
A&P211
ThomasK wrote: Since when is it allowed to post a single-word reply on this forum, such as "amazing" or "interesting"?

I remember my "ok" comment being deleted a couple of months ago for not adding anything to the conversation.
Yes
#84
mkppo
evernessince wrote: Hard disagree, AMD has X3D cache in chips all the way down to the 5600X3D.

Having 2 cache chiplets on $700 - $750 parts is likewise absolutely possible.

Even if the uplift is a mere 3%, every little bit matters at the high end. Particularly when it could make the 9950X3D reach gaming parity with the 9800X3D, it would upsell a lot of people to the more expensive processor.
Thing is, you're also sacrificing 3% of productivity performance, since the other CCD won't clock as high. So the overall picture might be slightly different, but I agree that matching the 9800X3D while taking a 6% hit in productivity vs the 9950X is better than being 3% slower while taking a 3% hit in productivity.
#85
RaceT3ch
People might be able to get the CPU running at the same speed as the 9700X. Nice.
#86
dragontamer5788
TumbleGeorge wrote: Can you imagine the on-chip cache getting so large that the CPU crashes because it can't index it within the normal time for that operation? Is this possible?
This is already becoming true for the "TLB" (translation lookaside buffer).

A "page" has been 4096 bytes since the 1980s (even ARM systems are paged at 4K). Zen 5 has a 4096-entry TLB. That means 4096 entries x 4096 bytes per entry (with default pages) == 16 MB of RAM can be mapped through the virtual-memory page table before the CPU core runs out of entries.

That's smaller than the Zen 5 X3D L3 cache. In fact, this curious slowdown has existed for quite a few generations. It is likely one reason why the TLB grew from 3072 entries in Zen 4 to 4096 entries in Zen 5.
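The arithmetic above is the TLB's "reach": the number of entries multiplied by the page size. Below is a minimal Python sketch of that calculation. It assumes the L2 data-TLB entry counts quoted in this thread (3072 for Zen 4, 4096 for Zen 5) and the 96 MB L3 total (32 MB on-die plus 64 MB V-cache) from the article. The names are illustrative only.

```python
# TLB "reach": how much memory a TLB can map before it runs out of entries.
KIB = 1024
MIB = 1024 * KIB

PAGE_4K = 4 * KIB                  # default x86 page size
X3D_L3_BYTES = (32 + 64) * MIB     # 32 MB on-die L3 + 64 MB 3D V-cache (per the article)

# L2 data-TLB entry counts as quoted in the thread (assumed, not verified here).
L2_DTLB_ENTRIES = {"Zen 4": 3072, "Zen 5": 4096}


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes mappable by `entries` TLB entries that each map one page of `page_bytes`."""
    return entries * page_bytes


if __name__ == "__main__":
    for core, entries in L2_DTLB_ENTRIES.items():
        reach = tlb_reach(entries, PAGE_4K)
        share = reach / X3D_L3_BYTES
        print(f"{core}: {reach // MIB} MiB reach with 4 KiB pages "
              f"({share:.1%} of a 96 MB X3D L3)")
```

With 4 KiB pages, even Zen 5's 16 MiB of reach is only a sixth of a 96 MB L3. That is the mismatch the post describes.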

--------

Modern computers can theoretically use "HugePages" (2 MB or 1 GB in size). Servers are configured to use them. Consumer hardware, however, has so many backward-compatibility issues with Windows and Linux that the default page size remains 4K in practice. Still, if you can play with the right settings, using these larger page sizes leads to 10%+ improvements. More data effectively fits in the TLB, and that lookup is necessary before the real cache is hit.
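The reach calculation can be repeated for the 2 MB and 1 GB page sizes mentioned above to show why larger pages help. This sketch makes a simplifying assumption: every one of the 4096 quoted entries can hold a page of any size. Real TLBs may keep fewer entries for the largest pages, so treat the results as upper bounds.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Simplifying assumption: all 4096 L2 DTLB entries (the Zen 5 figure quoted in
# the thread) can hold any page size. Real hardware may differ per page size.
ENTRIES = 4096
PAGE_SIZES = {"4 KiB": 4 * KIB, "2 MiB": 2 * MIB, "1 GiB": 1 * GIB}


def reach_by_page_size(entries: int) -> dict:
    """Map each page-size label to the bytes `entries` TLB entries could cover."""
    return {label: entries * size for label, size in PAGE_SIZES.items()}


if __name__ == "__main__":
    for label, reach in reach_by_page_size(ENTRIES).items():
        print(f"{label} pages: {reach / GIB:g} GiB of reach")
```

Under this assumption, switching from 4 KiB to 2 MiB pages multiplies the reach by 512, from 16 MiB to 8 GiB.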
#87
Vayra86
ThomasK wrote: Since when is it allowed to post a single-word reply on this forum, such as "amazing" or "interesting"?

I remember my "ok" comment being deleted a couple of months ago for not adding anything to the conversation.
It generally isn't, but we turned it into a bit of fun :)
#88
LittleBro
I know where AMD is aiming with this ...

Since node shrinking will continue to be a tougher problem (smaller nodes = lower yields, more heat density, etc.), AMD wants to make room for bigger CCDs even on 4nm or 3nm. The L3 cache takes up roughly the area of 4 Zen 5 cores. Putting that cache below the cores would allow not only more cores per CCD, but also larger L3 and other caches. This way, AMD can easily reach 10-12 cores per CCD with 96+ MB of cache in regular non-X3D processors.

Putting the cache below the CCD also allows a significant core clock boost, basically the same clocks you'd get with non-X3D CPUs.

One may start to wonder whether this is the beginning of the end of X3D processors as we know them.
#89
kondamin
SL2 wrote: I don't think that's needed when the cache size is this large, and all cores are connected to all the cache anyway. I'm talking ONE SINGLE V-cache chip for ALL cores.

I haven't heard of such a thing; it sounds like a really bad idea. AMD just moved the V-cache in order to cool the CCD properly, so that would be one step forward, three steps backwards.
That would be a bad choice, as it would make things even slower.
Searching through memory takes time, and the bigger it is, the more time it takes.

Giving more cores access to the same memory also racks up penalties.
Each core will only have limited time to read from and write to the memory, and coordinating everything becomes even harder.

Also note that L3 isn't something that makes everything faster. If you look at the benchmarks provided here by the people of TPU, you will see that it's only interesting for virtualisation and gaming.
And since gaming doesn't scale with an increasing number of cores, a second CCD with access to a big cache is worthless for gaming.
As for virtualisation, the shared L3 is nothing but a security risk.
A&P211 wrote: Yes
It's a joke that refers to www.imdb.com/title/tt0105929/
#90
SL2
kondamin wrote: That would be a bad choice, as it would make things even slower.
Searching through memory takes time, and the bigger it is, the more time it takes.

Giving more cores access to the same memory also racks up penalties.
Each core will only have limited time to read from and write to the memory, and coordinating everything becomes even harder.
Well yeah, that's what I meant by "hard or complicated". You're correct in theory, but we have no grasp of where the practical limit for doing this currently is.
kondamin wrote: Also note that L3 isn't something that makes everything faster. If you look at the benchmarks provided here by the people of TPU, you will see that it's only interesting for virtualisation and gaming.
Not sure why you're telling me this, lol; I never said it makes everything faster. You're jumping to conclusions here.
kondamin wrote: And since gaming doesn't scale with an increasing number of cores, a second CCD with access to a big cache is worthless for gaming.
I never said that. Also, that's not the only reason for doing it.
#91
kondamin
SL2 wrote: Well yeah, that's what I meant by "hard or complicated". You're correct in theory, but we have no grasp of where the practical limit for doing this currently is.

Not sure why you're telling me this, lol; I never said it makes everything faster. You're jumping to conclusions here.

I never said that. Also, that's not the only reason for doing it.
I think I forgot to write down that it probably wasn't worth the extra cost it would involve, given the aforementioned points, which is why I listed them...
#92
SL2
kondamin wrote: I think I forgot to write down that it probably wasn't worth the extra cost it would involve, given the aforementioned points, which is why I listed them...
My point is in the post before.

A 16-core chip with V-cache on both CCDs is not necessarily about thinking you need more than 8 cores for games. It's for people who want 16 cores for work but don't want a compromise either way at that high price. Moved and doubled V-cache might help there. Unified, shared V-cache would be a possible next step, but it may not be feasible for one reason or another.

Then there's conflicting info about recommended hardware for Space Marine 2 at 4K, for instance. I haven't read into it, but 12 cores are recommended (both AMD and Intel) on Steam.
#93
Dr. Dro
SL2 wrote: My point is in the post before.

A 16-core chip with V-cache on both CCDs is not necessarily about thinking you need more than 8 cores for games. It's for people who want 16 cores for work but don't want a compromise either way at that high price. Moved and doubled V-cache might help there. Unified, shared V-cache would be a possible next step, but it may not be feasible for one reason or another.

Then there's conflicting info about recommended hardware for Space Marine 2 at 4K, for instance. I haven't read into it, but 12 cores are recommended (both AMD and Intel) on Steam.
Unified double (or even multiple, in the case of Epyc) V-cache is the future. To achieve this, they must first overcome the internal fabric bottleneck, so that accessing data across any chiplet or part of the chip is effectively seamless. This will probably happen when they move from 2.5D packaging (the current chiplet system) to a fully 3D system like Intel's Foveros 3D tiling. This physical closeness should allow an ultra-high-bandwidth link that will make such a thing possible.
#94
Wirko
DemonicRyzen666 wrote: I highly doubt this is even possible, as the substrate/PCB has all the connections for the CPU on its layer. Also, the CPUs are flip chips and have been for a while.
It's possible if the bottom die has contact pads on both sides. TSVs make that possible.
#95
mouacyk
yfn_ratchet wrote: Interesting idea, but I'm still cagey about it. If a lot of the improvements on Zen 5 X3D are just due to higher power targets, it's losing what I like about the X3D chips in the first place, and that is amazing gaming performance at low-ish power. If it's about the same as the 7800X3D, I'm just gonna get the 7800X3D, lest they whoopsie a new IOD on these with more Gen 5 lanes and CKD support.
You do realize you can undervolt and underclock it as needed to hit YOUR power-efficiency targets? Why should your goal hamper others' ambition to go fast?
#96
SL2
yfn_ratchet wrote: Interesting idea, but I'm still cagey about it. If a lot of the improvements on Zen 5 X3D are just due to higher power targets
No, they're the same: 120 W TDP.

It's just that the 9800X3D can actually make use of it, which isn't really a drawback. Just change it if you're not happy with it.
#97
evernessince
mkppo wrote: Thing is, you're also sacrificing 3% of productivity performance, since the other CCD won't clock as high. So the overall picture might be slightly different, but I agree that matching the 9800X3D while taking a 6% hit in productivity vs the 9950X is better than being 3% slower while taking a 3% hit in productivity.
Yet another person who didn't read the article or simply doesn't understand it.

No. If what's stated ends up being correct, the thermal issue is solved and clocks are the same between the X3D and non-X3D parts. In that case, productivity performance will be equal to or better than that of non-X3D parts. It would eliminate the downside of X3D chips.
#98
mkppo
evernessince wrote: Yet another person who didn't read the article or simply doesn't understand it.

No. If what's stated ends up being correct, the thermal issue is solved and clocks are the same between the X3D and non-X3D parts. In that case, productivity performance will be equal to or better than that of non-X3D parts. It would eliminate the downside of X3D chips.
I read it, and the article really isn't hard to understand, but the part about not losing clocks is pure speculation. It turns out they were incorrect anyway. Looking at the boost clocks of the 9700X and 9800X3D, there's still a hit to clocks, albeit a smaller one than before.

So yeah, adding L3 cache to both CCDs would reduce productivity for a minor gain in performance. What's worse is that it'll increase performance in unwanted situations, which they would want to mitigate through drivers anyway. Ideally, you want the game threads pinned to one CCD. In situations where they jump to the other one, it won't match the 9800X3D's performance, simply because of the latency incurred by jumping to the other CCD.

So you're looking at a slight benefit for games in edge cases and a slight hit to productivity, for a CPU that costs more. Pretty sure AMD said the same during the 7950X3D launch when they did the math. Whether that changes remains to be seen.
#99
AnotherReader
dragontamer5788 wrote: This is already becoming true for the "TLB" (translation lookaside buffer).

A "page" has been 4096 bytes since the 1980s (even ARM systems are paged at 4K). Zen 5 has a 4096-entry TLB. That means 4096 entries x 4096 bytes per entry (with default pages) == 16 MB of RAM can be mapped through the virtual-memory page table before the CPU core runs out of entries.

That's smaller than the Zen 5 X3D L3 cache. In fact, this curious slowdown has existed for quite a few generations. It is likely one reason why the TLB grew from 3072 entries in Zen 4 to 4096 entries in Zen 5.

--------

Modern computers can theoretically use "HugePages" (2 MB or 1 GB in size). Servers are configured to use them. Consumer hardware, however, has so many backward-compatibility issues with Windows and Linux that the default page size remains 4K in practice. Still, if you can play with the right settings, using these larger page sizes leads to 10%+ improvements. More data effectively fits in the TLB, and that lookup is necessary before the real cache is hit.
That's just the TLB for data. In addition, there's a 2048-entry L2 TLB for instructions. Zen CPUs can also coalesce 4 consecutive pages into one TLB entry, so one Zen 5 core can cover 64 MB of memory with the L2 data TLB.
Zen 4 also has page coalescing capability. There weren't specifics on whether this mechanism changed in Zen 4, though performance counter unit mask descriptions indicate it's still present. Assuming Zen 4 can coalesce up to four consecutive 4K pages like Zen 2 and 3, the 3072-entry L2 DTLB can cover up to 48 MB, which is great news. Zen 2/3's 2048-entry L2 DTLB already performed reasonably well, but more is always better.
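The coverage figures in this reply come from multiplying entries by pages per entry by page size. The sketch below assumes, as the post does, that up to four consecutive 4 KiB pages can be coalesced into one L2 DTLB entry. It uses the entry counts quoted above. The best case assumes every group of four pages is contiguous, which real workloads won't always provide.

```python
KIB = 1024
MIB = 1024 * KIB

PAGE_4K = 4 * KIB
PAGES_PER_COALESCED_ENTRY = 4   # up to four consecutive 4 KiB pages per entry (as stated in the post)

# L2 data-TLB entry counts as quoted in the thread.
L2_DTLB_ENTRIES = {"Zen 2/3": 2048, "Zen 4": 3072, "Zen 5": 4096}


def coalesced_reach(entries: int,
                    pages_per_entry: int = PAGES_PER_COALESCED_ENTRY,
                    page_bytes: int = PAGE_4K) -> int:
    """Best-case bytes covered when every entry holds a fully coalesced group of pages."""
    return entries * pages_per_entry * page_bytes


if __name__ == "__main__":
    for core, entries in L2_DTLB_ENTRIES.items():
        plain = entries * PAGE_4K
        best = coalesced_reach(entries)
        print(f"{core}: {plain // MIB} MiB without coalescing, "
              f"up to {best // MIB} MiB with 4-page coalescing")
```

These reproduce the thread's 32 MB (Zen 2/3), 48 MB (Zen 4) and 64 MB (Zen 5) figures as best-case numbers.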
#100
evernessince
mkppo wrote: I read it, and the article really isn't hard to understand, but the part about not losing clocks is pure speculation. It turns out they were incorrect anyway. Looking at the boost clocks of the 9700X and 9800X3D, there's still a hit to clocks, albeit a smaller one than before.
1) That's a different leak, not a fact, as you seem to be implying.

2) You are assuming that the 9800X3D will be clocked as high as the 9950X3D. If they can increase the clocks on the new X3D, they may choose to further segment the lineup by giving the higher-end part higher clocks.

Mind you, either way the frequency is increasing compared to prior-gen X3D parts. So relative to past X3D parts, any performance difference resulting from the X3D cache will have changed this generation.
mkppo wrote: So yeah, adding L3 cache to both CCDs would reduce productivity for a minor gain in performance. What's worse is that it'll increase performance in unwanted situations, which they would want to mitigate through drivers anyway. Ideally, you want the game threads pinned to one CCD. In situations where they jump to the other one, it won't match the 9800X3D's performance, simply because of the latency incurred by jumping to the other CCD.

So you're looking at a slight benefit for games in edge cases and a slight hit to productivity, for a CPU that costs more. Pretty sure AMD said the same during the 7950X3D launch when they did the math. Whether that changes remains to be seen.
OK, now I understand. You read the article; you just don't know what you are talking about / can't understand it.

"slight benefit for games in edge cases"?

Clearly you are unaware that the 7950X3D was 14% faster on average than the 7950X in games.

Even if there were 0 frequency improvements to the 9950X3D, it would mirror that performance increase at the very least.

In the CPU world that isn't slight, it's what you typically get with a new architecture.

You also don't seem to understand what edge cases are. X3D's boost isn't limited to edge cases; a wide array of games benefit from X3D. You seem to be arguing against X3D in general, which is just dumb. Every benchmark out there disproves you.

Also, since when is an increase in performance "unwanted"? Utter nonsense.