Tuesday, October 29th 2024

Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU

A user of the Chinese tech forum ChipHell who goes by zcjzcj11111 served up a fascinating take on what the next-generation AMD "Navi 48" GPU could be, and put their imagination into a render. Apparently, the "Navi 48," which powers AMD's series-topping performance-segment graphics card, is a dual chiplet-based design, similar to the company's latest Instinct MI300 series AI GPUs. This won't be a disaggregated GPU such as the "Navi 31" and "Navi 32," but rather a scale-out multi-chip module of two GPU dies that could otherwise run on their own in single-die packages. You want to call this a multi-GPU-on-a-stick? Go ahead, but there are a couple of key differences.

On AMD's Instinct AI GPUs, the chiplets have full cache coherence with each other, and can address memory controlled by each other. This cache coherence makes the chiplets work like one giant chip. In a multi-GPU-on-a-stick, there would be no cache coherence, the two dies would be mapped by the host machine as two separate devices, and then you'd be at the mercy of implicit or explicit multi-GPU technologies for performance to scale. This isn't what's happening on AI GPUs—despite multiple chiplets, the GPU is seen by the host as a single PCI device with all its cache and memory visible to software as a contiguously addressable block.
We imagine the "Navi 48" is modeled along the same lines as the company's AI GPUs. The graphics driver sees this package as a single GPU. For this to work, the two chiplets are probably connected by Infinity Fabric Fanout links, an interconnect with much higher bandwidth than a serial bus like PCIe, which is probably needed for the cache coherence to be effective. The "Navi 44" is probably just one of these chiplets sitting in its own package.
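To put rough numbers on that bandwidth gap, here is a back-of-the-envelope sketch. All figures are generic, illustrative assumptions (PCIe 4.0 per-lane rates, a "Navi 32"-class 256-bit GDDR6 configuration), not confirmed "Navi 48" specs.

```python
# Rough comparison of a serial host bus vs. the VRAM bandwidth a
# coherent die-to-die link would need to keep up with.
# All figures are illustrative assumptions, not confirmed specs.

pcie4_x16_gbps = 2 * 16        # PCIe 4.0: ~2 GB/s per lane per direction

vram_gbps = 20 * 256 / 8       # 256-bit GDDR6 at 20 Gbps/pin -> 640 GB/s

# For cache coherence to be effective, the die-to-die link must be in
# the same ballpark as VRAM bandwidth, not the PCIe bus.
ratio = vram_gbps / pcie4_x16_gbps
print(f"PCIe 4.0 x16: {pcie4_x16_gbps} GB/s per direction")
print(f"256-bit GDDR6 @ 20 Gbps: {vram_gbps:.0f} GB/s")
print(f"Gap: ~{ratio:.0f}x")
```

Even under these generous assumptions, a PCIe-class link falls an order of magnitude short, which is why a wide fanout interconnect is the plausible choice.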

In the render, the substrate and package are made to resemble those of the "Navi 32," which supports the theory that "Navi 48" will be a performance-segment GPU, and a successor to the "Navi 32," "Navi 22," and "Navi 10," rather than a successor to enthusiast-segment GPUs like the "Navi 21" and "Navi 31." This much was made clear by AMD in its recent interviews with the media.

Do we think the ChipHell rumor is plausible? Absolutely, considering nobody took the very first renders of the AM5 package with its oddly-shaped IHS seriously, either. A chiplet-based "Navi 48" would be in character for a company like AMD, which loves chiplets, MCMs, and disaggregated devices.
Sources: ChipHell Forums, HXL (Twitter)

59 Comments on Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU

#51
3valatzy
Vya Domus: This was long speculated and of course sooner or later it will happen.
Maybe one day when they invest and invent a fast enough communication channel - something like 5 TB/s or higher, then yes.

At least, we will know soon.

www.pcgamesn.com/amd/rdna-4-2025
#52
Vya Domus
3valatzy: something like 5 TB/s or higher, then yes.
Categorically not, not even CPU L1 caches are that fast a lot of the time lol.
#53
3valatzy
Vya Domus: Categorically not, not even CPU L1 caches are that fast a lot of the time lol.
The Ryzen shows otherwise:
10.5 TB/s read - 5.3 TB/s write - 10.0 TB/s copy.

#54
Vya Domus
3valatzy: The Ryzen
It's one of the fastest CPUs around; a lot of CPUs out there have slower caches. It doesn't matter, it's still totally absurd, you do not need TB/s of inter-chip bandwidth.
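The multi-TB/s L1 figures quoted by benchmarks are also aggregates: they sum every core's private L1 bandwidth, so no single link in the chip ever carries that traffic. A rough sketch, with assumed (not measured) per-core numbers:

```python
# Aggregate L1 bandwidth vs. what any single core actually sees.
# Illustrative assumptions: 2x 32-byte loads per cycle per core,
# 5.0 GHz clock, 16 cores -- not measured figures for any real chip.

bytes_per_cycle = 64       # two 32-byte loads per cycle per core
clock_ghz = 5.0            # assumed boost clock
cores = 16

per_core_gbps = bytes_per_cycle * clock_ghz       # GB/s for one core
aggregate_tbps = per_core_gbps * cores / 1000     # TB/s across the chip

print(f"Per-core L1 read: {per_core_gbps:.0f} GB/s")
print(f"Chip-wide aggregate: {aggregate_tbps:.2f} TB/s")
```

The TB/s number only appears when you multiply by the core count; a single consumer of an inter-chip link sees the per-core figure at best.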
#55
AnotherReader
Vya Domus: It's one of the fastest CPUs around; a lot of CPUs out there have slower caches. It doesn't matter, it's still totally absurd, you do not need TB/s of inter-chip bandwidth.
The MI300X has over 2 TB/s of inter chip bandwidth for each partition, but I agree that with enough caching, bandwidth requirements can be reduced.

#56
3valatzy
Vya Domus: It doesn't matter
It matters.


buildapc/comments/15ury94
Vya Domus: it's still totally absurd, you do not need TB/s of inter chip bandwidth.
Bandwidth is essential. Otherwise the chiplets won't work as expected and will underperform.

Learn about inter-GPU bandwidths.

github.com/te42kyfo/gpu-benches
#57
Vya Domus
3valatzy: Bandwidth is essential.
This is about hypothetical inter-chip communication requirements, not simply memory access speed. Those benchmarks have literally nothing to do with "inter-GPU bandwidths"; I don't know if you even properly read and understood what you posted.

Memory access patterns on GPUs are almost always contiguous: each core reads/writes its own separate chunk of VRAM, so if you break up a monolithic die into chiplets, the memory bandwidth requirements stay the same. GPU threads do not communicate with each other the way CPU cores do; they don't even have the hardware for complex synchronization beyond simple barriers, and you can't even synchronize threads globally. It's simply not how these things are designed. You can really only communicate between threads on the same GPU core, which will always access the same chunk of memory that its memory controller has access to. GPU cores on different chiplets would not need to access VRAM that's only reachable through a different chiplet. You're the one that needs to read more on GPU architectures.

CPUs with chiplets don't need more memory bandwidth either, so this makes no sense. On CPUs it is a different matter: there is usually a lot of inter-thread communication, which does pose a problem for threads on different cores, but it's more a matter of latency than bandwidth.

Btw SLI worked in a totally different manner and is completely irrelevant to this subject. Each GPU stored a copy of what was contained in VRAM, so every time frame buffers were updated they had to be copied between the cards.
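A toy model of the locality argument above: if each chiplet's cores walk their own contiguous tile of VRAM, cross-chiplet memory traffic is essentially zero. This is purely illustrative; real address interleaving and workload partitioning are far more complex.

```python
# Toy model: split a frame buffer across two chiplets and count
# cross-chiplet accesses when each chiplet only touches its own tile.

NUM_CHIPLETS = 2
ADDRESSES = 1 << 16                 # toy VRAM address space
TILE = ADDRESSES // NUM_CHIPLETS    # each chiplet owns one half

def owner(addr):
    """Chiplet whose memory controller owns this address (simple split)."""
    return addr // TILE

cross_traffic = 0
for chiplet in range(NUM_CHIPLETS):
    # each chiplet's cores read/write their own contiguous tile
    for addr in range(chiplet * TILE, (chiplet + 1) * TILE):
        if owner(addr) != chiplet:
            cross_traffic += 1

print(f"Cross-chiplet accesses: {cross_traffic}")
```

With local tiles the count is zero; the inter-chip link only has to carry coherence traffic and whatever little sharing the workload actually has, not the full VRAM stream.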
#58
3valatzy
I will tell you one thing:
1. Navi 31 failed, in just the same way CrossFire failed.
Vya Domus: but it's more a matter of latency rather than bandwidth
Higher bandwidth means lower latency.
So, now explain how cutting a monolithic chip into partitions improves latency?
#59
Vya Domus
3valatzy: Higher bandwidth means lower latency.
No. Lol.
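One way to see why, via Little's law (bandwidth = bytes in flight / latency), with purely illustrative numbers: you can double bandwidth simply by keeping more requests outstanding, while the latency of each access stays exactly the same.

```python
# Little's law sketch: bandwidth and latency are independent knobs.
# Illustrative numbers only.

LATENCY_NS = 100.0     # fixed per-access latency
LINE_BYTES = 64        # bytes moved per request

def bandwidth_gbps(outstanding_requests):
    # bytes per nanosecond == GB/s
    return outstanding_requests * LINE_BYTES / LATENCY_NS

print(bandwidth_gbps(16))   # 10.24 GB/s with 16 requests in flight
print(bandwidth_gbps(32))   # 20.48 GB/s: doubled, latency still 100 ns
```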