Tuesday, October 29th 2024
Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU
A user of the Chinese tech forum ChipHell, who goes by zcjzcj11111, came up with a fascinating take on what the next-generation AMD "Navi 48" GPU could be, and put their imagination into a render. Apparently, the "Navi 48," which powers AMD's series-topping performance-segment graphics card, is a dual chiplet-based design, similar to the company's latest Instinct MI300 series AI GPUs. This won't be a disaggregated GPU such as the "Navi 31" and "Navi 32," but rather a scale-out multi-chip module of two GPU dies that can otherwise run on their own in single-die packages. You want to call this a multi-GPU-on-a-stick? Go ahead, but there are a couple of differences.
On AMD's Instinct AI GPUs, the chiplets have full cache coherence with each other, and can address memory controlled by each other. This cache coherence makes the chiplets work like one giant chip. In a multi-GPU-on-a-stick, there would be no cache coherence, the two dies would be mapped by the host machine as two separate devices, and you'd be at the mercy of implicit or explicit multi-GPU technologies for performance to scale. This isn't what's happening on AI GPUs: despite multiple chiplets, the GPU is seen by the host as a single PCI device with all its cache and memory visible to software as a contiguously addressable block.
We imagine the "Navi 48" is modeled along the same lines as the company's AI GPUs. The graphics driver sees this package as a single GPU. For this to work, the two chiplets are probably connected by Infinity Fabric Fanout links, an interconnect with much higher bandwidth than a serial bus like PCIe. This is probably needed for the cache coherence to be effective. The "Navi 44" is probably just one of these chiplets sitting in its own package.
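To make the host-side difference concrete, here is a toy Python model of the two cases: a coherent package that shows up as one device with a single addressable memory pool, and a multi-GPU-on-a-stick that shows up as two devices. This is only a sketch. The names `Die`, `HostDevice`, and `enumerate_package`, as well as the 8 GiB-per-die figure, are hypothetical and made up for illustration. They are not AMD specifications and do not correspond to any real driver API.

```python
from dataclasses import dataclass
from typing import List, Tuple

GIB = 2**30


@dataclass(frozen=True)
class Die:
    """One GPU die and the memory attached to it (hypothetical figures)."""
    name: str
    memory_bytes: int


@dataclass(frozen=True)
class HostDevice:
    """What the host would enumerate as a single PCI device in this toy model."""
    dies: Tuple[Die, ...]

    @property
    def addressable_bytes(self) -> int:
        # Memory that software sees as one contiguous block on this device.
        return sum(d.memory_bytes for d in self.dies)


def enumerate_package(dies: List[Die], coherent: bool) -> List[HostDevice]:
    """Return the devices the host would see for one physical package."""
    if coherent:
        # Cache-coherent chiplets: one device that owns the memory of every die.
        return [HostDevice(tuple(dies))]
    # No coherence: each die is a separate device, and scaling is left to
    # implicit or explicit multi-GPU software.
    return [HostDevice((d,)) for d in dies]


if __name__ == "__main__":
    # Assumption: two dies with 8 GiB each, chosen only for the example.
    package = [Die("die0", 8 * GIB), Die("die1", 8 * GIB)]
    for coherent in (True, False):
        devices = enumerate_package(package, coherent)
        sizes = [dev.addressable_bytes // GIB for dev in devices]
        print(f"coherent={coherent}: {len(devices)} device(s), GiB per device: {sizes}")
```

In the coherent case the model returns a single device whose addressable memory is the sum of both dies. In the non-coherent case each die is its own device, and an application only gets the combined resources if it is written for multi-GPU.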
In the render, the substrate and package are made to resemble those of the "Navi 32," which tends to agree with the theory that "Navi 48" will be a performance-segment GPU, and a successor to the "Navi 32," "Navi 22," and "Navi 10," rather than a successor to enthusiast-segment GPUs like the "Navi 21" and "Navi 31." This much was made clear by AMD in its recent interviews with the media.
Do we think the ChipHell rumor is plausible? Absolutely, considering that nobody took the very first renders of the AM5 package with its oddly shaped IHS seriously, and those turned out to be accurate. The "Navi 48" being a chiplet-based GPU is within character for a company like AMD, which loves chiplets, MCMs, and disaggregated devices.
Sources:
ChipHell Forums, HXL (Twitter)
59 Comments on Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU
Remember the good ole days of SLI.
It's confused me for the longest time why we couldn't have a dual or quad chiplet GPU design. I thought that all the knowledge and experience that AMD gained working on desktop and server chips would carry over to the GPU side of things, but it never did until now.
Which is crossfire on a single chip with dual GPU.
And again it was done before. 25 years ago.
www.techpowerup.com/gpu-specs/voodoo5-5500-agp.c3531
I'm sure Nvidia will do the same soon with a dual GPU on a single card.
History repeating itself again. Two was always better than one.
Or if you were one of the lucky ones to get the monster before they went out of business
www.techpowerup.com/gpu-specs/voodoo5-6000.c3536
Even today, with some tweaks, this card can still play games. It was way, way ahead of its time.
If this is the way AMD goes, then I'm guessing each die has 40 CUs. That might also make sense with Strix Halo having one of these dies.
It's not happening.
I imagine that, because GPUs have to present things on screen at a very fast rate, and with extreme care not to end up with, say, half the screen being a couple of frames behind the other half, it's probably very complicated to do that when the two GPU "cores" are separated. That is why SLI/Crossfire had their bridges at first and data synchronization over PCIe later, plus the performance gain wasn't that great.
Plus if this is exposed to the software, developers are gonna whine about having to code for basically explicit multi-GPU lite™.
"For their first multi-die chip, NVIDIA is intent on skipping the awkward “two accelerators on one chip” phase, and moving directly on to having the entire accelerator behave as a single chip. According to NVIDIA, the two dies operate as “one unified CUDA GPU”, offering full performance with no compromises."
www.anandtech.com/show/21310/nvidia-blackwell-architecture-and-b200b100-accelerators-announced-going-bigger-with-smaller-data
It is different in that instead of making the entire GPU die on the 5nm node, they took the cache and memory controllers and fabbed them as chiplets on the older 6nm node, because these parts do not benefit as much from a node shrink. All of the chiplets were then arranged to make a full die. This was an ingenious way to target the parts of the GPU that get the largest performance benefit from the 5nm node shrink, while saving cost by not using a cutting-edge node on the parts that do not. Fantastic engineering in my opinion.
www.techpowerup.com/review/amd-radeon-rx-7900-xtx/
Doesn't make a lot of sense with the rumor that AMD is targeting mid-range and low end this gen unless they solved the cost issue.
I remember back in the days of 1999 and 2000 when people had two Diamond Monster Voodoo2s in SLI. I had a rig exactly like that. I met many people who bought the same setup and complained they did not see much of a difference. Well, guess what: 90% of them did not know how to configure the drivers to make the game use the cards. I showed them which settings to change, and where in the game and driver to change them, and they were like, OMG, this is freaking awesome. The original Unreal ran like butter on those Voodoo2s. I remember playing a lot of Unreal Tournament.
I'm an old school gamer. Heck yes, my first home system was an Atari 2600. So when the Voodoos were around I was 25 years old. I'm an old-timer now, so I have lots of knowledge of systems from 1985 on. It's like MS telling me my Core i7 3770K with 64 GB RAM, a 1 TB NVMe drive, a 500 GB SSD, and a Quadro card did not meet the minimum requirements to run Windows 11 Pro. Um, OK, and you're telling me a Core i5 that runs 2 GHz slower than my i7 is going to be faster? I bypassed that BS, and Windows 11 on that computer at home runs faster than my work computers at the office, and they are i5 10500Ts.
It's like comparing a Harley motorcycle to a Ninja. Sure, the Ninja has lots of CCs of power, but that Harley has raw power and it will win.