Tuesday, April 12th 2022

"Navi 31" RDNA3 Sees AMD Double Down on Chiplets: As Many as 7

Way back in January 2021, we heard a spectacular rumor about "Navi 31," the next-generation big GPU by AMD, being the company's first logic-MCM GPU (a GPU with more than one logic die). The company has a legacy of MCM GPUs, but those have been a single logic die surrounded by memory stacks. The RDNA3 graphics architecture that "Navi 31" is based on sees AMD fragment the logic die into smaller chiplets, with the goal of ensuring that only those components that benefit from the TSMC N5 node (5 nm), such as the number-crunching machinery, are built on it, while ancillary components, such as memory controllers, display controllers, or even media accelerators, are confined to chiplets built on an older node, such as TSMC N6 (6 nm). AMD has already taken this approach with its EPYC and Ryzen processors, where the chiplets with the CPU cores get the better node, and the other logic components get an older one.

Greymon55 predicts an interesting division of labor on the "Navi 31" MCM. Apparently, the number-crunching machinery is spread across two GCDs (Graphics Complex Dies?). These dies pack the Shader Engines with their RDNA3 compute units (CU), the Command Processor, Geometry Processor, Asynchronous Compute Engines (ACEs), Render Backends, etc. These are the components that benefit from the advanced 5 nm node, enabling AMD to run the CUs at higher engine clocks. There's also sound logic behind building a big GPU with two such GCDs instead of a single large GCD, as smaller GPUs can be made with a single GCD (exactly why two 8-core chiplets make up a 16-core Ryzen processor, while one of them is used to create 8-core and 6-core SKUs). The smaller GCD would result in better yields per wafer, and minimize the need for separate wafer orders for a larger die (such as in the case of Navi 21).
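To put rough numbers on that yield argument, here is a minimal back-of-the-envelope sketch using the classic Poisson yield model. The die areas and defect density below are illustrative assumptions, not actual AMD die sizes or TSMC defect rates.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Classic Poisson yield model: Y = exp(-D * A)."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)

# Illustrative assumptions only -- not real AMD die sizes or TSMC defect rates.
D0        = 0.1    # assumed defects per cm^2
big_die   = 520.0  # mm^2, a hypothetical monolithic "big Navi"-class die
small_gcd = 260.0  # mm^2, a hypothetical single GCD carrying half the shader logic

y_big   = poisson_yield(big_die, D0)
y_small = poisson_yield(small_gcd, D0)
print(f"Monolithic {big_die:.0f} mm^2 die:  {y_big:.1%} of dies defect-free")
print(f"Single     {small_gcd:.0f} mm^2 GCD: {y_small:.1%} of dies defect-free")

# Chiplets can be tested before packaging (known-good-die practice), so an MCM
# is assembled only from working GCDs; the silicon scrapped per wafer is then
# governed by the higher small-die yield rather than the big-die yield.
print(f"Silicon scrapped: ~{1 - y_big:.0%} (monolithic) vs ~{1 - y_small:.0%} (chiplet)")
```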
Besides the two GCDs, there are four MCDs (memory controller dies). Greymon55 predicts that these could be built on the 6 nm (TSMC N6) node, a slightly more advanced node than N7 (7 nm). Each MCD controls two 32-bit memory paths (two memory chips), or 64 bits of the memory bus width; four such MCDs make up 256-bit. For ASICs with just one GCD, there could be three MCDs (192-bit), or even just two (128-bit). The MCD packs the GDDR6 memory controller as well as its PHY. There could also be exotic fixed-function hardware for features such as memory compression and ECC (the latter being available on Pro SKUs).
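For reference, the bus-width and bandwidth arithmetic works out as below; the 18 Gbps GDDR6 data rate is an assumption for illustration, not a leaked specification.

```python
def gddr6_bus(mcd_count: int, data_rate_gbps: float = 18.0) -> tuple[int, float]:
    """Each MCD drives two 32-bit GDDR6 channels (two memory chips)."""
    bus_width_bits = mcd_count * 2 * 32                    # e.g. 4 MCDs -> 256-bit
    bandwidth_gbs  = bus_width_bits * data_rate_gbps / 8   # Gbit/s -> GB/s
    return bus_width_bits, bandwidth_gbs

for mcds in (2, 3, 4):
    width, bw = gddr6_bus(mcds)
    print(f"{mcds} MCDs -> {width}-bit bus, ~{bw:.0f} GB/s at an assumed 18 Gbps")
```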

The third and final kind of die is the I/O die (IOD). On both Socket AM4 and SP3 processors, the IOD serves as the town square connecting all the CPU chiplets, and crams in the memory, PCIe, and other platform I/O. On "Navi 31," the IOD could pack all the components that never need overclocking: the PCI-Express interface (which connects the GPU to the system), the Display CoreNext (DCN) component that controls the various display outputs, and perhaps even the Video CoreNext (VCN) component that packs the media accelerators. At this point it's not known which node the IOD is built on.

The ether connecting all 7 chiplets on the "Navi 31" MCM is Infinity Fabric. IFOP (Infinity Fabric over package), as implemented on EPYC "Milan" or the upcoming "Genoa" processors, has shown that its wiring density is low enough that it doesn't need an interposer and can make do with the fiberglass substrate. Such will be the case with "Navi 31," too. The MCDs will wire out to the GDDR6 memory devices just the way current GPUs do, as will the IOD, while all the chiplets talk to each other over IFOP.
Sources: Greymon55 (Twitter), VideoCardz

40 Comments on "Navi 31" RDNA3 Sees AMD Double Down on Chiplets: As Many as 7

#26
Chrispy_
I wonder if AMD will be scaling up the proportional RT performance for RDNA3, or whether they've seen how inefficient RTX is even for Nvidia with their more advanced, 2nd-gen RT hardware.

I've personally bought four RTX cards since they launched, and I've yet to see a situation where any game's RT effects are worth the performance hit. The ones where RTX is used the most look the best but also suffer the highest performance penalties, and I'd rather just game at higher resolutions and refresh rates. It's not as if RTX reflections or shadows are perfect; they're just a bit more convincing than the baked or screen-space methods most games without RTX support use.

Screen-space ambient occlusion looks pretty terrible when cranked up too high, whilst RTX AO looks convincing but is extremely expensive for such a subtle feature. Every other RTX feature I can happily do without.
#27
trsttte
watzupken: My main concern with these chiplet designs is heat transfer. If I look at AMD's Ryzen 3000 and 5000 series CPUs that utilise chiplet designs, these chips tend to run quite hot despite the lower power requirement compared to Intel's monolithic designs. While this GCD is expected to be bigger than the CPU chiplet, it also seems to require more power. The Navi 31 is rumoured to pull 450 to 500 W. So while the heatsink may increase in size, the contact with the chiplets may not be ideal enough to allow good heat transfer. It will be interesting to see how this pans out with RDNA3.
The CPU and GPU markets have a big difference: a CPU cooler is designed by a third party to target a vast range of CPUs, whereas GPU coolers are dedicated to each board/chip and much more fine-tuned. They also generally make direct die contact instead of going through a heat spreader, further improving their efficiency at cooling the chip.

I think we'll generally see bigger use of vapor chambers instead of the cheaper cold-plate options, and cooling will probably not be a problem for reasonable cards (none of that 400 W+ nonsense), but I won't hold my breath on any price reductions (you save some on the MCM chips, but then spend it on the package, interconnect, etc.).
#28
Chrispy_
Aren't MCMs easier to cool with heatpipes?

A monolithic die can only make direct contact with 2-3 heatpipes, depending on the size of the die and heatpipes. Additional heatpipes that don't contact the die don't do an awful lot.

MCM designs spread the dies out, meaning that different heatpipes can cover different dies. The result is that in, say, a five-heatpipe design, all five heatpipes are being useful, and no single die is using silly amounts of power that could overwhelm the specific heatpipe(s) it's making contact with.

A vapor chamber is the obvious answer, but they cost more, and even when paying silly money for GPUs these days, manufacturers don't want to spend any more than they absolutely have to. If a heatpipe is adequate, that is what you're going to get, and if you're willing to spend more on cooling, they'll instead try to sell you a ridiculous thing with an AIO water loop attached to it, because their markup on that is fantastic.
#29
trsttte
Chrispy_: Aren't MCMs easier to cool with heatpipes?

A monolithic die can only make direct contact with 2-3 heatpipes, depending on the size of the die and heatpipes. Additional heatpipes that don't contact the die don't do an awful lot.

MCM designs spread the dies out, meaning that different heatpipes can cover different dies. The result is that in, say, a five-heatpipe design, all five heatpipes are being useful, and no single die is using silly amounts of power that could overwhelm the specific heatpipe(s) it's making contact with.

A vapor chamber is the obvious answer, but they cost more, and even when paying silly money for GPUs these days, manufacturers don't want to spend any more than they absolutely have to. If a heatpipe is adequate, that is what you're going to get, and if you're willing to spend more on cooling, they'll instead try to sell you a ridiculous thing with an AIO water loop attached to it, because their markup on that is fantastic.
I think it would depend on the specific layout of each die. They're still reasonably large dies (VideoCardz, the source for this article, mentions a combined 800 mm²), so I'd bet on the compute dies being parallel to the length of the card. Without using a complex shape for the heatpipes, it would be hard to have them dedicated to each die.

And then there are also the other rumored dies (they're talking about 7 chiplets total, which seems excessive for a first iteration of the technology), so I think the logical solution would be vapor chambers all around.
#30
MxPhenom 216
ASIC Engineer
watzupken: My main concern with these chiplet designs is heat transfer. If I look at AMD's Ryzen 3000 and 5000 series CPUs that utilise chiplet designs, these chips tend to run quite hot despite the lower power requirement compared to Intel's monolithic designs. While this GCD is expected to be bigger than the CPU chiplet, it also seems to require more power. The Navi 31 is rumoured to pull 450 to 500 W. So while the heatsink may increase in size, the contact with the chiplets may not be ideal enough to allow good heat transfer. It will be interesting to see how this pans out with RDNA3.


Chiplet designs should ideally bring cost down compared to a monolithic one. So given that Nvidia is also using TSMC 5 nm now, I think AMD will have an advantage from a cost perspective (plus the fact that AMD may get preferential pricing due to their partnership with TSMC). But whether it will be reasonable when it lands in the retail space, I am not so optimistic. TSMC 5 nm is not cheap to begin with. AMD's strategy seems to target the second-best nodes for their products to avoid paying a substantial premium for the cutting-edge nodes that the likes of Apple tend to scoop up. But with all the big players stuck with 5 nm/4 nm at this point in time, I can imagine it will be very costly to try and secure more allocation. Apple by themselves would have scooped up most of the 4 nm allocation with their deep pockets.
Actually, I think chiplet designs are better with regard to heat and managing it. The heat is now generated by chiplets spread out under an IHS rather than all coming from one monolithic die, so it's more spread out.
#31
Mussels
Freshwater Moderator
MxPhenom 216: Actually, I think chiplet designs are better with regard to heat and managing it. The heat is now generated by chiplets spread out under an IHS rather than all coming from one monolithic die, so it's more spread out.
Heat density is the issue, see the 5800X.

It's why the Intel chips can overclock and push through 300 W, while the 5800X can't be cooled at 150 W.
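Rough numbers to illustrate the point (die areas and wattages below are approximations, not measured figures):

```python
# Very rough heat-density comparison; die areas and power figures are
# approximate assumptions, purely to illustrate the W/mm^2 argument.
chips = {
    "5800X (one ~81 mm^2 Zen 3 CCD)": {"power_w": 142, "hot_area_mm2": 81},
    "5950X (two Zen 3 CCDs)":         {"power_w": 142, "hot_area_mm2": 2 * 81},
    "Monolithic Intel (~280 mm^2)":   {"power_w": 250, "hot_area_mm2": 280},
}

for name, c in chips.items():
    density = c["power_w"] / c["hot_area_mm2"]
    print(f"{name:32s}: ~{density:.2f} W/mm^2")
```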
#32
Chrispy_
Mussels: Heat density is the issue, see the 5800X.

It's why the Intel chips can overclock and push through 300 W, while the 5800X can't be cooled at 150 W.
This is such a busted myth that I simply cannot understand why people keep parroting it.

My 5800X is easily cooled at the 142 W stock boost limit. I have one on water and it's a piece of cake; previously it was using a decade-old NH-U12 (effectively the same as the modern Redux version) and never broke 80C. The dozens of 5950X machines at work are air-cooled too, and they're basically two 5800Xs sharing a single cooler...
#33
Mussels
Freshwater Moderator
Chrispy_: This is such a busted myth that I simply cannot understand why people keep parroting it.

My 5800X is easily cooled at the 142 W stock boost limit. I have one on water and it's a piece of cake; previously it was using a decade-old NH-U12 (effectively the same as the modern Redux version) and never broke 80C. The dozens of 5950X machines at work are air-cooled too, and they're basically two 5800Xs sharing a single cooler...
Because some of us own the ones that don't cool easily at all.
Custom water, lapped, liquid metal... nothing's made mine work as you claim yours does.
80C? Sure, but at what clock speed? What type of load?
Fire up an AVX load and see if you're even close to the 5.05 GHz these chips are capable of; you're more likely to be around 4.4 GHz.

The 5950X is not the same, as it spreads that wattage over double the surface area.
#34
InVasMani
AMD could put a chiplet on the rear of the PCB and have a slim single-slot blower cooler on that side, similar to a Quadro P4000 but placed on the reverse side of the GPU. They could then use that, kind of akin to big.LITTLE, for certain GPU tasks. A potential use would be a dedicated chiplet for upscaling and other duties like encode/decode, while the primary duties stay on the bottom-mounted chiplets with a more traditional 2-slot cooler.
#35
TheoneandonlyMrK
InVasMani: AMD could put a chiplet on the rear of the PCB and have a slim single-slot blower cooler on that side, similar to a Quadro P4000 but placed on the reverse side of the GPU. They could then use that, kind of akin to big.LITTLE, for certain GPU tasks. A potential use would be a dedicated chiplet for upscaling and other duties like encode/decode, while the primary duties stay on the bottom-mounted chiplets with a more traditional 2-slot cooler.
I think you're misunderstanding chiplets. On the back? Why the back?
#36
InVasMani
TheoneandonlyMrK: I think you're misunderstanding chiplets. On the back? Why the back?
I'm saying offload the less important stuff to the rear of the PCB to ease some of the active heat-dissipation concerns. There are absolutely things that could be placed on that side, with a cooling solution like the one described. Consider an approach like Lucid Hydra and what it aimed to do; something similar could be done all within the same GPU PCB, without the complications and latency snafus of the northbridge complicating any of it.
#37
Chrispy_
Mussels: 80C? Sure, but at what clock speed? What type of load?
Fire up an AVX load and see if you're even close to the 5.05 GHz these chips are capable of; you're more likely to be around 4.4 GHz.
I already said - stock, 142W.

IIRC it was plateauing in the high seventies, like 78-79C, when fully loaded with Cinebench R23's default 10-minute multi-threaded test, using a very old Noctua NH-U12 cooler and a be quiet! low-RPM fan that maxed out at 1,500 rpm.

Now on an Alphacool loop with a 140+360 radiator it boosts to 4575MHz at ~65C. Granted, it's a sample size of one, but presumably so is your chip. I also don't use AVX workloads and those always run slower anyway, even on Intel.

The 5950X render nodes, of which I have a much larger sample size, are on air, and they crunch at full load 24/7 at work at about 4.35 GHz. That's using an NH-U14S SP3, which results in 90C at ~175 W CPU power draw. I set a high TDP for those manually, then reduced the throttle temperature to 90C, so they basically operate at that temperature and use whatever power the NH-U14S can cool to 90C.
#38
MxPhenom 216
ASIC Engineer
Mussels: Heat density is the issue, see the 5800X.

It's why the Intel chips can overclock and push through 300 W, while the 5800X can't be cooled at 150 W.
I mean, that's pretty apples-to-oranges though. Different chip architectures, process nodes, etc. Comparing a 5800X without chiplets to one with them (if that were even possible) would paint a clearer picture.

But based on your response to the other guy I think we are in agreement.