Monday, August 14th 2023
AMD "Navi 4C" GPU Detailed: Shader Engines are their own Chiplets
"Navi 4C" is a future high-end GPU from AMD that will likely not see the light of day, as the company is reportedly pivoting away from the high-end GPU segment with its next RDNA4 generation. For AMD to keep investing in the development of this GPU, the gaming graphics card segment would have needed to post better sales, especially at the high end, which it didn't. Moore's Law is Dead scored details of what could have been a fascinating technological endeavor for AMD: building a highly disaggregated GPU.
AMD's current "Navi 31" GPU is already disaggregated: the main logic components that benefit from the latest 5 nm foundry node sit in a central Graphics Compute Die (GCD), surrounded by up to six small chiplets built on the older 6 nm node. Each of these contains a segment of the GPU's Infinity Cache memory and its memory interface, hence the name memory cache die (MCD). With "Navi 4C," AMD had intended to disaggregate the GPU further, identifying even more components on the GCD that could be spun out into chiplets, and breaking up the shader engines themselves into smaller self-contained chiplets (smaller dies mean greater yields and lower foundry costs). AMD would have built "Navi 4C" using a vast array of packaging innovations that let the various kinds of chiplets talk to each other with as little latency as possible, as if they were parts of a single monolithic die.
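The yield argument in the parenthetical above can be illustrated with a rough back-of-the-envelope sketch. It assumes a simple Poisson defect model (yield = exp(-area × defect density)), which is a textbook approximation and not anything AMD or TSMC has published for these parts. The defect density, die areas, and chiplet count below are hypothetical values chosen only for illustration, and the sketch ignores packaging cost, wafer-edge loss, and partially defective dies that can be salvaged.

```python
import math

def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a Poisson defect model."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_die(area_mm2: float, defects_per_cm2: float) -> float:
    """Average wafer area (mm^2) consumed per working die, ignoring edge loss."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_cm2)

# All numbers below are hypothetical, for illustration only.
D0 = 0.1                # defects per cm^2 (assumed)
MONOLITHIC_MM2 = 600.0  # one large monolithic die (assumed)
CHIPLET_MM2 = 150.0     # one of four chiplets with the same total area (assumed)
CHIPLETS = 4

mono_yield = poisson_yield(MONOLITHIC_MM2, D0)
chiplet_yield = poisson_yield(CHIPLET_MM2, D0)

# Silicon needed to obtain one working product's worth of logic.
mono_cost = silicon_per_good_die(MONOLITHIC_MM2, D0)
chiplet_cost = CHIPLETS * silicon_per_good_die(CHIPLET_MM2, D0)

print(f"monolithic yield: {mono_yield:.1%}, silicon per good GPU: {mono_cost:.0f} mm^2")
print(f"chiplet yield:    {chiplet_yield:.1%}, silicon per good GPU: {chiplet_cost:.0f} mm^2")
```

Under these assumed numbers, each small die is more likely to be defect-free than the large one, so less wafer area is wasted per working GPU. How much of that saving survives the added packaging cost is a separate question the model does not address.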
Assuming AMD had continued to use GDDR6 and not the newer GDDR7 memory standard, the company would have likely retained 6 nm MCDs from the current generation, to provide video memory interface and last-level cache to the GPU, minimizing R&D costs, and gaining from the further reduced foundry costs for the 6 nm node.
AMD identified the GPU's media acceleration engine and Radiance Display Engine as ripe for the next round of disaggregation. Although media acceleration engines are logic components, they are fixed-function hardware, as are the display engines, and can likely make do with older foundry nodes. The media acceleration and display engines would be spun off into a separate chiplet called the MID (media and I/O die). At this point we don't know if AMD would have gone with 6 nm or a newer node for the MID, but given that the company is able to pack the latest media and display I/O features onto the 6 nm "Navi 33" monolithic silicon, it's possible that it would have stayed with the older node.
Much of the semiconductor engineering muscle is centered on what happens to the most critical number-crunching machinery of the GPU, the Shader Engines. Apparently, AMD figured out that each Shader Engine, consisting of a fixed number of workgroup processors (WGPs), could be spun out into its own chiplet, called an SED (shader engine die). These would be built on an advanced foundry node. Given that NVIDIA is reportedly building its next-gen "Blackwell" GPUs on 3 nm, it's quite possible that AMD would have used the same node for the SEDs.
The SEDs are seated on active interposer dies (AIDs). An interposer, in general, is a silicon die whose sole purpose is to facilitate high-density microscopic wiring between the chiplets stacked on top of it, with wiring densities that otherwise wouldn't be possible through a fiberglass substrate. The "active" part of the AID refers to the ability of the interposer to facilitate wiring not just among the dies stacked on top and to the substrate below, but also to neighboring AIDs. For this purpose, TSMC developed the COW-L (chip-on-wafer-L) bridges.
These are tiny silicon dies designed for high-density inter-AID wiring, and they are how a mesh of AIDs talk to each other and to the MID. How the AIDs communicate with the MCDs remains to be seen. The current generation of MCDs is connected to the GCD using Infinity Fan-out Links, a high-density wiring method that makes do with the fiberglass substrate as the medium, instead of silicon. Should AMD be using another method to connect the MCDs, it would mean that the company is using a newer generation of them. Besides COW-L bridges, AMD is also leveraging TSMC's COW-V TSV (through-silicon via) innovations for connecting the SEDs to the package substrate (for power and other I/O).
Alas, it's highly unlikely that "Navi 4C" will ever get off the drawing board. AMD has already implemented many of these packaging innovations with its latest MI300 compute processor based on the CDNA3 architecture, and it would have been incredible to see them in the gaming graphics segment. However, basic economics prevent AMD from investing in further development of "Navi 4C." The gaming graphics card market is in its biggest slump since the late 2010s, and the enthusiast-class GPU caters to a niche market.
The current market conditions are a far cry from 2021, when the cryptocurrency mining gold rush had incentivized GPU manufacturers to make bigger GPUs. AMD is rumored to have mistimed the launch of its RX 6950 XT GPU, which arrived at the tail end of the crypto boom. By that point, "Navi 31" had reached an advanced level of development and was ready to enter mass production. The company now probably finds itself unable to justify the cost of developing "Navi 4C" beyond the concepts in this article, unless the market undergoes another dramatic upsurge in demand at the high end.
Sources:
Moore's Law is Dead (YouTube), VideoCardz
33 Comments on AMD "Navi 4C" GPU Detailed: Shader Engines are their own Chiplets
An interesting approach if true, fascinating even, but after being let down oh so many times, I just cannot take any information underpinned by this person seriously.
Or it can be a red herring and AMD will surprise us by pushing the limits even on a slow year, in the hopes of catching Nvidia off-guard even temporarily.
We'll just have to wait and see.
On the other hand, the focus is now AI (or has been for nVidia since Turing); gamers are being sold what's left at a slight discount.
If AMD cancelled big Navi4, then I hope it's because they want to bring big Navi5 forward and actually compete with Nvidia again...
Intel, please do more before these companies kill PC gaming for their AI greed.
AI will have real meaning when a "computer" does stuff on its own without human interference.
Same as Nvidia does with DLSS. "We use our AI for DLSS." Yeah, right. There is no AI in this.
Plus, as for not releasing something high-end, I wonder if AMD is just going to do something high-end that is kinda out of band with their primary product line again. Fury 2015 60CU, Vega 2017 64CU, VII 2019 60CU.
This does not mean in any way that AMD is abandoning the high end. RDNA4 is a much more complex design than RDNA3, and there were going to be over 2x the number of chiplets. Sources inside AMD have said they were struggling to get the design to work and that performance would be only minimally improved over RDNA3. Rather than eat tons of resources trying to get it to work, which would mean delays not just to RDNA4 but also to RDNA5, they are just sticking to the lower-end monolithic N33/34 designs for RDNA4, which are progressing well and will see huge uplifts in RT. The high end will just shift to RDNA5. Given Blackwell is not out until 2025, AMD is not going to be that disadvantaged by being without high-end RDNA4. RDNA5 might be out in late 2025, say 6 months or so after Blackwell, and in the long run that will mean they have far stronger competitors to high-end Blackwell. IMO, as long as they can get N43's 8600-level card to be a much stronger offering, more like a 7700XT in raster but with much stronger RT and hardware-accelerated FSR3, they will sell a ton. Being on 3nm, they could pack in a lot more CUs, give it a 192-bit bus, GDDR7 and 12GB for, say, $299; that would slay the upcoming 7700XT.
I'm pretty sure all the next GPU generations from all makers (Nvidia, AMD, Intel) will be marketed as "designed for AI acceleration".
Of course, there will be products that are indeed focussed on that, but I think GPU makers are betting on an AI craze similar to the cryptomining craze, where buyers will try to outbid one another in an attempt to buy as many "gaming" cards as possible for smaller, home AI generators...