Monday, August 14th 2023

AMD "Navi 4C" GPU Detailed: Shader Engines are their own Chiplets

Aug 14th, 2023 22:40

"Navi 4C" is a future high-end GPU from AMD that will likely not see the light of day, as the company is pivoting away from the high-end GPU segment with its next RDNA4 generation. For AMD to have kept investing in the development of this GPU, the gaming graphics card segment would have needed to post better sales, especially in the high-end, which it didn't. Moore's Law is Dead scored details of what could have been a fascinating technological endeavor for AMD: building a highly disaggregated GPU.

AMD's current "Navi 31" GPU is already disaggregated. The main logic components that benefit from the latest 5 nm foundry node sit in a central Graphics Compute Die (GCD), surrounded by up to six small chiplets built on the older 6 nm node. Each of these contains a segment of the GPU's Infinity Cache memory and its memory interface, hence the name memory cache die (MCD). With "Navi 4C," AMD had intended to disaggregate the GPU further, identifying even more components on the GCD that could be spun out into chiplets, as well as breaking up the shader engines themselves into smaller self-contained chiplets (smaller dies mean greater yields and lower foundry costs).
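The yield argument can be made concrete with a minimal sketch. It assumes the textbook Poisson defect model (yield = exp(-area x defect density)), which is a common simplification and not something the article or AMD states. The 600 mm² total area and the 0.1 defects/cm² density are hypothetical illustration values, not figures for any real GPU or node, and the sketch ignores packaging cost and the extra area spent on die-to-die interfaces.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects (Poisson model)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(total_area_mm2: float, n_chiplets: int,
                             defects_per_cm2: float) -> float:
    """Average wafer area (mm^2) consumed per working product.

    Assumes the logic is split into n equal chiplets that are tested
    before assembly, so a defective chiplet is discarded on its own.
    """
    die_area = total_area_mm2 / n_chiplets
    return total_area_mm2 / poisson_yield(die_area, defects_per_cm2)


# Hypothetical numbers, for illustration only.
TOTAL_AREA = 600.0   # mm^2 of logic in total
DEFECTS = 0.1        # defects per cm^2

for n in (1, 2, 4, 8):
    y = poisson_yield(TOTAL_AREA / n, DEFECTS)
    cost = silicon_per_good_product(TOTAL_AREA, n, DEFECTS)
    print(f"{n} die(s): yield per die {y:.2f}, silicon per good product {cost:.0f} mm^2")
```

Under these assumptions the per-die yield rises and the silicon consumed per working product falls as the same logic is cut into more, smaller dies, which is the trade the parenthetical above refers to.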

AMD would have built "Navi 4C" using a vast array of packaging innovations that let the numerous kinds of chiplets talk to each other with as little latency as possible, as if they were parts of a single monolithic die.

Assuming AMD had continued to use GDDR6 and not the newer GDDR7 memory standard, the company would have likely retained 6 nm MCDs from the current generation, to provide video memory interface and last-level cache to the GPU, minimizing R&D costs, and gaining from the further reduced foundry costs for the 6 nm node.

AMD identified the GPU's media acceleration engine and Radiance Display Engine as ripe for the next round of disaggregation. Although media acceleration engines are logic components, they are fixed-function hardware, as are the display engines, and can likely make do with older foundry nodes. The media acceleration and display engines would be spun off into a separate chiplet called the MID (media and I/O die). At this point we don't know if AMD would've gone with 6 nm or a newer node for the MID, but given that the company is able to pack the latest media and display I/O features onto the 6 nm "Navi 33" monolithic silicon, it's possible that it would have gone with the older node.

Much of the semiconductor engineering muscle is centered on what happens to the most critical number-crunching machinery of the GPU, the Shader Engines. Apparently, AMD figured out that each Shader Engine, consisting of a fixed number of workgroup processors (WGPs), could be spun out into its own chiplet, called an SED (shader engine die). These would be built on an advanced foundry node. Given that NVIDIA is building its next-gen "Blackwell" GPUs on 3 nm, it's quite possible that AMD would have used the same node for the SEDs.

The SEDs are seated on active interposer dies (AIDs). An interposer, in general, is a silicon die whose sole purpose is to facilitate high-density microscopic wiring between the chiplets stacked on top of it, at wiring densities that otherwise wouldn't be possible through a fiberglass substrate. The "active" part of the AID refers to the ability of the interposer to facilitate wiring not just among the dies stacked on top and to the substrate below, but also to neighboring AIDs. For this purpose, TSMC innovated the COW-L (chip-on-wafer-L) bridges.

These are tiny silicon dies designed for inter-AID high-density wiring, and are how a mesh of AIDs talk to each other and to the MID. How they communicate with the MCDs remains to be seen. The current generation of MCDs is connected to the GCD using Infinity Fan-out Links, a high-density wiring method that makes do with the fiberglass substrate as the medium instead of silicon. Should AMD be using another method to connect the MCDs, it would mean that the company is using a newer generation of them. Besides COW-L bridges, AMD is also leveraging TSMC's COW-V TSV (through-silicon via) innovations for connecting the SEDs to the package substrate (for power and other I/O).
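The arrangement described so far can be sketched as a small data model: SEDs stacked on AIDs, AIDs bridged to one another and to the MID, and MCDs alongside. This is only an illustration of the topology as the article describes it. The class names, the die counts (3 AIDs, 3 SEDs per AID, 6 MCDs) and the chain-shaped bridging are hypothetical assumptions, since the article gives no counts and speaks only of "a mesh." MCDs are left unlinked because how they connect is not known.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Die:
    kind: str  # "SED", "AID", "MID" or "MCD"
    node: str  # foundry node; "?" where the article does not say


@dataclass
class Package:
    dies: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (index_a, index_b, link_type)

    def add(self, kind: str, node: str) -> int:
        """Add a die and return its index in the package."""
        self.dies.append(Die(kind, node))
        return len(self.dies) - 1

    def connect(self, a: int, b: int, link_type: str) -> None:
        self.links.append((a, b, link_type))


def build_example(n_aids: int = 3, seds_per_aid: int = 3, n_mcds: int = 6) -> Package:
    """Build a hypothetical package; all counts are illustrative assumptions."""
    pkg = Package()
    mid = pkg.add("MID", "?")
    aids = [pkg.add("AID", "?") for _ in range(n_aids)]
    for aid in aids:
        for _ in range(seds_per_aid):
            sed = pkg.add("SED", "advanced")
            pkg.connect(sed, aid, "stacked")
    # Neighboring AIDs are bridged; a simple chain stands in for the mesh.
    for left, right in zip(aids, aids[1:]):
        pkg.connect(left, right, "COW-L bridge")
    pkg.connect(aids[-1], mid, "COW-L bridge")
    for _ in range(n_mcds):
        pkg.add("MCD", "6 nm")  # link type to the rest of the GPU is unknown
    return pkg


pkg = build_example()
print(Counter(die.kind for die in pkg.dies))
print(Counter(link_type for _, _, link_type in pkg.links))
```

Even with these modest made-up counts the package holds 19 dies of four kinds, which gives a feel for why the design leans so heavily on packaging.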

Alas, it's highly unlikely that "Navi 4C" will ever get off the drawing board. AMD has already implemented many of these packaging innovations with its latest MI300 compute processor based on the CDNA3 architecture, and it would have been incredible to see them in the gaming graphics segment. However, basic economics prevent AMD from investing in further development of "Navi 4C." The gaming graphics card market is in its biggest slump since the late 2010s, and the enthusiast-class GPU caters to a niche market.

The current market conditions are a far cry from 2021, when the crypto-currency mining gold-rush had incentivized GPU manufacturers to make bigger GPUs. AMD is rumored to have mistimed the launch of its RX 6950 XT GPU toward the tail-end of the crypto boom. By that point, "Navi 31" had reached an advanced level of development and was ready to enter mass-production. The company now probably finds itself unable to justify the cost of development for "Navi 4C" beyond the concepts in this article, unless the market undergoes another dramatic upsurge in demand at the high-end.

Sources: Moore's Law is Dead (YouTube), VideoCardz


33 Comments on AMD "Navi 4C" GPU Detailed: Shader Engines are their own Chiplets

#26

Dr. Dro

Minus Infinity: '"Navi 4C" is a future high-end GPU from AMD that will likely not see the light of day, as the company is pivoting away from the high-end GPU segment with its next RDNA4 generation.'

This does not mean in any way that AMD is abandoning the high end. RDNA4 is a much more complex design than RDNA3 and there were going to be over 2x the number of chiplets. Sources inside AMD have said they were struggling to get the design to work and performance would be only minimally improved over RDNA3. Rather than eat tons of resources trying to get it to work, which would mean delays not just on RDNA4 but also RDNA5, they are just sticking to the lower end monolithic N33/34 designs for RDNA4, which are progressing well and will see huge uplifts in RT. The high end will just shift to RDNA5. Given Blackwell is not out until 2025, AMD is not going to be that disadvantaged by being without high-end RDNA4. RDNA5 might be out late 2025, say 6 months or so after Blackwell, and in the long run that will mean they have far stronger competitors to high-end Blackwell. IMO as long as they can get N43's 8600-level card to be a much stronger offering, more like a 7700XT in raster but with much stronger RT and hardware accelerated FSR3, they'll sell a ton. Being on 3nm they could pack in a lot more CUs, give it a 192-bit bus, GDDR7 and 12GB for say $299, and that would slay the upcoming 7700XT.

If it's minimally improved over RDNA 3, that means they still can't beat the 4090. But there is no way RDNA 5th gen is coming out in early 2025, that's less than 2 years from now.

#27

THU31

Xex360: There was a shift in GPU segmentation since Turing: the 2080ti should've been the 80 card, a hypothetical 3090ti super the 80 card for Ampere, and the 4090 the 80 for Ada. What used to be high end is now ultra enthusiast.

What do you mean?

Ever since Kepler, x80 cards have mainly been using small x04 dies, between 300-400 mm2. The 2080 actually used a big 104 die that was 545 mm2 (because they added RT and tensor cores).

The 3080 used a slightly cut down 102 die, which is what made it such a great value card, one of the best in NVIDIA's history.

And there's no way the 4090 should be the 4080. But at $1200, the 4080 should be using a cut down 102 die. The 103 die should be the 104 die in the 4070/Ti. And the 104 die should be the 106 die for the 4060/Ti.

What changed with Turing is that they started releasing the 102 die right away, instead of a year later. But it's still a huge die with crazy performance. The shift actually happened with Kepler, because that's when they started using a mid-range chip in a high-end product. That was after the failure of the GTX 280 and 480, which used huge and power hungry chips with poor performance.

#28

Minus Infinity

Dr. Dro: If it's minimally improved over RDNA 3, that means they still can't beat the 4090. But there is no way RDNA 5th gen is coming out in early 2025, that's less than 2 years from now.

Where did I say it's coming out in early 2025? I said Blackwell is out in 2025 and RDNA5 might be 6 months behind, meaning it's late 2025 at best.

#29

Unregistered

THU31: What do you mean?

Ever since Kepler, x80 cards have mainly been using small x04 dies, between 300-400 mm2. The 2080 actually used a big 104 die that was 545 mm2 (because they added RT and tensor cores).

The 3080 used a slightly cut down 102 die, which is what made it such a great value card, one of the best in NVIDIA's history.

And there's no way the 4090 should be the 4080. But at $1200, the 4080 should be using a cut down 102 die. The 103 die should be the 104 die in the 4070/Ti. And the 104 die should be the 106 die for the 4060/Ti.

What changed with Turing is that they started releasing the 102 die right away, instead of a year later. But it's still a huge die with crazy performance. The shift actually happened with Kepler, because that's when they started using a mid-range chip in a high-end product. That was after the failure of the GTX 280 and 480, which used huge and power hungry chips with poor performance.

Performance wise, not how big the die itself is. The 2080 should've been as powerful as the 2080ti to justify its 80 name. Using the hypothetical cards, the 3080 wouldn't look so great, nor would the 4090.

#30

THU31

Xex360: Performance wise, not how big the die itself is. The 2080 should've been as powerful as the 2080ti to justify its 80 name. Using the hypothetical cards, the 3080 wouldn't look so great, nor would the 4090.

I could agree and disagree at the same time.

2080 - 9% faster than 1080 Ti
1080 - 31% faster than 980 Ti
980 - 11% faster than 780 Ti
680 - 23% faster than 580

So yes, the improvement over the previous flagship was the smallest, but it had to happen because of RTX. They had to sacrifice raster performance improvement to include RT, and that's the generation they decided to do it.
It was kind of a failure, because there were pretty much no games with RT for a long time, DLSS 1 was garbage, and there were some failing memory problems if I remember correctly.

But now that RT is already here, performance improvements are pretty much back to what they were in the previous decade. Unfortunately they also come with a price increase, but that's another topic.

#31

Unregistered

THU31: I could agree and disagree at the same time.

2080 - 9% faster than 1080 Ti
1080 - 31% faster than 980 Ti
980 - 11% faster than 780 Ti
680 - 23% faster than 580

So yes, the improvement over the previous flagship was the smallest, but it had to happen because of RTX. They had to sacrifice raster performance improvement to include RT, and that's the generation they decided to do it.
It was kind of a failure, because there were pretty much no games with RT for a long time, DLSS 1 was garbage, and there were some failing memory problems if I remember correctly.

But now that RT is already here, performance improvements are pretty much back to what they were in the previous decade. Unfortunately they also come with a price increase, but that's another topic.

Here lies the problem: we may get 80 series performance, but at Titan prices.

I believe they just increased margins, and with demand from AI it doesn't make sense to waste silicon on gaming GPUs.

#32

ViperXZ

Xex360: I believe they just increased margins

Sure they did. First of all 3090 and other xx90 products are meant to replace Titan, with a few features missing but a lower price point than TITAN RTX. Then covid happened and they saw what enthusiasts are willing to pay for top products. The 3080 was only that cheap because of the 6800 XT, don't think for a second they would've priced their big chip (albeit cut down a bit) that cheap otherwise.

#33

ARF

New renders. This is what Navi 4C was supposed to look like. Extremely difficult and complex to build and get actually working.

videocardz.com/newz/amd-navi-4x-concept-renders-demonstrate-the-complexity-of-the-high-end-rdna4-gpu-design
