Sunday, December 19th 2021

AMD Radeon "Navi 3x" Could See 50% Increase in Shaders, Double the Cache Memory

AMD's next-generation Radeon "Navi 3x" line of GPUs could see a 50% increase in shaders and a doubling of Infinity Cache memory size, according to educated guesswork and intelligence from Greymon55, a reliable source of GPU leaks. The Navi 31, Navi 32, and Navi 33 chips are expected to debut the new RDNA3 graphics architecture, and succeed the 6 nm optical shrinks of existing Navi 2x chips that AMD is rumored to be working on.

The top Navi 31 part allegedly features 60 workgroup processors (WGPs), or 120 compute units. Assuming an RDNA3 CU still holds 64 stream processors, you're looking at 7,680 stream processors, a 50% increase over Navi 21. The Navi 32 silicon features 40 WGPs, and exactly the same number of shaders as the current Navi 21, at 5,120. The smallest of the three, the Navi 33, packs 16 WGPs, or 2,048 shaders. There is a generational doubling in cache memory, with 256 MB on the Navi 31, 192 MB on the Navi 32, and 64 MB on the Navi 33. Interestingly, the memory sizes and bus widths are unchanged, but AMD could leverage faster GDDR6 memory types. 2022 will see the likes of Samsung ship GDDR6 chips with data rates as high as 24 Gbps.
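For reference, here is the arithmetic behind those figures - a minimal sketch, assuming RDNA3 retains RDNA2's layout of 2 compute units per WGP and 64 stream processors per CU (neither is confirmed):

# Rumored "Navi 3x" shader counts, derived from the leaked WGP figures.
# Assumes 2 CUs per WGP and 64 stream processors per CU, as on RDNA2.
def stream_processors(wgps: int) -> int:
    return wgps * 2 * 64

navi21 = stream_processors(40)  # 5,120 SPs (current Navi 21, known)
navi31 = stream_processors(60)  # 7,680 SPs (rumored)
navi32 = stream_processors(40)  # 5,120 SPs (rumored)
navi33 = stream_processors(16)  # 2,048 SPs (rumored)
print(f"Navi 31 over Navi 21: +{navi31 / navi21 - 1:.0%}")  # +50%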
Source: Greymon55 (Twitter)

44 Comments on AMD Radeon "Navi 3x" Could See 50% Increase in Shaders, Double the Cache Memory

#26
Vayra86
mastrdrverIf there was some simple way to do it, both nVidia and AMD (ATI) would have done it a long time ago.
No, they wouldn't. They'd be cannibalizing their own roadmap and profit margins.

Rather, the strategy is to postpone as many changes as possible to later generations, if the economic reality allows for such a thing. Look at GCN's development - you can conclude there weren't funds for targeted development, or you could say the priority wasn't there because 'AMD still had revenue'... and they still pissed away money. Look at the features that got postponed from Maxwell to Pascal - Nvidia simply didn't have to make a better 970 or 980 Ti, and Maxwell was already a very strong gen 'in the market at the time' - but they had the Pascal technology on the shelf already. Similarly, the move from Volta > Turing > Ampere is a string of massively delayed releases. It's no coincidence these 'delays' happened around the same years for both competitors. Another big factor to stall is the console release roadmap - Nvidia is learning the hard way right now, as they gambled on pre-empting the consoles with their own RTX. In the wild, we now see them use those tensor/RT cores primarily for non-RT workloads like DLSS, because devs are primarily console oriented, especially on big budget/multiplatform titles. So we get lackluster RT implementations on PC.

So no... both companies are and will always be balancing on the edge of what they must do at the bare minimum to keep selling product. They want to leave as much in the tank for later, and rather sell GPUs on 'new features' that are not hardware based. Software, for example. Better drivers. Support for new APIs. Monitor technology. Shadowplay. Low Latency modes. New AA modes. Etc etc. None of this requires a new architecture, and there is nothing easier than just refining what you have. It's what Nvidia has been doing for so long now, and what kept them on top. Minor tweaks to architecture to support new tech, at best, and keep pushing the efficiency button.
Posted on Reply
#27
londiste
PunkenjoyBut that would mean the Infinity Fabric link between the I/O die and the chiplet is huge. Right now, on die, AMD states that it's 16 x 64b for Navi 21. It would mean probably at least 12 x 2 x 64b for Navi 31. Not undoable, but I wonder how expensive it will be to make with an interposer.
Not only expensive to make - if they go full interposer, it probably doesn't matter all that much how many traces it has - but wide IF is not exactly power efficient.
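For scale, a minimal sketch of the aggregate link widths those figures imply - all of them rumors or statements quoted from the post above, not anything AMD has confirmed for Navi 31:

# Aggregate Infinity Fabric width implied by the figures quoted above.
# "64b" is read as a 64-bit link; both link counts are speculative.
BITS_PER_LINK = 64
navi21_links = 16       # on-die link count cited for Navi 21
navi31_links = 12 * 2   # hypothetical off-die count for Navi 31
print(navi21_links * BITS_PER_LINK)  # 1024 bits, all on one die
print(navi31_links * BITS_PER_LINK)  # 1536 bits crossing the package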
Posted on Reply
#28
Oberon
Vayra86No, they wouldn't. They'd be cannibalizing their own roadmap and profit margins.

Rather, the strategy is to postpone as many changes as possible to later generations, if the economic reality allows for such a thing. Look at GCN's development - you can conclude there weren't funds for targeted development, or you could say the priority wasn't there because 'AMD still had revenue'... and they still pissed away money. Look at the features that got postponed from Maxwell to Pascal - Nvidia simply didn't have to make a better 970 or 980 Ti, and Maxwell was already a very strong gen 'in the market at the time' - but they had the Pascal technology on the shelf already. Similarly, the move from Volta > Turing > Ampere is a string of massively delayed releases. It's no coincidence these 'delays' happened around the same years for both competitors. Another big factor to stall is the console release roadmap - Nvidia is learning the hard way right now, as they gambled on pre-empting the consoles with their own RTX. In the wild, we now see them use those tensor/RT cores primarily for non-RT workloads like DLSS, because devs are primarily console oriented, especially on big budget/multiplatform titles. So we get lackluster RT implementations on PC.

So no... both companies are and will always be balancing on the edge of what they must do at the bare minimum to keep selling product. They want to leave as much in the tank for later, and rather sell GPUs on 'new features' that are not hardware based. Software, for example. Better drivers. Support for new APIs. Monitor technology. Shadowplay. Low Latency modes. New AA modes. Etc etc. None of this requires a new architecture, and there is nothing easier than just refining what you have. It's what Nvidia has been doing for so long now, and what kept them on top. Minor tweaks to architecture to support new tech, at best, and keep pushing the efficiency button.
Companies don't sit there and try to trim the fat to decide the bare minimum they can get away with. They design products that are the best they can be within constraints such as power, die size, and the allowable time. The trimming and compromising comes with the products further down the stack, which are all derivatives of the top, "halo" product. The reason you end up with delays and evolutionary products instead of constant revolution is that, shockingly, this shit is hard! It takes hundreds of engineers and tens of thousands of engineer-hours to get these products out the door even when the designs are "just" derivatives.

The obvious counterpoint to this would be Intel and their stagnation for the near-decade between the release of Sandy Bridge and the competitive changes that arrived with Ryzen, but even that isn't an example of what you claim. Intel was working in an environment where their more advanced 10 and 7 nm process nodes were MASSIVELY delayed, throwing off their entire design cycle. The result was engineers laboring under an entirely different set of constraints, with one of those being Intel's profit margin, but again, this isn't what you have been describing. It represents a ceiling for cost, but engineers do whatever they can within that constraint. The trimming and compromising comes as you move down the product stack, where that same sort of margin must be maintained and you have other competitive concerns than "this is what we thought was possible given the constraints we are under."
londisteNot only expensive to make - if they go full interposer, it probably doesn't matter all that much how many traces it has - but wide IF is not exactly power efficient.
IF is only really expensive in terms of power when being pushed over the substrate. Utilizing an interposer or other technology like EFB (which is what will actually be used) reduces those power requirements tremendously.
Posted on Reply
#29
Vayra86
OberonCompanies don't sit there and try to trim the fat to decide the bare minimum they can get away with. They design products that are the best they can be within constraints such as power, die size, and the allowable time. The trimming and compromising comes with the products further down the stack, which are all derivatives of the top, "halo" product. The reason you end up with delays and evolutionary products instead of constant revolution is that, shockingly, this shit is hard! It takes hundreds of engineers and tens of thousands of engineer-hours to get these products out the door even when the designs are "just" derivatives.

The obvious counterpoint to this would be Intel and their stagnation for the near-decade between the release of Sandy Bridge and the competitive changes that arrived with Ryzen, but even that isn't an example of what you claim. Intel was working in an environment where their more advanced 10 and 7 nm process nodes were MASSIVELY delayed, throwing off their entire design cycle. The result was engineers laboring under an entirely different set of constraints, with one of those being Intel's profit margin, but again, this isn't what you have been describing. It represents a ceiling for cost, but engineers do whatever they can within that constraint. The trimming and compromising comes as you move down the product stack, where that same sort of margin must be maintained and you have other competitive concerns than "this is what we thought was possible given the constraints we are under."


IF is only really expensive in terms of power when being pushed over the substrate. Utilizing an interposer or other technology like EFB (which is what will actually be used) reduces those power requirements tremendously.
I agree with your points too, don't get me wrong. But our arguments are not mutually exclusive. AMD and Nvidia serve the same market and they look closely at one another. They take an educated/informed guess and risk with the products they launch - as much as engineering and yield matter, marketing, time to market, and the potential advance of the competition are factors just as well.

And the trimming certainly happens even at the top of the stack! Even now Nvidia is serving up incomplete GA102 dies in their halo product while enterprise gets the perfect ones. Maxwell, same story - and we know the 980 Ti was juuuuust a hair ahead of Fury X. Coincidence? Ofc not.

And in terms of delays... you say Intel. I say Nvidia (post-)Pascal. Maxwell had Pascal features cut and delayed... then Turing took its sweet time and barely moved price/perf forward... while AMD was still rebranding GCN and later formulating an answer to 1080 Ti performance. Coincidence?! ;)
Posted on Reply
#30
mastrdrver
ARFThe Radeon RX 6900 XT is more than double the performance of RX 5700 XT. 201% vs 100%.


AMD Radeon RX 5700 XT Specs | TechPowerUp GPU Database
That's not relative to each other, and on two different scales. See the note at the bottom.

I still stand by what I said. If the improvement is more than 50% at 4K from a 6900 XT to a 7900 XT, I'll be shocked (and so will almost everyone else).

Like I said earlier, I don't know and neither does anyone else. It would be highly unlikely.
Posted on Reply
#31
ARF
mastrdrverThat's not relative to each other, and on two different scales.
It is. There are no different scales.
Posted on Reply
#32
Vayra86
mastrdrverThat's not relative to each other, and on two different scales. See the note at the bottom.

I still stand by what I said. If the improvement is more than 50% at 4K from a 6900 XT to a 7900 XT, I'll be shocked (and so will almost everyone else).

Like I said earlier, I don't know and neither does anyone else. It would be highly unlikely.
But why would you be comparing relative perf between a 5700 XT and a 6900 XT? Both are positioned at opposite ends of the product stack. No shit, Sherlock, the difference is bigger between a midrange SKU from last year and the top end from the first real, full RDNA stack :D It's also certainly 100% between the two. The scale there is the relative scale, simple.

This comparison makes absolutely no sense and has zero relation to the discussion of per-gen improvements. Rather, compare to the same-tier GPU like a 6700 XT... and there's your 27%.

As for your general statement, you're absolutely correct, and if 6900 XT -> 7900 XT is >50%, I'll eat a virtual shoe.
Posted on Reply
#33
ARF
Vayra86But why would you be comparing relative perf between a 5700 XT and a 6900 XT? Both are positioned at opposite ends of the product stack. No shit, Sherlock, the difference is bigger between a midrange SKU from last year and the top end from the first real, full RDNA stack :D It's also certainly 100% between the two. The scale there is the relative scale, simple.

This comparison makes absolutely no sense and has zero relation to the discussion of per-gen improvements. Rather, compare to the same-tier GPU like a 6700 XT... and there's your 27%.

As for your general statement, you're absolutely correct, and if 6900 XT -> 7900 XT is >50%, I'll eat a virtual shoe.
RX 6700 XT is on the same old N7 TSMC node as RX 5700 XT; that is why you don't see the normal generational improvement from a die shrink.
It is 251 sq. mm vs 335 sq. mm.

You have to compare either the old 250 sq. mm N7 Navi 10 to a new 250 sq. mm N5 Navi 33,
or the old 335 sq. mm N7 Navi 22 to a new 335 sq. mm N5 Navi 32.
Posted on Reply
#34
mastrdrver
Vayra86But why would you be comparing relative perf between a 5700 XT and a 6900 XT? Both are positioned at opposite ends of the product stack. No shit, Sherlock, the difference is bigger between a midrange SKU from last year and the top end from the first real, full RDNA stack :D It's also certainly 100% between the two. The scale there is the relative scale, simple.

This comparison makes absolutely no sense and has zero relation to the discussion of per-gen improvements. Rather, compare to the same-tier GPU like a 6700 XT... and there's your 27%.

As for your general statement, you're absolutely correct, and if 6900 XT -> 7900 XT is >50%, I'll eat a virtual shoe.
I was using it to compare generational improvements, as the 6900 XT is a doubling of the 5700 XT (except for the bus) when looking at the cores, TMUs, and ROPs. Outside of the clock speed difference, you're looking at the changes in architecture. I understand your point about the 6700 XT, but I didn't think they were out yet; with GPU prices still at absurd levels, I just have not been paying attention any more.

Oddly enough, I was looking through the price rumors for RDNA2 to see how close they were to reality, to gauge the odds of the RDNA3 price rumors being correct.

What are the odds that AMD will pull an nVidia and showcase the cards at 4K and say "see, 2x improvement!"? Probably pretty good.
Posted on Reply
#35
Xaled
mechtechAnd optimized for mining

aka the legal way to get your own money printing press
except money printing presses are much more innocent; at least they don't burn electricity 24/7
Posted on Reply
#36
TheoneandonlyMrK
Xaledexcept money printing presses are much more innocent; at least they don't burn electricity 24/7
Yes, because the mice turning the wheel running the machine eat cheese, not elecy?! We are not in the 1800s; I'm sure printers do use elecy now.

I mean, when you aren't printing money, the machine is off and not using power, but start printing money and it will use power, no?!


Where can I get a money printer anyway?!

I need one to buy my next GPU anyway, clearly.
Posted on Reply
#37
Speedyblupi
btarunrThe top Navi 31 part allegedly features 60 workgroup processors (WGPs), or 120 compute units. Assuming an RDNA3 CU still holds 64 stream processors, you're looking at 7,680 stream processors, a 50% increase over Navi 21.
The majority of leakers have said that the number of shaders per WGP is being doubled. It's 30 WGPs, 120 CUs, or 7,680 SPs for a single Navi 31 die, and 60 WGPs, 240 CUs, or 15,360 SPs for the dual-die module.
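Both readings land on the same per-die total; a quick sketch, with every figure taken from the rumors above:

# Two readings of the same leak (every figure here is a rumor).
SPS_PER_CU = 64  # assumes RDNA3 keeps 64 stream processors per CU

article_reading = 60 * 2 * SPS_PER_CU  # 60 WGPs at 2 CUs/WGP = 7,680 SPs
leaker_reading = 30 * 4 * SPS_PER_CU   # 30 WGPs, shaders/WGP doubled = 7,680 SPs
dual_die = 2 * leaker_reading          # rumored dual-die module = 15,360 SPs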
mastrdrverEven when you had the 5870 (which was a doubling of the 4870) you didn't even see a 100% increase over the previous generation. You only saw (at best) a 50% increase in performance, but that was at 2560x1600, which was on a $1,000 USD monitor that very few had. A 40% increase at 1080p was the reality of that card
mastrdrver6900 XT is about 40% faster than the 5700 XT.
That's not how percentages work.

100% is not 40% more than 60%, it's 66% more. The HD 4870 is 40% slower than the HD 5870, which means the HD 5870 is 66% faster than the HD 4870. Scaling isn't perfect of course, but much of the reason is that it's limited by memory bandwidth. If the HD 5870 had twice as much bandwidth as the HD 4870, rather than only 30% more, it would be closer to 80-90% faster. Navi 31 might have a similar issue to an extent, but the much larger Infinity Cache can make up for at least part of the bandwidth deficit, and Samsung's got new 24 Gbps GDDR6 chips (50% faster than on the 6900 XT).
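Spelled out, using only numbers already cited in this thread:

# Relative performance percentages are not symmetric.
slower = 1 - 0.40        # the HD 4870 runs at 60% of the HD 5870
faster = 1 / slower - 1  # so the HD 5870 is ~66.7% faster
print(f"{faster:.1%}")   # 66.7%

# Same arithmetic for the memory data rates mentioned above:
print(f"{24 / 16 - 1:.0%}")  # 24 Gbps GDDR6 vs the 6900 XT's 16 Gbps: +50%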
Vayra86But why would you be comparing relative perf between a 5700XT and a 6900XT? Both are positioned on the other end of the product stack.
Because AMD is doing the exact same thing again. The top dual-die Navi 31 card will be several tiers higher than the RX 6900 XT, just as the RX 6900 XT was several tiers higher than the RX 5700 XT.
Vayra86As for your general statement you're absolutely correct, and if 6900XT > 7900XT is >50% I'll eat a virtual shoe.
I'll eat a virtual shoe if it isn't. AMD would have to be very stingy with their product segmentation for that to happen, for example using a single Navi 32 die or a heavily cut-down Navi 31.

And if the top RDNA 3 card (which might be called "RX 7950 XT", "RX 7900 X2", or possibly a completely new name similar to Nvidia's Titan series) isn't >50% faster than the 6900 XT, I'll eat a literal shoe. I expect it to be well over 100% faster, though 150% faster is debatable.

Honestly, I don't understand you guys. AMD is going to approximately double the die area (by using two dies of approximately the same size as Navi 21) while shrinking to N5. How is it not going to double performance? Why is this even a question? The top RDNA3 card is likely to have an MSRP of $2500, possibly even higher, but even if you're talking about performance at the same price point, 50% better is not unrealistic at all.
Posted on Reply
#38
Vayra86
SpeedyblupiThe majority of leakers have said that the number of shaders per WGP is being doubled. It's 30 WGPs, 120 CUs, or 7,680 SPs for a single Navi 31 die, and 60 WGPs, 240 CUs, or 15,360 SPs for the dual-die module.



That's not how percentages work.

100% is not 40% more than 60%, it's 66% more. The HD 4870 is 40% slower than the HD 5870, which means the HD 5870 is 66% faster than the HD 4870. Scaling isn't perfect of course, but much of the reason is that it's limited by memory bandwidth. If the HD 5870 had twice as much bandwidth as the HD 4870, rather than only 30% more, it would be closer to 80-90% faster. Navi 31 might have a similar issue to an extent, but the much larger Infinity Cache can make up for at least part of the bandwidth deficit, and Samsung's got new 24 Gbps GDDR6 chips (50% faster than on the 6900 XT).


Because AMD is doing the exact same thing again. The top dual-die Navi 31 card will be several tiers higher than the RX 6900 XT, just as the RX 6900 XT was several tiers higher than the RX 5700 XT.


I'll eat a virtual shoe if it isn't. AMD would have to be very stingy with their product segmentation for that to happen, for example using a single Navi 32 die or a heavily cut-down Navi 31.

And if the top RDNA 3 card (which might be called "RX 7950 XT", "RX 7900 X2", or possibly a completely new name similar to Nvidia's Titan series) isn't >50% faster than the 6900 XT, I'll eat a literal shoe. I expect it to be well over 100% faster, though 150% faster is debatable.

Honestly, I don't understand you guys. AMD is going to approximately double the die area (by using two dies of approximately the same size as Navi 21) while shrinking to N5. How is it not going to double performance? Why is this even a question? The top RDNA3 card is likely to have an MSRP of $2500, possibly even higher, but even if you're talking about performance at the same price point, 50% better is not unrealistic at all.
Okay. Let's see if 20 years' worth of GPU history gets changed with AMD's next magical gen :D

It's not like we haven't been at that notion before lmao

Be careful with the mix-up of 'same price point' versus same tier. An important difference. The last time we saw +50% was when Nvidia introduced a price hike in the Pascal lineup. As for unobtanium $2,500 GPUs, those are of zero relevance for a 'normal' consumer gaming stack. As are those of $1K.
Posted on Reply
#39
Punkenjoy
I think the key there is that before, AMD and Nvidia thought the ceiling for GPUs was around $500-700. They now see they can sell a lot of GPUs at $2,500.

I suspect they will both just design GPUs made to be sold at those prices without the current markup. This way they can continue their performance wars. We will see if it will lead to increased performance at the lower end of the price range...
Posted on Reply
#40
Vayra86
PunkenjoyI think the key there is that before, AMD and Nvidia thought the ceiling for GPUs was around $500-700. They now see they can sell a lot of GPUs at $2,500.

I suspect they will both just design GPUs made to be sold at those prices without the current markup. This way they can continue their performance wars. We will see if it will lead to increased performance at the lower end of the price range...
Crypto and a pandemic are selling higher-priced GPUs now. Gamers are generally left out in the cold here.
Posted on Reply
#41
mama
Vayra86No, they wouldn't. They'd be cannibalizing their own roadmap and profit margins.

Rather, the strategy is to postpone as many changes as possible to later generations, if the economic reality allows for such a thing. Look at GCN's development - you can conclude there weren't funds for targeted development, or you could say the priority wasn't there because 'AMD still had revenue'... and they still pissed away money. Look at the features that got postponed from Maxwell to Pascal - Nvidia simply didn't have to make a better 970 or 980 Ti, and Maxwell was already a very strong gen 'in the market at the time' - but they had the Pascal technology on the shelf already. Similarly, the move from Volta > Turing > Ampere is a string of massively delayed releases. It's no coincidence these 'delays' happened around the same years for both competitors. Another big factor to stall is the console release roadmap - Nvidia is learning the hard way right now, as they gambled on pre-empting the consoles with their own RTX. In the wild, we now see them use those tensor/RT cores primarily for non-RT workloads like DLSS, because devs are primarily console oriented, especially on big budget/multiplatform titles. So we get lackluster RT implementations on PC.

So no... both companies are and will always be balancing on the edge of what they must do at the bare minimum to keep selling product. They want to leave as much in the tank for later, and rather sell GPUs on 'new features' that are not hardware based. Software, for example. Better drivers. Support for new APIs. Monitor technology. Shadowplay. Low Latency modes. New AA modes. Etc etc. None of this requires a new architecture, and there is nothing easier than just refining what you have. It's what Nvidia has been doing for so long now, and what kept them on top. Minor tweaks to architecture to support new tech, at best, and keep pushing the efficiency button.
I respectfully disagree. AMD is coming from a different place. Their market share in the GPU space is small compared to the behemoth that is Nvidia. I suspect they know that to shake things up they need to produce a clearly superior product. They will be pulling out all the stops in my view.
Posted on Reply
#42
TheoneandonlyMrK
Vayra86Crypto and a pandemic are selling higher-priced GPUs now. Gamers are generally left out in the cold here.
I would say I could always buy a GPU, but only bottom rung or top, with anything in the middle being priced at the top in shops. Still, I could always get one. My thinking being: right, I'll pay more, but I want more than what's offered in performance, like next generation, hopefully.
Posted on Reply
#43
Vayra86
mamaI respectfully disagree. AMD is coming from a different place. Their market share in the GPU space is small compared to the behemoth that is Nvidia. I suspect they know that to shake things up they need to produce a clearly superior product. They will be pulling out all the stops in my view.
Maybe they will now? Wait-and-see mode. We have already seen AMD doesn't care enough about market share to start being serious about the iGPU in Ryzen mobile or its APUs... we have also seen them rebrand GCN a lot, and RDNA2 isn't exactly attempting to win market share either, with competitive pricing OR a lot of them in stock. Nvidia still sells more GPUs just because it moves more units into the market. AMD still has a less effective sales pitch apparently, or makes fewer GPUs altogether.

So... it would be a first to see AMD compete aggressively to steal market share from Nv. So far I haven't seen the slightest move in that direction since post-Hawaii.
Posted on Reply
#44
Punkenjoy
Vayra86Crypto and a pandemic are selling higher-priced GPUs now. Gamers are generally left out in the cold here.
There are huge numbers of GPUs in the hands of non-gamers, but there are many gamers who got their hands on a 3090, or to a lesser extent a 6900 XT, even at over-inflated prices.

Most gamers will probably not spend $2K+ on a graphics card, but I am sure that there are many who would pay that, if not more, just to get the best of the best. I don't know exactly how many of those people there are, but I suspect there are now enough to justify making cards for that market specifically.

That does not change the fact that cards right now are over-inflated, but that won't last forever, and when this is over, there will still be plenty of people buying $2K+ cards.

And in the end, it wouldn't really matter if the top card now sells at those prices if, at a reasonable price point like $250, we had real performance gains.

I recall hearing that Nvidia was surprised at how many Titan RTX cards were in the hands of gamers. They made the 3090 and gamers still ask for more. So why not please the people with way too much money (or way too few hobbies other than PC gaming...)
Posted on Reply