Monday, May 13th 2024
AMD RDNA 5 a "Clean Sheet" Graphics Architecture, RDNA 4 Merely Corrects a Bug Over RDNA 3
AMD's future RDNA 5 graphics architecture will bear a "clean sheet" design, and may not even carry the RDNA branding, says WJM47196, a source of AMD leaks on ChipHell. Two generations ahead of the current RDNA 3 architecture powering the Radeon RX 7000 series discrete GPUs, RDNA 5 could see AMD reimagine the GPU and its key components, much as the original RDNA did over the earlier "Vega" architecture, bringing a significant performance-per-watt jump that AMD then built upon with its successful RDNA 2-powered Radeon RX 6000 series.
Performance per watt is the most important metric by which a generation of GPUs can be assessed, and analysts believe RDNA 3 missed the mark on generational performance-per-watt gains despite the switch from the 7 nm DUV process to the more advanced 5 nm EUV process. AMD's decision to disaggregate the GPU, with some of its components built on the older 6 nm node, may also have hurt the performance-per-watt curve. The leaker also makes the sensational claim that "Navi 31" was originally supposed to feature 192 MB of Infinity Cache, or 32 MB per memory cache die (MCD). The company instead went with 16 MB per MCD, or just 96 MB per GPU, a figure that only shrinks further as AMD segmented the RX 7900 XT and RX 7900 GRE by disabling one or two MCDs.

The upcoming RDNA 4 architecture will correct some of the glaring component-level problems that caused the performance-per-watt curve to waver on RDNA 3, and the top RDNA 4 part could end up with performance comparable to the current RX 7900 series while sitting a segment lower and being a smaller GPU overall. In case you missed it, AMD will not make a big GPU to succeed "Navi 31" and "Navi 21" for the RDNA 4 generation, but will instead focus on the performance segment, offering more bang for the buck well under the $800 mark, so it can claw back market share from NVIDIA in the performance, mid-range, and mainstream product segments. While it remains to be seen whether RDNA 5 will get AMD back into the enthusiast segment, it is expected to bring a significant gain in performance thanks to the re-architected design.
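The cache figures in the rumor are easy to sanity-check. A minimal sketch in Python, assuming Navi 31's six MCDs and the publicly listed per-SKU active-MCD counts:

```python
# Back-of-the-envelope check of the Infinity Cache figures above.
# Navi 31 carries six memory cache dies (MCDs); AMD shipped 16 MB per MCD
# instead of the rumored 32 MB.
MCD_COUNT = 6

def infinity_cache_mb(active_mcds: int, mb_per_mcd: int) -> int:
    """Total Infinity Cache for a SKU with the given number of active MCDs."""
    return active_mcds * mb_per_mcd

# Rumored original design: 32 MB per MCD on a full die
assert infinity_cache_mb(MCD_COUNT, 32) == 192

# Shipping configurations at 16 MB per MCD:
print(infinity_cache_mb(6, 16))  # RX 7900 XTX: 96 MB
print(infinity_cache_mb(5, 16))  # RX 7900 XT:  80 MB (one MCD disabled)
print(infinity_cache_mb(4, 16))  # RX 7900 GRE: 64 MB (two MCDs disabled)
```

The per-SKU totals match AMD's published specs for the RX 7900 series, which is what makes the 192 MB claim plausible as a simple doubling of each MCD's SRAM.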
One rumored aspect of RDNA 4 that even this source agrees with is that AMD is working to significantly improve ray tracing performance by redesigning its hardware. While RDNA 3 builds on the Ray Accelerator component AMD introduced with RDNA 2, with optimizations yielding a 50% generational improvement in ray testing and intersection performance, RDNA 4 could see AMD push more of the ray tracing workload onto fixed-function accelerators, unburdening the shader engines. This significant improvement in ray tracing performance, architectural performance-per-watt gains, and the switch to a newer foundry node such as 4 nm or 3 nm are how AMD ends up with a new generation on its hands.
AMD is expected to unveil RDNA 4 this year, and if we're lucky, we might see a teaser at Computex 2024 next month.
Sources:
wjm47196 (ChipHell), VideoCardz
169 Comments on AMD RDNA 5 a "Clean Sheet" Graphics Architecture, RDNA 4 Merely Corrects a Bug Over RDNA 3
Expected performance between Radeon RX 7700 XT and Radeon RX 7800 XT.
Navi 48 is a larger chip at up to 350 mm², a successor for the RX 6700 XT (Navi 22, 335 mm²) and the RX 7700 XT / RX 7800 XT (Navi 32, 346 mm²).
Expected performance between Radeon RX 7900 XT and Radeon RX 7900 XTX.
Makes sense to expect a sub-$800 price tag. The RX 7900 XT is $690, the RX 7900 XTX is $820. They must sacrifice their extremely large profit margins to make more sales.
AMD's current position doesn't allow both of them at the same time. Either one or the other.
The DIY market has been dead forever according to some, yet, here we are with DIY PCs in 2024. No reason for unwarranted negativity as long as we're reading news about future desktop architectures, imo.
As for RDNA 3, it only had minor changes in its RT engine compared to RDNA 2, but RDNA 4's RT will be a completely new design. Combined with RDNA 3-like CUs, I'm quite excited to see what it's capable of, despite all the negativity in this thread. :)
Going by the interwebs, RDNA 3 has done pretty terribly overall, with AMD almost dismissing it as a low-margin product they don't really care about. At the very least, their gaming division (which includes consoles) isn't doing very well going by the latest earnings call, tanking like 50%.
www.pcgamer.com/hardware/graphics-cards/amds-gaming-graphics-business-looks-like-its-in-terminal-decline/
Same with RDNA3 vs 2, kinda performs exactly like you'd think it would with the additional CUs.
Also, if this is true, why wouldn't AMD offer an updated 8900 XT/XTX with the fixes and the original 2x IC? That would provide a healthy boost to performance and give them something to fight the 5080 with, at least for a while until RDNA 5.
I call total BS.
To stand out, RDNA 4 would need to be more than just cheap: it'd need to deliver at least current-generation performance at a much lower power footprint, and the GRE you mentioned is a perfect example of a product you don't want to see in this next efficiency- and cost-focused generation. Much like the RX 6800 before it, it's a byproduct of lower yields on the big N31 die, with a very significant number of disabled execution units and memory channels. It's got 80 execution units (vs. 84 on the 7900 XT and 96 on the full XTX) and a third of its memory capacity, bandwidth, and attached cache disabled - meaning it loses a lot of performance while keeping a similar power footprint.
It should still be much faster than the 7800 XT on paper, since it still has much more resources available, but it ain't, and there's your answer. Even in bandwidth-friendly 1080p resolution, the fastest GRE model is only 4% faster than the 192-bit Nvidia card that has even less bandwidth available and only 11% faster than the 7800 XT which is much smaller and has a much lighter footprint.
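For what it's worth, the paper-vs-measured gap in that comparison can be put into numbers. A quick sketch (CU counts are public specs; the 11% figure is the measured 1080p lead cited above):

```python
# Paper resources of the two SKUs compared above (public compute-unit counts).
cu_counts = {"RX 7900 GRE": 80, "RX 7800 XT": 60}

# On paper the GRE brings ~33% more compute units...
cu_advantage = cu_counts["RX 7900 GRE"] / cu_counts["RX 7800 XT"] - 1
print(f"Paper CU advantage: {cu_advantage:.0%}")

# ...yet the measured 1080p lead cited in the post is only ~11%,
# which is the scaling shortfall being argued here.
```

Clock speeds and bandwidth muddy this a little (the GRE clocks lower than the 7800 XT), but the gap between a ~33% resource advantage and an ~11% real-world lead is the core of the scaling argument.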
In all reality this is precisely where AMD screwed up with Navi 31... it doesn't scale. It seems to have an architectural bottleneck somewhere, or some resource-utilization issue stemming from either a hardware bug or a very severe software issue that you shouldn't hope AMD can fix. The RX 7900 XTX is on paper faster than the RTX 4090 in almost every theoretical metric, yet in real-life use cases it languishes around the much leaner AD103 (which is closer to Navi 32 in size and scope - just not price).
The 4070, super, Ti, extra super duper, super Ti, Ti special edition are all swipes at the 7900XT and all fail at whichever metric you want to compare.
7900GRE, cheaper, as fast at most games.
7900XTX, between 24-38% faster at RT, can be found on sale for less than $180 more.
Point out where it doesn't scale at resolution? A single image from a single review.
Look how shitty the 4070 is at RT. Merely 4% faster than the 7900GRE
My post is about each processor and where they stand overall in the product stack, regardless of their cost as a finished, marketable product. I prepared this table with the specifications of either highest-released product or full chip available in each tier of processor at a similar size and footprint with data from the TPU database:
[Table omitted here; the tiers ranged from ~609 mm² / 450 W at the top of the stack down to ~204 mm² / 190 W at the bottom.]
From this table it becomes easy to extrapolate from general resource-availability estimates. AMD placed heavier emphasis on raster power: as you can see, their cards have a very high number of raster operation pipelines and less focus on texturing capabilities, which are still quite capable even though Nvidia wins here. At the high end, they opted to trade a small amount of performance for TDP - low reference clocks permitted a 355 W, traditional 2x 8-pin design, as evidenced by AIB 7900 XTXs with unlocked power posting record power consumption without the performance to show for it. There is a noticeable decrease in rated TFLOPS, but this won't affect graphics performance much.
I was generous with Navi 32 - despite being roughly the physical size of AD103, its specs mirror AD104 much more closely. That's also the market segment where both are being sold as finished products - the 4070 Super and Ti are priced similarly to the 7800 XT.
At the lower end, AMD instead opted to juice their cards a little bit to ensure that they kept up, but in general - Nvidia's products are always competing with an AMD product that is a tier higher, in some cases or workloads, even two tiers higher. This becomes most evident in N33 vs. AD107, the RTX 4060 and RX 7600 have just about the exact same level of performance overall, despite the 7600 being the superior card on paper, having a larger die and more power available for it.
And that is why we paid $1200+ for a 4080. It wasn't because AMD is a charity, our friend, or thinks better of consumers; they made the best of the steaming heap of shite they had to show at the time. Without sugarcoating the truth, and I know this will make a lot of AMD fans upset, RDNA 3 is an unmitigated disaster. In most scenarios, its strengths are far outweighed by its weaknesses. The one area I cannot fault AMD on this generation is drivers; they are really investing hard here, and it seems the brass finally got the message about the gaming features that make GeForce so popular.
BUT, the important thing is that they actually made decent products out of it. Ultimately, consumers want affordable graphics cards that can run their games, and these will do that just fine at their currently practiced prices. Few will care about technicalities like the ones I'm discussing here; they don't care what processor it is or how many execution units it has, all they care about is fps, cool features, and that the thing doesn't crash. The fact that you're so defensive of a technically shoddy product like the 7900 XT (once upon a time envisioned as an "RTX 4080 killer") is proof of that. And what makes the 7900 XT decent rather than an all-around laughing stock? It's not $900 anymore. At its original MSRP it wasn't worthy of consideration. In fact, you'd have to be a fool to consider the 7900 XT at its launch price; even if the only other option was the 7900 XTX, their pricing was too close and the XTX is just the better product.
To be specific my point is that as a pretentious RTX 4080 killer the XT is garbage. As a regular product at $550 it's a great deal.
This post took far more time to make than I should have spent on it, and I think I have clarified my train of thought and exposed the way I think, above any marketing BS and bringing the facts to the table.
Otherwise, why would AMD give up on the high end this generation to "recalibrate" and instead focus on high-volume channels to sort their issues out? AMD already took the price war approach. It's what you're seeing with the current generation, from top to bottom throughout the entire stack. It wasn't enough, and the result is that Radeon looks like a liability in their earnings report. Not even a single lousy billion of income last quarter, in the middle of a GPU boom. No wonder shareholders aren't happy.
Needless to say this post is my opinion and my opinion alone. Don't take it as gospel, personally or anything. It's an insomniac's rant.
TSMC 7N to TSMC 6N is not a die shrink, but a small optimisation move. TSMC 6N is TSMC 7N+ in fact.
AMD's mistake was that it didn't shrink Navi 33 on the newer TSMC 5N, 4N or 3N processes! :banghead:
If anyone is to be removed, it's the CEO, and then the whole company should be restructured under one top priority - GPUs.
CPUs can be left as a third priority, because they are less important - they bring in less revenue, and AMD executes better there with less effort.
finance.yahoo.com/quote/NVDA/
finance.yahoo.com/quote/AMD/
The XTX, XT and GRE are expensive silicon with raw specs, manufacturing costs, power envelopes, and transistor counts far exceeding their actual performance. That's the architectural bottleneck/bug that we are guessing AMD is talking about fixing. I distinctly remember the launch of Navi31 being a disappointment and multiple channels interviewing Scott Herkelman and covering Sam Naffziger's deep dive on the pros and cons of moving GPUs to chiplets. AMD basically admitted that they couldn't scale the interconnects as well as they hoped which is one possible reason why RDNA3 chiplets weren't as fast or cost-effective as the hype and pre-launch info originally predicted.
If fixing this bug opens up 20-30% more performance from the same design, then we have an RDNA4 "7800XT Revision 2.0" that AMD's been selling for $500 up until now, but with ~7900XT performance.
I just hope that among the big changes that RDNA5 intends to implement one will be MCM with Multi-GCD.
GPUs are very sensitive to latency; they work best when latencies are extremely low, which means a monolithic design. Chiplets are good for CPUs, but extremely bad for GPUs.
That's why CrossFire is no longer supported.
Do they want to invent a new type of CrossFire?
The gap between Nvidia and AMD has widened to a ridiculous degree.
A couple of days ago I read about companies that ordered $5 billion, $14 billion, even $30 billion worth of chips from Nvidia.
China’s internet giants order $5bn of Nvidia chips to power AI ambitions
Spent nearly 10 billion US dollars to buy Nvidia AI chips! Meta is making every effort to develop open source AGI
Chiplets are why 96-core EPYCs exist and are stealing huge amounts of business from Intel. AMD can sell those 12-chiplet EPYCs for $14,750 rather than selling twelve 7950X CPUs for $550 each.
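The revenue arithmetic behind that point, using the post's own numbers:

```python
# One 96-core EPYC is built from twelve 8-core CCDs - the same compute
# dies that go into desktop Ryzen parts.
epyc_revenue = 14_750       # 12-chiplet EPYC price quoted above
desktop_revenue = 12 * 550  # twelve 7950Xs at $550 each, as in the post

print(desktop_revenue)                           # 6600
print(round(epyc_revenue / desktop_revenue, 1))  # ~2.2x the revenue
```

Roughly 2.2x the revenue from the same class of silicon is the economic incentive that makes AMD unlikely to abandon chiplets, even on the GPU side.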
So, I doubt they'll give up. If AMD ever manage to solve the latency issues and scale GPU cores (not memory subsystems) out to multiple chiplets, they could effectively make massive GPUs like the 4090 is, but at a tiny fraction of the cost.
Dismissing AMD because their very first attempt at GPU chiplets was mediocre would be stupid. They've demonstrated they have the skill to improve chiplets, reducing the downsides of a multi-chip architecture with each generation. They will never be as performant as an equivalent monolithic solution, but that's not the point: they can potentially scale far beyond what's even possible to manufacture as a monolithic design.
Give them time; AMD might give up on the idea of scalable GPU chiplets, but I have my doubts.
A Multi-GCD design is the most important thing AMD could bring out to be more competitive. Instead of developing 5-6 chips, a single block (GCD) would serve all segments, simply by putting these chips together. Billions would be saved in the process.
But it's obvious that such a design needs to drastically change the graphics processing model.
"The new patent (PDF) is dated November 23, 2023, so we'll likely not see this for a while. It describes a GPU design radically different from its existing chiplet layout, which has a host of memory cache dies (MCD) spread around the large main GPU die, which it calls the Graphics Compute Die (GCD). The new patent suggests AMD is exploring making the GCD out of chiplets instead of just one giant slab of silicon at some point in the future. It describes a system that distributes a geometry workload across several chiplets, all working in parallel. Additionally, no "central chiplet" is distributing work to its subordinates, as they will all function independently.