Friday, June 10th 2022
AMD RDNA3 Offers Over 50% Perf/Watt Uplift Akin to RDNA2 vs. RDNA; RDNA4 Announced
AMD, in its 2022 Financial Analyst Day presentation, claimed that the upcoming RDNA3 graphics architecture will deliver an over-50% generational performance/Watt uplift. This would repeat the feat of RDNA2, whose 50% performance/Watt uplift over RDNA enabled AMD Radeon's unexpected return to the high-end and enthusiast market segments. The company also broadly detailed the new features of RDNA3 that make this possible.
To begin with, RDNA3 is built on the TSMC N5 (5 nm) silicon fabrication node and introduces a chiplet-based approach that's somewhat analogous to what AMD did with its 2nd Gen EPYC "Rome" and 3rd Gen Ryzen "Matisse" processors. The GPU's main number-crunching and 3D rendering machinery will be packed into logic chiplets, while the I/O components, such as memory controllers, display controllers, media engines, etc., will sit on a separate die. Scaling up the logic dies will yield a higher-segment ASIC. AMD also stated that it has re-architected the compute unit with RDNA3 to increase its IPC. The graphics pipeline is bound to get some major changes, too. The company is doubling down on its Infinity Cache on-die cache memory technology, with RDNA3 featuring the next-generation Infinity Cache (which probably operates at higher bandwidths).
From the looks of it, RDNA3 will be exclusively based on 5 nm, and the company announced, for the very first time, the new RDNA4 graphics architecture. It shared no details about RDNA4, except that it will be based on a more advanced node than 5 nm.
AMD RDNA3 is expected to debut in the second half of 2022, with a ramp across 2023. RDNA4 is slated for some time in 2024.
121 Comments on AMD RDNA3 Offers Over 50% Perf/Watt Uplift Akin to RDNA2 vs. RDNA; RDNA4 Announced
The TDP is arbitrary, set at whatever level they choose. I can get 50% performance/Watt gains from any AMD GPU I've used in the last half decade simply by tuning it less aggressively. Polaris, Vega, and Navi all reached stratospheric TDPs, but dial the power consumption back by 40% and you still had ~90% of the performance. Voila, instant 50% perf/W gain by messing with a couple of sliders and pressing "apply".
Ampere and Turing are similar. I crammed a 3600XT and an RTX 3060 into a tiny HTPC case and wanted to keep them quiet, so the CPU has 30W cut from its PPT using PBO and the GPU power limit is set to 75% in Afterburner. The underclocked result gets over 90% of the original performance, and that's a tiny price to pay for near-silence at full load.
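The arithmetic behind the two undervolting comments above is just tuned performance divided by tuned power, both relative to stock. A minimal Python sketch using only the fractions quoted in those comments (the function name `perf_per_watt_gain` is hypothetical):

```python
def perf_per_watt_gain(perf_fraction: float, power_fraction: float) -> float:
    """Return the perf/W multiplier after tuning, relative to stock.

    perf_fraction:  tuned performance as a fraction of stock (0.9 = 90%).
    power_fraction: tuned power as a fraction of stock (0.6 = a 40% cut).
    """
    if power_fraction <= 0:
        raise ValueError("power_fraction must be positive")
    return perf_fraction / power_fraction


# First comment: power dialed back by 40%, ~90% of the performance kept.
stock_tuning = perf_per_watt_gain(0.90, 0.60)
# Second comment: GPU power limit at 75%, "over 90%" performance (a lower bound).
htpc_tuning = perf_per_watt_gain(0.90, 0.75)

print(f"40% power cut, 90% perf:   {stock_tuning:.2f}x perf/W")
print(f"75% power limit, 90% perf: {htpc_tuning:.2f}x perf/W")
```

With these inputs it prints 1.50x and 1.20x. The second figure is a floor, since the comment says "over 90%", and the CPU's 30W PPT cut is left out because no stock PPT is given to compare against.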
Ampere was 2x as well. But oh yeah, only in the best case with RT, because performance is now half as abysmal.
Let's stay sane. 50% is a realistic gen-to-gen leap, and a pretty big one already at that. We used to be happy with 30% on the same tier.
Nobody doubts the goodness of the chiplet approach, bro.
Pascal: 2 FPS
Turing: 10 FPS
Ampere: 15 FPS
But the worst part is those so-called reviewers accepting and even praising nVidia.
The 3060 Ti is still a good implementation overall, but AMD has a firm lead on efficiency still. That is ... uh ... just as arbitrary. The results that matter are the results of products that people are actually able to buy - the retail configurations. Whatever underclocking and undervolting results you end up with vary based on the silicon lottery, your willingness to sacrifice performance, and a bunch of other variables. How on earth would you settle on a standard for comparison at that point? Those graphs are literally the only reasonable way of comparing this, unless you're going to spend weeks on end tuning every single GPU to find its peak perf/W point in order to compare their peak efficiency.
With RDNA3 there won't be even a couple that will accommodate more than one core chiplet. That was my point. Perhaps I should have been more precise with my statements. 3D stacking is a different story.
Anything that makes the GPU die smaller and cheaper to make is a win for them, and there are functions of a GPU that won't be penalised by being external.
As for the 2x gains, the math checks out if 450W is correct: 1.5x perf/watt * 1.5x the power = 2.25x performance.
RDNA2 was actually around a 54% perf/watt improvement over RDNA1, so if that were the case this time around, it would be a 2.3x performance gain.
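The projection in the two comments above multiplies a perf/W ratio by a power ratio, since performance = (performance/W) × W. A minimal sketch, assuming a 300 W baseline (the RX 6900 XT's reference board power, which the comment does not state) so that 450 W works out to 1.5x the power; the constant and function names are hypothetical:

```python
BASELINE_BOARD_POWER_W = 300.0  # assumption: RX 6900 XT reference board power
RUMORED_BOARD_POWER_W = 450.0   # the figure the comment conditions on


def projected_speedup(perf_per_watt_ratio: float, power_ratio: float) -> float:
    # performance = (perf / W) * W, so the generational ratios multiply.
    return perf_per_watt_ratio * power_ratio


power_ratio = RUMORED_BOARD_POWER_W / BASELINE_BOARD_POWER_W  # 1.5x the power

# 1.50 = AMD's ">50%" claim; 1.54 = the RDNA1 -> RDNA2 figure quoted above.
for ppw in (1.50, 1.54):
    speedup = projected_speedup(ppw, power_ratio)
    print(f"{ppw:.2f}x perf/W at {power_ratio:.2f}x power -> {speedup:.2f}x performance")
```

This treats perf/W as constant across power levels, which is optimistic: efficiency usually falls as a chip is pushed harder, the same effect the undervolting comments exploit in reverse.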
I expect such gains will only materialise at 4K due to CPU limits at lower resolutions but that is to be expected.
RT will probably actually matter far more this go around so 4K + RT might be where the proper high end battle arises.
The node reduction itself from 6nm down to 5nm is what, 1/6? Across two chip dies that works out to 1/3. They also shuffle logic to the I/O die, and I'm not sure offhand how much that occupies, but say it bumps that up to 40% more silicon space; that is pretty good. The other good aspect is that heat is spread out more between two chip dies, which is better than a single die the size of two with all the heat condensed in one spot. It's much better for the heat load to be spread apart and radiate more to the cooler. That even reduces stress on the VRMs that have to power the fans for the GPU. Something interesting would be if AIBs were to put a fan header on the side that could be plugged into a system header instead, shifting more stress to the motherboard VRMs and off of the GPU's VRMs, given the fans can consume a few watts.
It seems pretty reasonable and plausible. Let's not forget there could be a bit more room to increase the die size to make room for more of that cache compared to the previous chip dies. In fact, even without taking that into account, if the cache is on the die and you pair two dies, you double the cache. This isn't SLI/CF either, plus it's got a dedicated I/O die as well. Just moving logic to the I/O die will free up silicon space on the chip die. It might not be 50% in all instances, but I can see it reaching up to that in the right scenario. Lastly, FSR is another factor in all of this and gives an uplift in performance per watt. It's certainly important to consider the context in which a company, be it AMD, Intel, Nvidia or others, is quoting a performance-per-watt figure.
I'm going to go out on a limb on this one and say it could be 50% performance per watt or greater across the entire RDNA3 product segment under the right circumstances. Along with all the other parts mentioned, you also have to consider that dynamic power scales with the square of voltage, and smaller dies running at lower wattage need lower voltage, increasing efficiency per watt as a whole. So I'm pretty certain this is realistic. I'm not going to say I'm 100% sure about 50% performance per watt across the entire SKU lineup, but you can argue AMD hints at it without explicitly going into detail. AMD neither confirms nor rules out that the figure is for a particular RDNA3 SKU; it simply lists RDNA3, which could be read either way, though it may be subtly pointing out that it applies across the product lineup, or at least the initial launch lineup.
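The voltage point refers to CMOS switching power, P ≈ αCV²f. For the same chip, the activity factor α and capacitance C cancel when comparing two operating points, so the ratio depends only on voltage and clock. A rough sketch, assuming performance scales linearly with clock and ignoring static (leakage) power; the 10% voltage and clock reductions are hypothetical, not AMD figures:

```python
def relative_dynamic_power(voltage_ratio: float, freq_ratio: float) -> float:
    # Switching power P = alpha * C * V^2 * f; alpha and C cancel in a ratio
    # for the same chip, leaving V^2 * f.
    return voltage_ratio ** 2 * freq_ratio


def relative_perf_per_watt(voltage_ratio: float, freq_ratio: float) -> float:
    # Assumption: performance scales linearly with clock frequency.
    return freq_ratio / relative_dynamic_power(voltage_ratio, freq_ratio)


# Hypothetical operating point: 10% lower voltage and 10% lower clock.
v, f = 0.90, 0.90
print(f"power:  {relative_dynamic_power(v, f):.3f}x")
print(f"perf:   {f:.2f}x")
print(f"perf/W: {relative_perf_per_watt(v, f):.3f}x")
```

Under this model the clock term cancels and perf/W improves by 1/V², about 1.235x for a 10% voltage drop. Real chips deviate from it because leakage power and memory-bound workloads don't follow the V²f relationship.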
60% performance per watt uplift:
AMD Radeon RX 480 Specs | TechPowerUp GPU Database
AMD Radeon RX 5600 XT Specs | TechPowerUp GPU Database
Sure, ideally Navi 22 would have been a 60CU design, and thus would have had only a 25% cut applied to compute, i.e. the same as the cuts to its memory and IC subsystems. But I imagine that when AMD planned how many chips they could get from their 7nm wafers, they underestimated their yields, or just decided to err on the side of caution, and we never got a symmetrical trimming down the product lineup.
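The 25% figure can be checked against the full-die specifications of Navi 21 (80 CUs, 256-bit memory bus, 128 MB Infinity Cache) and Navi 22 (40 CUs, 192-bit bus, 96 MB Infinity Cache); these numbers are not given in the comment itself. A small sketch comparing the actual cuts with the hypothetical 60 CU design (the dictionary keys and function name are illustrative):

```python
# Full-die specifications (public figures, not stated in the comment).
NAVI21 = {"compute_units": 80, "bus_width_bits": 256, "infinity_cache_mb": 128}
NAVI22 = {"compute_units": 40, "bus_width_bits": 192, "infinity_cache_mb": 96}


def cut_fraction(full: dict, smaller: dict) -> dict:
    # Fraction of each resource removed going from the larger die to the smaller one.
    return {key: 1 - smaller[key] / full[key] for key in full}


actual = cut_fraction(NAVI21, NAVI22)
# The comment's hypothetical 60 CU Navi 22, with memory and cache unchanged.
hypothetical = cut_fraction(NAVI21, {**NAVI22, "compute_units": 60})

for key in NAVI21:
    print(f"{key:>18}: actual cut {actual[key]:.0%}, 60 CU variant cut {hypothetical[key]:.0%}")
```

The real Navi 22 cuts half the compute but only a quarter of the memory bus and cache; a 60 CU part would have made all three cuts 25%, which is the symmetry the comment is describing.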
It's absolutely possible that an MCM approach can allow for power savings, but only if it allows for larger total die sizes and lower clocks. Otherwise it's no different from a monolithic die, except for the added interconnect power. And, of course, larger dice are themselves a fundamental problem when per-transistor costs are no longer dropping noticeably, which is leading to rapidly rising chip prices.
Again, this isn't accurate. A GPU die has its heat very evenly spread across the entire die (unlike CPUs, which are very concentrated), as most of the die is compute cores. Spreading this across two dice won't affect thermals much, as both dice will still be connected to the same cooler; it's not like you're running them independently of each other. Assuming the same power draw and area for a monolithic and MCM solution, the thermal difference between the two will be minimal. And, crucially, you want the distance between dice on package to be as small as possible to keep latencies low.
Fans generally run directly off 12V and don't rely on VRMs on the GPU, just a fan controller IC sending out PWM signals (unless the fans are for some reason controlled through voltage, which is rather unlikely).
Idk, I think the truth is somewhere in the middle. Both chips have distinct qualities and deficiencies. The 6800 is fantastically efficient; the 6700 XT gets a lot of performance out of a relatively small die. Now, the 6700 XT is indeed rather poor in terms of efficiency for an RDNA2 chip, but it still beats out the majority of Ampere GPUs, so ... meh. (The 6500XT is another matter entirely.)
I still can't wrap my head around AMD's RDNA2 segmentation, though. The 16-32-40-80CU lineup just doesn't make sense IMO, and it kind of forced them to tune the 6700XT the way they did. 20-32-48-80 or something like that would have made a lot more sense. It's also weird just how few SKUs have used Navi 22 overall.
But it may have a slight background advantage of improving the qualities of the overall solution; my guess is no more than 5% overall.
I mean better thermals and improved management of the integrated parts.
The audio processors should be cut altogether. I don't understand why GPUs must include an audio device, which costs transistors on the dice.