Monday, January 29th 2024
Top AMD RDNA4 Part Could Offer RX 7900 XTX Performance at Half its Price and Lower Power
We've known since way back in August 2023 that AMD is rumored to be retreating from the enthusiast graphics segment with its next-generation RDNA 4 graphics architecture, which means we likely won't see successors to the RX 7900 series squaring off against the upper end of NVIDIA's fastest GeForce RTX "Blackwell" series. What we'll get instead is a product stack closely resembling that of the RDNA-based RX 5000 series, with its top part providing a highly competitive price-performance mix around the $400 mark. A more recent report by Moore's Law is Dead sheds more light on this part.
Apparently, the top Radeon RX SKU based on the next-gen RDNA4 graphics architecture will offer performance comparable to that of the current RX 7900 XTX, but at less than half its price (around the $400 mark). It is also expected to hit this performance target with smaller, simpler silicon and a significantly lower board cost, which is what makes that price possible. What's more, there could be energy-efficiency gains from the switch to a newer 4 nm-class foundry node and from the RDNA4 architecture itself, which could reach its performance target with fewer compute units than the 96 of the RX 7900 XTX.

When it came out, the RX 5700 XT offered an interesting performance proposition, beating the RTX 2070 and forcing NVIDIA to refresh its product stack with the RTX 20-series SUPER, resulting in the RTX 2070 SUPER. Things could go down slightly differently with RDNA4. Back in 2019, ray tracing was a novelty, and AMD could surprise NVIDIA in the performance segment even without it. There is no such advantage now; ray tracing is relevant, so AMD could count on timing its launch before the Q4-2024 debut of the RTX 50-series "Blackwell."
Sources:
Moore's Law is Dead (YouTube), Tweaktown
396 Comments on Top AMD RDNA4 Part Could Offer RX 7900 XTX Performance at Half its Price and Lower Power
And me: I sold my 6800 XT for $600 and bought the 7900 XTX for $1,200, so the upgrade only cost me $600, which is why I did it.
If I had to start from scratch with no GPU to sell, I probably wouldn't have done it.
Aside: I wonder why Phoronix chose to use red for Intel and blue (actually purple) for AMD.
Chips And Cheese has done an exceptional job extracting details about RDNA 4 from LLVM. It seems to me that changes are mainly aimed at efficiency and AI.
Either way, your claim that AMD is not greedy because it is owned by the Saudi Royal Family has to be up there with the most bizarre claims I've heard; even if it were true, it doesn't make a lick of sense.
I would want to see AMD bring chiplets to the GPU segment. It worked with CPUs, and I really can't see why it wouldn't work with GPUs. I think there are more obstacles with GPUs (latency is the worst one I can think of off the top of my head, and scheduling too). I wonder if AMD would need to use some sort of I/O chiplet, like with its CPUs. Probably yes, to avoid the chiplets being seen as two different graphics chips.
People are not going to pay $3,000 for an RTX 6070 with a 350 mm² die.
AMD must move away from TSMC. From what I see, Samsung's pricing is better.
BTW, TSMC no longer makes higher profits because of more sales, but simply because of higher prices. This is not sustainable, and its bubble will go boom... and disappear.
www.trendforce.com/news/2024/01/24/news-tsmcs-2023-wafer-average-selling-price-rises-by-22-driven-by-n3-process-success/
TSMC isn't the problem. Allocation wouldn't change even with more capacity, because to AMD, Radeon is neither its biggest nor its most important business.
Samsung for sure has better offers than that!
Chiplets introduce performance issues. Yeah, economically they look like a solution, but it's more like a fake solution, because they leave performance in the glue between those chiplets!
The technology would permit mixing, say, Samsung 8N (the node used on Ampere) for auxiliaries with TSMC N4P for the GPU core, but I still strongly suspect this is neither necessary nor affordable at the present time, and it would likely not increase the volume of cards that AMD shifts. Remember, the problem isn't stock, it's the product itself. The only market where AMD has managed to make a significant incursion is Europe, because buyers there are extremely sensitive to pricing; it's a penny-pinching consumer culture, they'll avoid spending a single euro above what's absolutely required, and they are very willing to make concessions in experience and functionality if it can be considered "superfluous" and it means they get a deal.
People are more sensitive to pricing these days than before, first because of stagflation, and second because we were used to much cheaper prices.
An RX 7900 XTX released in 2015 would have cost $450.
- moderately small GCD dies; each MI300 XCD is 115 mm² and has 40 Compute Units (contrast with 96 CUs for the 7900 XTX GCD; see the rough area math below)
- a base die with last-level cache, the multimedia engine, and off-chip interfaces, i.e. DRAM and PCIe
Note that unlike the MI300, the base die would serve as an interposer too. This wouldn't be as expensive as the MI300, but I suspect that unlike the current strategy, it would be too expensive to scale down. So this would have only been a good choice for a 5090 competitor.

Let's see what they actually do with RDNA4. I haven't read any leaks about CU counts or VRAM bus width, so I suspect they will follow the RDNA 3 strategy of different-sized GCDs and one type of MCD. If my hypothesis that the strategy for high-end RDNA4 is a poor fit for cheaper SKUs is right, then we will likely see at least a successor to the 7800 XT.
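For a rough sense of scale, here is a back-of-envelope sketch in Python using the MI300 numbers above (my own simplification: it assumes CU area scales linearly across chiplets and ignores per-chiplet overhead):

# Back-of-envelope area math from the MI300 XCD figures quoted above.
xcd_area_mm2 = 115.0                       # area of one MI300 XCD
xcd_cus = 40                               # Compute Units per XCD
area_per_cu = xcd_area_mm2 / xcd_cus       # ~2.9 mm^2 per CU
equiv_96cu = 96 * area_per_cu              # ~276 mm^2 of XCD silicon for 96 CUs
print(f"{area_per_cu:.2f} mm^2/CU -> ~{equiv_96cu:.0f} mm^2 for a 96 CU part")

So even a 96 CU equivalent would need under ~300 mm² of compute silicon spread across XCD-sized chiplets, before counting the base die.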
4096 shaders across a 192-bit memory bus.
www.tweaktown.com/news/94533/amds-next-gen-rdna-4-navi-44-and-48-from-radeon-rx-8000-series-gpus-appear-in-linux/index.html
Performance guesstimate - around RX 6950 XT or RX 7900 XT.
The price should be $399 to be well accepted by both reviewers and gamers.
While I am in fact pulling this out of my speculative ass, I think 4096/8192 appears much more plausible.
Shrinking N33 and pumping clocks (up to ~3264 MHz core / 24 Gbps memory at stock) would appear a reasonable and cost-efficient thing to do. It could compete with products below AD104, perhaps at a size similar to AD107 (which N33 already beats, but at a larger size, granted on an older [then cheaper, but 4 nm is now also cheaper] process). Shrinking from 6 nm to 4 nm could/should increase density by ~40% (44% by my linear math, but I don't know exactly how much more space the lesser-scaling parts like cache would take up, nor how much area architectural changes or higher clocks would add). You're really getting into the weeds when you consider that the 7 nm SRAM cell size is 0.027 µm² (no idea on 6 nm) and 4 nm is ~0.0199 µm² (the same size as N3B; N3E is WORSE), while the decap added for higher clocks on RV770->RV790 was ~26 mm² (way back on 55 nm).
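To put numbers on why cache drags the average down, a quick sketch using the cell sizes quoted above (the 44% logic figure is the post's own linear estimate, and N6 is assumed close to N7 for SRAM):

# SRAM vs. logic scaling for a 6nm -> 4nm shrink, per the figures above.
sram_7nm_um2 = 0.027      # 7 nm HD SRAM cell (N6 assumed similar)
sram_4nm_um2 = 0.0199     # 4 nm-class cell, per the post
sram_gain = sram_7nm_um2 / sram_4nm_um2 - 1    # ~36% denser
logic_gain = 0.44                              # assumed linear logic gain
print(f"SRAM: +{sram_gain:.0%} vs logic: +{logic_gain:.0%}")
# A cache-heavy die therefore shrinks less than the headline logic number implies.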
The processes, clocks, and available RAM chips make this very possible, its placement makes sense (the cheapest option that will get you PS5 performance), and I'd be very surprised if they couldn't make it work with 16 GB for <$300.
I personally stopped thinking GDDR7 was plausible a while back, as I don't *think* it will be available by the time this needs to launch. I *could* be wrong. Samsung has been very quiet about a release date (the only one we have is Micron at the end of the year, I assume for Blackwell). It could happen, but I still think 24 Gbps GDDR6 is more likely, given that literally no other products have used it (it should be much cheaper) and it fits rather well with a small update to their current parts. Also, if you do the math, 24 Gbps is actually a better fit. The RAM bandwidth would be a wash (or perhaps overclock slightly better), but unless AMD doubled up their L3 cache, it would hit a bandwidth limit more quickly (16 MB of L3 contributes about 8.5% of bandwidth-bound performance if such a part is 2720 MHz / 20 Gbps). It just sounds more expensive for very little gain... maybe ~5%, and that's IF it could hit ~3700 MHz, while losing 4 GB of buffer. Not a trade I would make.
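For context, the raw-bandwidth side of that comparison is simple arithmetic (a sketch assuming the 192-bit bus floated earlier in the thread; the GDDR7 pin speed is a guess):

# Raw bus bandwidth for the memory options being discussed.
def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits / 8 * gbps_per_pin     # GB/s

print(bandwidth_gb_s(192, 20))   # 480.0 GB/s (20 Gbps GDDR6)
print(bandwidth_gb_s(192, 24))   # 576.0 GB/s (24 Gbps GDDR6)
print(bandwidth_gb_s(192, 28))   # 672.0 GB/s (early GDDR7, speed assumed)

As the post argues, the gap between fast GDDR6 and first-generation GDDR7 is real but not dramatic, especially once the L3 cache absorbs part of the traffic.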
Likewise, doubling that up is very much akin to what AMD has done in the past. IMHO the actual shader count is likely to be close to the 7800 XT's, as that part is clocked at 2425 MHz for a reason (part of which may be to make a similar part with higher clocks look like a generational leap). It would likely be positioned as a faster PC alternative to the PS5 Pro for similar-ish (and eventually, within its lifetime, less) money, which is something the PC gaming market really needs if it wants to survive.
(8192 × 3200) / (7680 × 2425) ≈ 1.40, i.e. +40% (not counting whatever architectural enhancements may occur).
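That shaders-times-clock scaling checks out (a one-liner; it deliberately ignores IPC and bandwidth changes):

# Hypothetical 8192 SP @ 3200 MHz vs. the 7800 XT's 7680 SP @ 2425 MHz.
new = 8192 * 3200
old = 7680 * 2425
print(f"+{new / old - 1:.0%}")   # +41%, in line with the ~+40% estimate above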
These just make the most sense to me. We know that 4 nm(P?) can yield well at 3200 MHz (the base clock of Zen 4c; the typical operational speed of the M2 is 3200-3300 MHz), and we have also seen it scale to 3500 MHz (M2) and 3600-3700 MHz (M2 Max/Ultra, Zen 4c max boost). To me, the process appears most comfortable around the 3200-3500 MHz mark, which fits perfectly with 24 Gbps GDDR6 (including overclocking to ~25.6 Gbps) and an L3 cache size similar to the old parts.
This is bang-on to what TSMC claimed about N4P (6% greater density, 11% greater performance). Most (good-yield) 5 nm GPUs can hit 2900 MHz at 1.08 V (or lower). I don't see how ~3200 MHz at 1.2 V would be a problem (with conceivable headroom to ~3500 MHz), nor how overclocking to the above would be out of the question. If you do the math, (even with overclocking) it would appear able to fit within 375 W (or parts with 1x 8-pin on the low end and 2x 8-pin on the high end). Some of you may say, "But that's the power draw of a 4080 Super for conceivably slightly less performance." That's very true, but it could also conceivably be the size of AD104, hence a similar MSRP; it would beat its actual competition (the 4070 Super), and conceivably tie (and/or beat in raster) the 4070 Ti Super. Essentially, this part would make a lot more sense than discounting the 7900 XT into the ground, likely past the point they would ever want to if they still wanted to make any margins. Likewise, I believe a 16 GB (stronger-than-)PS5-like card would go down very well if under $300 (probably less than a discounted PS5), which just might not be possible for Navi 33 due to its size. It might very well be possible now due to 4 nm falling in cost (and more wafer allocation becoming available) recently (which it certainly has, with Apple/NVIDIA moving to 3 nm), as shown by Sony/AMD moving to 4 nm, which they wouldn't do unless it was cost-effective.
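On the power math, the usual first-order model has dynamic power scaling linearly with frequency and with the square of voltage. A rough sketch (leakage and board overhead are ignored, and the baseline wattage is my assumption, borrowing the 7800 XT's 263 W board power):

# First-order dynamic power scaling: P2 ~= P1 * (f2/f1) * (V2/V1)^2
p1, f1, v1 = 263.0, 2900, 1.08   # assumed baseline: 263 W at 2900 MHz, 1.08 V
f2, v2 = 3200, 1.20              # the clocks/voltage discussed above
p2 = p1 * (f2 / f1) * (v2 / v1) ** 2
print(f"~{p2:.0f} W")            # ~358 W, inside the 375 W ceiling above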
I also agree that displacing the 7900 XTX is a tough thing to believe, which is why I think cancelling N4(c?), the chiplet 9-SED (17280 SP?) design, was such a difficult decision on AMD's part, as one can imagine how competitive that could have been. They could always refresh the current chips with 24 Gbps memory, and we know they are capable of greater than 3000 MHz (which is where a refresh/faster memory begins to make sense for N31), but they indeed would consume a ton of power and probably still not catch up to the 4090 (which NVIDIA themselves could then [and may still] refresh). I suppose if they want to make an overtly compelling 4080 Super competitor they could, but to what end, when Blackwell isn't too far away and the lowest-end "enthusiast" chip in that lineup could very well match a 7900 XTX in raster (my best guess at this point is 192-bit / 18 GB / 9216 SP at 3375 MHz / 32 Gbps). AMD might conceivably have a slight overall advantage in raster at the same price if you factor in overclocking, but that's where the argument of power consumption (DLSS/RT) really could become a deciding factor. There's always the possibility of a 4 nm respin, but I'm not holding my breath. I would love to be wrong, as more options and competition are good for everyone.
The point is, the demarcation line (in my mind) is the PS5 and the PS5 Pro. To put it in simpler terms: a 7600 XT (which is on old process tech and still over the golden $300) or an overclocked 7800 XT (OT: which some people won't do or consider when looking at review charts, even though you'll likely get close to a 20% gain versus a dirt-cheap MSRP 2425 MHz stock model, and it does become a compelling option versus a 4070 Super, especially given the extra RAM. /OT). So really, they need a cheaper 7600 XT and a cheaper 7900 XT (the latter of which is the gateway to high-end PC gaming). They need stock competition for the 4060 Ti 16 GB (preferably under the magic $300 mark, which the 7600 XT is not, and which NVIDIA would likely never challenge UNLESS they release a 4060 Ti Super with 4608 SP and relegate the old part lower to compete; both the hypothetical Super and a conceivable '8600 XT' would probably OC to a similar level in raster) and for the 4070 Ti Super (preferably at the price of a 4070 Super or less; the 7800 XT can't hit that performance level and the 7900 XT costs too much to make). Those would be the ticket. They might use a bit more power or run warmer (albeit most AIB coolers are plenty strong), but they would make financial sense (wrt die size/cost) versus a chip one slot down in NVIDIA's lineup, which is typically where AMD likes to play. They've always done that: they price themselves against their weakest aspect versus NVIDIA. In this case it has been RT (by ~20%), which again I find amusing, because not only do AMD's chips have stronger raster (especially after adding the delta from overclocking), NVIDIA outdates its own last gen wrt RT with every new generation (while raster/RAM springs eternal for actual playability at a given resolution). Seriously, look at the playability of a 2000/3000-series card in RT against similar raster performance (that's right, the old cards are largely unplayable)... but that's the market. It's a win for value if you don't constantly update for RT, or to get back the missing RAM NVIDIA took from you that you eventually needed.
But hey, let's see how it pans out. AMD has disappointed before, but they also usually have at least one part that's a diamond wrt $/perf (especially if you overclock).
Although leakers said only low-end Navi 43 and 44 for RDNA4, the timing is perfect for a monolithic 4 nm Navi 42 part (4 nm is probably the last node offering an adequate size reduction for cache (1.35X); it offers the potential for a 1.15X frequency jump vs 7 nm, and by Q4 2024 it will be a mature process offering great yields).

Even if RDNA4 is a small iteration focusing only on reducing size at ISO process and extracting a small bump in performance per SE (say, only 5%), a 4 nm, 256-bit, 4-SE design with 128 RBEs and 80 CUs, coupled with GDDR7 (+64 MB Infinity Cache), would be able to match the 7900 XTX. It would also be 24 GB, due to GDDR7's increased capacity. This would probably be $599, offering around +20% raster performance and +6 GB vs an RTX 5070 at the same price.

The main reason they don't go for higher-tier models is that they are left behind in features, and the vast majority of people buying at $600 and above will always go NVIDIA if AMD offers only 10% more raster performance per dollar. This forces AMD, at these tiers, to offer 15-20% more performance per dollar plus increased memory capacity in order to have a chance, making the whole endeavor not very desirable financially for AMD.

Also, in 2024 we will see increased memory capacities, for example a 256-bit design with 24 GB GDDR7, and probably 32 GB by Q4 2026. So over these 4 years (especially if the PS5 Pro has the same 16 GB capacity, and depending on when the next Xbox launches), the memory advantage AMD offers per tier will not be so important for market perception and sales anymore. Also, an OC 7900 XTX is at the edge of not being considered CPU-limited (a 4090 is), so the percentage by which they can increase GPU performance with a Zen 5 X3D is very specific and uninspiring (if it is CPU-limited at the same level as a 7900 XTX was in Q4 2022). Actually, I don't see AMD returning to 384-bit designs before Q4 2028 at the earliest (if ever...)
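A quick sanity check of the 80 CU claim with naive CU-times-clock scaling (my own sketch; it ignores bandwidth, applies the ~5% per-SE bump as a flat uplift, and assumes ~2500 MHz as the 7900 XTX's typical game clock):

# What clock does an 80 CU part need to match a 96 CU part, with +5% IPC?
target = 96 * 2500               # 7900 XTX: 96 CUs at an assumed ~2500 MHz
needed = target / (80 * 1.05)    # 80 CUs with the assumed +5% per-SE bump
print(f"~{needed:.0f} MHz")      # ~2857 MHz, plausible on a mature 4 nm node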