
AMD Announces the Radeon RX 6000 Series: Performance that Restores Competitiveness

Plus, while AMD might feel encouraged to slow things down a bit on the CPU side, since they are starting to compete with themselves a bit
Plus, they are focused on a new socket. The Ryzen 5000 series of CPUs is the last for socket AM4. The next will likely be AM5.
 
Plus, they are focused on a new socket. The Ryzen 5000 series of CPUs is the last for socket AM4. The next will likely be AM5.
And they can't rush that out before PCIe 5.0 is at least technically viable (likely needs new on-board hardware to ensure signal integrity, which might not be available at consumer price levels for a while) and DDR5 has wide availability. Definitely good reasons to hold off AM5 for a while yet.

But using that as an argument that AMD will try to quicken their GPU development pace? Nah, sorry, not buying that. 16 months between RDNA 1 and RDNA 2. Now we're supposed to get RDNA 3 in < 14 months? And remember, a launch later in the year than this isn't happening no matter what. It's either pre holiday season or CES. Which makes that 12 months, not 14. I really don't see that as likely. I'll be more than happy to be proven wrong, but I'm definitely sticking to a more cautious approach here.
 
Why do you think they'll just straight up go with PCIe 5.0? They most certainly can skip that.

DDR5 is a given, PCIe 5.0 is not much of a necessity even on servers. Of course with Xilinx (acquisition) they might surprise us or something.
 
Why do you think they'll just straight up go with PCIe 5.0? They most certainly can skip that.

DDR5 is a given, PCIe 5.0 is not much of a necessity even on servers. Of course with Xilinx they might surprise us or something.
I don't think it's necessary at all, but launching a new long-term platform ~a year before the availability of an I/O standard is generally a bad idea. Of course it's possible that they could launch AM5 with the promise of future PCIe 5.0 support (i.e. first-gen motherboards and CPUs won't have 5.0, but will be compatible with next-gen CPUs and mobos that do, just at 4.0 speeds when mixed), but again, that's rather sloppy.
 
I think they are able to cut/disable CUs in pairs. If you look at the RDNA 1/2 full dies you will see 20 and 40 identical rectangles respectively. Each of these rectangles is 2 CUs.

Note: CU is now a bit of a historical artifact. RDNA and RDNA 2 are organized into WGPs, or "Dual Compute Units" (because each WGP has the resources of 2x CUs of old). That's why there are 40 RDNA clusters, which count as 80 "CUs" (even though CUs don't really exist anymore).

CUs were in Vega, and are a decent unit to think about while programming the GPU. WGPs work really hard to "pretend" to work like 2x CUs for backwards compatibility purposes... but they're really just one unit now.

-----

As such: the proper term for those 40x clusters on your RDNA2 die shot is Workgroup Processor (WGP)... or "Dual-compute units" (if you want to make a comparison to Vega).
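If it helps to see the counting laid out, here's a trivial sketch (purely my own illustration of the WGP-to-CU equivalence described above, not anything from AMD):

```python
# Sanity-check of the WGP ("dual compute unit") counting described above.
# The full-die WGP counts are the public configs; the 2-CUs-per-WGP mapping
# is just the marketing equivalence being explained.

CUS_PER_WGP = 2  # each Workgroup Processor presents the resources of 2 legacy CUs

full_dies = {
    "Navi 10 (RDNA 1)": 20,  # 20 WGPs, marketed as 40 CUs
    "Navi 21 (RDNA 2)": 40,  # 40 WGPs, marketed as 80 CUs
}

for die, wgps in full_dies.items():
    print(f"{die}: {wgps} WGPs = {wgps * CUS_PER_WGP} 'CUs'")
```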
 
But using that as an argument that AMD will try to quicken their GPU development pace? Nah, sorry, not buying that. 16 months between RDNA 1 and RDNA 2. Now we're supposed to get RDNA 3 in < 14 months? And remember, a launch later in the year than this isn't happening no matter what. It's either pre holiday season or CES. Which makes that 12 months, not 14. I really don't see that as likely. I'll be more than happy to be proven wrong, but I'm definitely sticking to a more cautious approach here.
You forget that during these 16 months they effectively launched 3 architectures, RDNA2 + 2 custom APUs, with different architectures and features for consoles. Now the whole GPU design team is free to work on the new GPU generation.
 
I don't think it's necessary at all, but launching a new long-term platform ~a year before the availability of an I/O standard is generally a bad idea. Of course it's possible that they could launch AM5 with the promise of future PCIe 5.0 support (i.e. first-gen motherboards and CPUs won't have 5.0, but will be compatible with next-gen CPUs and mobos that do, just at 4.0 speeds when mixed), but again, that's rather sloppy.
They added PCIe 4.0 to Zen later.
 
You forget that during these 16 months they effectively launched 3 architectures, RDNA2 + 2 custom APUs, with different architectures and features for consoles. Now the whole GPU design team is free to work on the new GPU generation.
"The whole design team" is at least four separate design teams (two for Zen). It's not like all the Zen design engineers can just slot into a GPU design team without a significant retraining period. The semi-custom team is no doubt already working on 5nm refreshes for both console makers, but some of their engineers could have been moved to a field closer to their expertise, whether that's CPU, GPU, I/O, fabric, etc. Ryzen is under continuous development; one team just finished Zen 3, the other is hard at work with Zen 4, and no doubt the Zen 2 team is now ramping up development of Zen 5. There might be some minor shuffling, but nothing on the scale you are indicating.

They added PCIe 4.0 to Zen later.
That's true. But that was quite a long time after AM4 launched, not a year or less.
 
"The whole design team" is at least four separate design teams (two for Zen). It's not like all the Zen design engineers can just slot into a GPU design team without a significant retraining period. The semi-custom team is no doubt already working on 5nm refreshes for both console makers, but some of their engineers could have been moved to a field closer to their expertise, whether that's CPU, GPU, I/O, fabric, etc. Ryzen is under continuous development; one team just finished Zen 3, the other is hard at work with Zen 4, and no doubt the Zen 2 team is now ramping up development of Zen 5. There might be some minor shuffling, but nothing on the scale you are indicating.
I wonder where you get the info on the console 5nm refreshes. Do you have any source, or are you just guessing? Sony made it clear there will be no refreshes this generation, at least, and there is no leak or hint of that yet. If any of that comes, it will most probably be well after RDNA3.
 
I wonder where you get the info on the console 5nm refreshes. Do you have any source, or are you just guessing? Sony made it clear there will be no refreshes this generation, at least, and there is no leak or hint of that yet. If any of that comes, it will most probably be well after RDNA3.
Doesn't mean they won't evolve what's out for a cheaper BOM, it's what they do.
 
My estimation, absolutely based on (my) logic, is that AMD will stay away from GDDR6X. First, because they can get away with the new IC implementation. And second, because of all the extra expense: GDDR6X is more expensive, draws almost 3x the power of “simple” GDDR6, and the memory controller needs to be more complex too (= more die area and fab cost).

This I “heard” partially...
The three 6000 cards we’ve seen so far are based on Navi 21, right? 80 CUs full die. They may have one more N21 SKU with even fewer CUs, I don’t know how many, probably 56 or even fewer active, with 8GB(?) and probably the same 256-bit bus. But I don’t think this is coming soon, because they may have to build inventory first (given the present good fab yields) and also see how things will go with nVidia.

Further down they have Navi 22. Probably a (?)40CU full die with a 192-bit bus, (?)12GB, clocks up to 2.5GHz, 160~200W, and who knows how much IC. That will be better than the 5700 XT.
And also cut-down versions of N22 with 32~36 CUs, 8/10/12GB, 160/192-bit (as 5600/5700 replacements) and so on, but at this point it’s all pure speculation and things may change in the future.

There are also rumors of Navi 23 with 24~32 CUs, but... it’s way too soon.

Navi21: 4K
Navi22: 1440p and ultrawide
Navi23: 1080p only
That does make sense on the GDDR6X situation, given the cost, complexity, and power relative to GDDR6 and with the Infinity Cache being so effective. I'd like to think that with 192-bit they'd have more than 40 CUs, considering the Infinity Cache. If it were 128-bit with 64MB of Infinity Cache, I could see something like 36 CUs being quite reasonable. I think aiming higher than RDNA1 is in AMD's best interest for both longevity and margins, or at least matching it at better efficiency and production cost.

Yep, CUs are grouped two by two in ... gah, I can't remember what they call the groups. Anyhow, AMD can disable however many they like as long as it's a multiple of 2.
Looking at them, I actually wouldn't expect them to cut that few, realistically, for a few reasons. SKU differentiation is one obvious reason, but the other is heat distribution balance. I'm not sure cutting just 4 CUs in total is really ideal; slices of 2 CUs diagonal from each other on opposite sides of the die kind of makes more sense. That said, AMD has a lot of tech packed into their circuitry these days, with precision boost and granular management over it, so they could probably cut only 2 CUs if they felt inclined and not have to worry drastically about heat management and hot spots becoming a real concern. If it were me, I'd probably approach it like I described, trying to keep heat distribution as even as possible when cutting CUs down. SKU differentiation is really the biggest concern, though. I don't think they are going to slice these up 50 ways to kingdom come, unless they were trying to stir up a bit of a bidding contract war between the AIBs for slightly better-binned die SKUs in rather finely incremental steps. I suppose it could happen, but it depends on the added time and cost to sort through all that.
 
I wonder where you get the info on the console 5nm refreshes. Do you have any source, or are you just guessing? Sony made it clear there will be no refreshes this generation, at least, and there is no leak or hint of that yet. If any of that comes, it will most probably be well after RDNA3.
No source, but every single console generation since the PS1 has had some sort of refresh. I'm not talking about the new tier, mid-generation upgrades that we saw with the current generation. Refresh = same specs, new process, smaller, cheaper die with lower power draw. The PS1 had at least one slim version. The PS2 had at least 2. I don't think the OG Xbox had one, but the 360 had two, and the One had one (the S). The PS3 had at least a couple, and the PS4 had one (the Slim). Given that 5nm is already in volume production today, it stands to reason that it'll be cheap enough in 2-3 years that console makers will want to move to it. Even if the cost per die is the same due to the more advanced process, they'll save on the BOM through lower power draw = smaller PSU and heatsink.

Looking at them, I actually wouldn't expect them to cut that few, realistically, for a few reasons. SKU differentiation is one obvious reason, but the other is heat distribution balance. I'm not sure cutting just 4 CUs in total is really ideal; slices of 2 CUs diagonal from each other on opposite sides of the die kind of makes more sense. That said, AMD has a lot of tech packed into their circuitry these days, with precision boost and granular management over it, so they could probably cut only 2 CUs if they felt inclined and not have to worry drastically about heat management and hot spots becoming a real concern. If it were me, I'd probably approach it like I described, trying to keep heat distribution as even as possible when cutting CUs down. SKU differentiation is really the biggest concern, though. I don't think they are going to slice these up 50 ways to kingdom come, unless they were trying to stir up a bit of a bidding contract war between the AIBs for slightly better-binned die SKUs in rather finely incremental steps. I suppose it could happen, but it depends on the added time and cost to sort through all that.
I didn't say they would be cutting 2 off anything, I said they can cut any number as long as it's 2x something. I.e. 2, 4, 6, 8, 10, 12... Even numbered cuts only, in other words. Nor did I say anything about where they would be cut from - that is either decided by where on the die there are defects, or if there aren't any, whatever is convenient engineering-wise. To quote myself, this is my (very rough and entirely unsourced) guess for the Navi 2 lineup in terms of CUs:
80-72-60-(new die)-48-40-32-(new die)-28-24-20 sounds like a likely lineup to me, which gives us everything down to a 5500 non-XT, with the possibility of 5400/5300 SKUs with disabled memory, lower clocks, etc.
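To make the "even-numbered cuts only" point concrete, here's a throwaway sketch (my own illustration of the constraint, not a leak or roadmap) that lists the CU counts reachable from the full 40-WGP die by fusing off whole WGPs:

```python
# Salvage SKUs are made by disabling whole WGPs, and each WGP carries 2 CUs,
# so every possible CU count derived from the 80-CU die is a multiple of 2.
# Which counts actually ship is pure speculation (see the guessed lineup above).

FULL_WGPS = 40      # Navi 21 full die
CUS_PER_WGP = 2

possible_cu_counts = [(FULL_WGPS - disabled) * CUS_PER_WGP
                      for disabled in range(FULL_WGPS)]

print(possible_cu_counts[:8])        # [80, 78, 76, 74, 72, 70, 68, 66]
print(72 in possible_cu_counts,      # True  (RX 6800 XT style cut)
      60 in possible_cu_counts,      # True  (RX 6800 style cut)
      63 in possible_cu_counts)      # False (odd counts aren't possible)
```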
 
Seen this yet?


Yeah, not sure if it was posted, but AMD put up benchmarks with SAM enabled but no Rage Mode.


Results chop and change a bit, but it gives an idea what to expect.
 
I didn't say they would be cutting 2 off anything, I said they can cut any number as long as it's 2x something. I.e. 2, 4, 6, 8, 10, 12... Even numbered cuts only, in other words. Nor did I say anything about where they would be cut from - that is either decided by where on the die there are defects, or if there aren't any, whatever is convenient engineering-wise. To quote myself, this is my (very rough and entirely unsourced) guess for the Navi 2 lineup in terms of CUs:
I was injecting my thoughts on the 2-CU situation, or twin units, whatever you wish to call or abbreviate them. What I was saying is that it's unlikely AMD would bother with a SKU that differentiates by as few as 2 CUs; to me it seems most probable it would be somewhere between 6 and 12 between two different SKUs at this point. I do see AMD leaning toward cutting fewer CUs where possible, though, and charging a higher premium for better performance. CU count is probably far more important than bandwidth with the current design, since it's needed to take full advantage of the bandwidth available. Much of what happens hinges on the Infinity Cache size and bus width in any future SKUs; even outside VRAM, those change things a fair bit. HBM2 with Infinity Cache for new SKUs with even more CUs is a real possible scenario to consider too, even without changing the bus width: that's tons of extra bandwidth, and more CUs to go along with it. HBM2 is more power-friendly than GDDR6, if I'm not mistaken, and occupies less space, so a bigger chip is quite tangible, though I don't know about the yields of that. That said, they could do 3 lower SKUs initially, then try to build a bigger, higher-CU-count chip with HBM2, in that order, to maximize yields as TSMC's node continues to mature over time. The cost factor would be the concern with HBM2, but it would give better power, bandwidth, and space savings.

Yeah, not sure if it was posted, but AMD put up benchmarks with SAM enabled but no Rage Mode.


Results chop and change a bit, but it gives an idea what to expect.
That's quite interesting: once you drop from 4K to 1440p, RDNA2 performance pulls ahead rapidly relative to Ampere. I'd really like to see AMD add 1080p results to this list of benchmarks. The Infinity Cache seems to flex its benefit the most at lower resolutions in particular, which makes sense given the limited amount of cache to work with; the huge latency reduction and bandwidth increase it provides naturally gets better mileage there. It's actually very much akin to the Intel situation at 1080p for so long with esports high-refresh-rate gaming. I presume these cards are going to sell like hot cakes to that crowd of users, because these cards will scream along nicely at 1080p high refresh rates as far as I'm seeing, relative to the cost. It'll be interesting to see what happens with RTRT at different resolutions. That Infinity Cache seems really effective at lower resolutions.
 
That's quite interesting: once you drop from 4K to 1440p, RDNA2 performance pulls ahead rapidly relative to Ampere. I'd really like to see AMD add 1080p results to this list of benchmarks. The Infinity Cache seems to flex its benefit the most at lower resolutions in particular, which makes sense given the limited amount of cache to work with; the huge latency reduction and bandwidth increase it provides naturally gets better mileage there. It's actually very much akin to the Intel situation at 1080p for so long with esports high-refresh-rate gaming. I presume these cards are going to sell like hot cakes to that crowd of users, because these cards will scream along nicely at 1080p high refresh rates as far as I'm seeing, relative to the cost.
Looking at AMD’s latest performance across resolutions relative to Ampere, it might seem like it doesn’t do well at the higher/highest ones.

It’s not really that the RDNA2 architecture/IC doesn’t scale well across different resolutions, or that it does better at lower ones. It’s the Ampere architecture that doesn’t scale well across resolutions.
And you can see that from benchmarks comparing Turing vs Ampere. Turing and RDNA2 both have a more “normal” scaling across the three well-known resolutions: 1080p, 1440p, and 4K.

Looking at benchmarks of Turing vs Ampere across the three resolutions, you can see that as you go up, Ampere pulls away from Turing, reaching average relative performance gains of around 30% at 4K. But at 1080p that difference is “only” 20%.
It’s a matter of Ampere’s architecture.

Also, this relative comparison (we don’t actually have full benchmarks between Turing and RDNA2) sort of confirms that AMD’s IC, with its high (effective) bandwidth, is working well and delivering on its promise of acting like a real wide bus.
 
Looking at AMD’s latest performance across resolutions relative to Ampere, it might seem like it doesn’t do well at the higher/highest ones.

It’s not really that the RDNA2 architecture/IC doesn’t scale well across different resolutions, or that it does better at lower ones. It’s the Ampere architecture that doesn’t scale well across resolutions.
And you can see that from benchmarks comparing Turing vs Ampere. Turing and RDNA2 both have a more “normal” scaling across the three well-known resolutions: 1080p, 1440p, and 4K.

Looking at benchmarks of Turing vs Ampere across the three resolutions, you can see that as you go up, Ampere pulls away from Turing, reaching average relative performance gains of around 30% at 4K. But at 1080p that difference is “only” 20%.
It’s a matter of Ampere’s architecture.

Also, this relative comparison (we don’t actually have full benchmarks between Turing and RDNA2) sort of confirms that AMD’s IC, with its high (effective) bandwidth, is working well and delivering on its promise of acting like a real wide bus.
AFAIK that is mainly because it's only at 4k (and higher) that you can make any real use of the increased FP32 of Ampere, while at lower resolutions you're bottlenecked by other parts of the arch (which weren't doubled).
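A rough way to see why that matters (my own back-of-the-envelope, not a benchmark): per-frame pixel work grows with resolution while per-frame fixed costs like geometry and CPU/driver overhead largely don't, so 4K simply offers far more parallel FP32 work to fill those doubled units.

```python
# Back-of-the-envelope pixel counts per frame at the three common resolutions.
# Purely illustrative: it only shows how much more parallel per-pixel work
# 4K offers compared to 1080p/1440p, not actual GPU utilisation.

resolutions = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}
base = 1920 * 1080

for name, (w, h) in resolutions.items():
    pixels = w * h
    print(f"{name}: {pixels / 1e6:.1f} MPix per frame, "
          f"{pixels / base:.2f}x the pixel work of 1080p")
```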
 
I'll assume you're probably right about Ampere, but as far as resolution scaling is concerned for RDNA2, 1080p will make better use of the bandwidth available than 4K: more frames for the same amount of bandwidth, assuming the CPU can keep pace and the GPU's CUs can keep all that available bandwidth fed well enough. All I know is that, relative to Ampere, the scaling on RDNA2 did noticeably better when the resolution dropped from 4K to 1440p, and I suspect that follows through to 1080p as well, because it wasn't an anomaly from the looks of it at all; across all the tests the gap narrows, or RDNA2 pulls ahead, or pulls away even further. You might be right about Ampere, but the Infinity Cache could be playing a role on top of that, much like an SSD with overprovisioning: at a lower resolution you'll have more Infinity Cache "overprovisioning" to work with, so to speak.
 
I guess this “issue” will be cleared up as benchmarks go public with all architectures in them at all resolutions.
 
I'm confusing myself trying to think about it now, honestly. I get what you're saying about Ampere, but at the same time the Infinity Cache is drastically better on bandwidth and I/O. At lower resolutions it could come into play more, in terms of having a readily obvious impact on frame rate over a given time frame, if the CPU's and GPU's other requirements can still pull their weight accordingly. I need to see a clearer picture of what's happening and an understanding of why. I'm sure "Tech Jesus" at Gamer's Nexus will explain it all in over-provisioned deep analysis.

AFAIK that is mainly because it's only at 4k (and higher) that you can make any real use of the increased FP32 of Ampere, while at lower resolutions you're bottlenecked by other parts of the arch (which weren't doubled).
Honestly, while that perhaps contributes, the Infinity Cache certainly works out to a 2.17x bandwidth increase, with a 108.5% I/O improvement or 54.25% reduced latency in essence, which is more pronounced than adjusting for more FP32 workloads rather than FP16, for example. I think the Ampere aspect comes into play as well, but perhaps the Infinity Cache is the bigger element, unless I'm way off base in my assessment of the situation.
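For what it's worth, here's the toy math I'm leaning on (my own simplification, not AMD's methodology): if a fraction of memory traffic hits the on-die cache and never touches GDDR6, and the cache itself isn't the limit, the effective bandwidth multiplier is roughly 1 / (1 - hit rate). The quoted 2.17x happens to fall out of that at a hit rate of about 54%.

```python
# Crude effective-bandwidth model for a big last-level cache (my assumption,
# not AMD's math): traffic served from the cache never touches GDDR6, so if
# the cache isn't the bottleneck, effective_bw ~= raw_bw / (1 - hit_rate).

raw_bw_gbs = 16 * 256 / 8            # 16 Gbps GDDR6 on a 256-bit bus = 512 GB/s

def effective_multiplier(hit_rate: float) -> float:
    return 1.0 / (1.0 - hit_rate)

for hit_rate in (0.40, 0.54, 0.70):
    mult = effective_multiplier(hit_rate)
    print(f"hit rate {hit_rate:.0%}: ~{mult:.2f}x -> "
          f"~{raw_bw_gbs * mult:.0f} GB/s effective")
```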
 
Honestly, while that perhaps contributes, the Infinity Cache certainly works out to a 2.17x bandwidth increase, with a 58.5% I/O latency reduction in essence, which is more pronounced than adjusting for more FP32 workloads rather than FP16, for example. I think the Ampere aspect comes into play as well, but perhaps the Infinity Cache is the bigger element, unless I'm way off base in my assessment of the situation.
I was only speaking of how Ampere scales in comparison to Turing. Comparing how a so-far-unreleased architecture with a never-before-seen feature scales to how two other architectures scale ... that's impossible. We know that Ampere does relatively better at 4k than lower resolutions. From what we've seen from AMD so far, the same is not true for RDNA 2 - it seems to scale much more traditionally. But we can't know anything for sure until we have reviews in. Still, AMD's 1440p numbers look quite a lot better when compared to Ampere than their 4k ones do.
 
We sure need a more technical explanation and approach to this new thing. I’m also interested in the more technical parts and details of any technology that comes.

From my simple non-technical (let alone professional) understanding and explanation, I’m thinking that if the IC is truly delivering wide bandwidth (800+bit effective) across different workload levels (up to 4K, which is more common than 8K) and scales well across them, then the real bottleneck for any better performance is, as you also stated directly or indirectly, the cores of the GPU and its surrounding I/O. And if that’s really true, they’ve managed to remove the bandwidth bottleneck completely, up to 4K at least.

It’s radical! But also not a reinvention of the wheel. I can’t believe that nVidia’s engineers haven’t thought of such an implementation. But I can compare nVidia’s approach to Intel’s. AMD has taken steps in the CPU world toward a unified arch with chiplets that scales really well from just 1 to a large number of them. With its cons.

Intel does not do that, but rather was always betting on a stronger arch in its cores, which couldn’t scale well beyond a certain number. Today nVidia’s approach is doing the same in reverse: it performs better on heavy workloads but does not scale well on lighter ones.

nVidia can’t implement such a large cache because it doesn’t have room for it in its arch, which is occupied by Tensor and RT cores. That’s why they need the super-high-speed 6X VRAM to keep feeding the CUDA cores with data.
In a far-fetched sense, you could say that AMD’s arch (both CPU and GPU) is more open and nVidia’s more closed and proprietary. Also, RDNA in general is more of a gaming approach, and Ampere (starting with Turing) is more of a workload one that can do well in loads other than gaming, like GCN, which was really strong outside gaming.

Rumors say that the next RDNA3 will be closer to the Zen 2/3 approach: chunks of cores/dies tied together with large pools of cache.
That’s why I believe it will not come soon. It will be way more than a year.
 
From my simple non-technical (let alone professional) understanding and explanation, I’m thinking that if the IC is truly delivering wide bandwidth (800+bit effective) across different workload levels (up to 4K, which is more common than 8K) and scales well across them, then the real bottleneck for any better performance is, as you also stated directly or indirectly, the cores of the GPU and its surrounding I/O. And if that’s really true, they’ve managed to remove the bandwidth bottleneck completely, up to 4K at least.

Okay, people tend to think of bandwidth as a constant thing (I'm always pushing 18Gbps or whatever the hell it is) at all times, and that if I'm not pushing the most amount of data at all times the GPU is going to stall.

The reality is that only a small subset of data is all that necessary for keeping the GPU fed so it doesn't stall. The majority of the data (in a gaming context anyway) isn't anywhere near as latency sensitive and can be much more flexible about when it comes across the bus. IC helps by doing two things. It:
A: Stops writes and subsequent retrievals from going back out to general memory for the majority of that data (letting it live in cache, where a shader is likely going to retrieve that information from again), and
B: Helps act as a buffer for further deprioritising data retrieval, letting likely-needed data be retrieved early, held momentarily in cache, then ingested into the shader pipeline rather than written back out to VRAM.

As for Nvidia, yep, they would have, but the amount of die space being chewed up by even 128MB of cache is pretty ludicrously large. AMD has balls chasing such a strategy tbh (which is probably why we saw 384-bit engineering sample cards earlier in the year; if IC didn't perform, they could fall back to a wider bus).
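To illustrate point A in toy form (all the numbers here, the 64 KB "tile" size, the random access pattern, the 128 MB capacity, are made up for illustration and are not a model of real RDNA 2 behaviour):

```python
# Toy sketch of point A above: recently written/read tiles get re-used from the
# on-die cache instead of going back out to VRAM.

from collections import OrderedDict
import random

CACHE_LINES = 128 * 1024 * 1024 // (64 * 1024)   # 128 MB of 64 KB "tiles"
cache = OrderedDict()                             # LRU order: oldest first
vram_transfers = served_on_die = 0

random.seed(0)
for _ in range(200_000):
    tile = random.randint(0, 4095)                # working set ~256 MB of tiles
    if tile in cache:
        cache.move_to_end(tile)                   # hit: refresh LRU position
        served_on_die += 1
    else:
        vram_transfers += 1                       # miss: real VRAM traffic
        cache[tile] = True
        if len(cache) > CACHE_LINES:
            cache.popitem(last=False)             # evict least recently used

print(f"served on-die: {served_on_die / (served_on_die + vram_transfers):.0%}")
```

With a working set not much bigger than the cache, a large share of traffic stays on-die; blow well past 128 MB and the benefit shrinks, which is presumably part of why the gains look resolution-dependent.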
 
I'm confusing myself trying to think about it now, honestly. I get what you're saying about Ampere, but at the same time the Infinity Cache is drastically better on bandwidth and I/O. At lower resolutions it could come into play more, in terms of having a readily obvious impact on frame rate over a given time frame, if the CPU's and GPU's other requirements can still pull their weight accordingly. I need to see a clearer picture of what's happening and an understanding of why. I'm sure "Tech Jesus" at Gamer's Nexus will explain it all in over-provisioned deep analysis.

Honestly, while that perhaps contributes, the Infinity Cache certainly works out to a 2.17x bandwidth increase, with a 108.5% I/O improvement or 54.25% reduced latency in essence, which is more pronounced than adjusting for more FP32 workloads rather than FP16, for example. I think the Ampere aspect comes into play as well, but perhaps the Infinity Cache is the bigger element, unless I'm way off base in my assessment of the situation.
I think this also encapsulates the gist of it somewhat.
Prior to this, AMD struggled with instruction pipeline functions. Successively, they streamlined the pipeline operation flow, dropped instruction latency to 1, and started implementing dual-issued operations. That, or I don't know how they could increase shader speed by 7.9x by implementing simple progressions to the same architecture.

As for Nvidia, yep, they would have, but the amount of die space being chewed up by even 128MB of cache is pretty ludicrously large. AMD has balls chasing such a strategy tbh (which is probably why we saw 384-bit engineering sample cards earlier in the year; if IC didn't perform, they could fall back to a wider bus).
And remember, this is only because they had previously experimented with it; otherwise there would be no chance that they knew first-hand how much power budget it would cost them. SRAM has a narrow efficiency window.
There was a piece a while back which compared AMD's and Intel's cell-to-transistor ratios, with the summary being that AMD had integrated higher and more efficient transistor-count units, all because of the available die space.
 
In case anyone missed it.:roll::roll:

 