
AMD Big Navi GPU Features Infinity Cache?

Joined
Sep 3, 2019
Messages
3,515 (1.84/day)
Location
Thessaloniki, Greece
System Name PC on since Aug 2019, 1st CPU R5 3600 + ASUS ROG RX580 8GB >> MSI Gaming X RX5700XT (Jan 2020)
Processor Ryzen 9 5900X (July 2022), 220W PPT limit, 80C temp limit, CO -6-14, +50MHz (up to 5.0GHz)
Motherboard Gigabyte X570 Aorus Pro (Rev1.0), BIOS F39b, AGESA V2 1.2.0.C
Cooling Arctic Liquid Freezer II 420mm Rev7 (Jan 2024) with off-center mount for Ryzen, TIM: Kryonaut
Memory 2x16GB G.Skill Trident Z Neo GTZN (July 2022) 3667MT/s 1.42V CL16-16-16-16-32-48 1T, tRFC:280, B-die
Video Card(s) Sapphire Nitro+ RX 7900XTX (Dec 2023) 314~467W (375W current) PowerLimit, 1060mV, Adrenalin v24.10.1
Storage Samsung NVMe: 980Pro 1TB(OS 2022), 970Pro 512GB(2019) / SATA-III: 850Pro 1TB(2015) 860Evo 1TB(2020)
Display(s) Dell Alienware AW3423DW 34" QD-OLED curved (1800R), 3440x1440 144Hz (max 175Hz) HDR400/1000, VRR on
Case None... naked on desk
Audio Device(s) Astro A50 headset
Power Supply Corsair HX750i, ATX v2.4, 80+ Platinum, 93% (250~700W), modular, single/dual rail (switch)
Mouse Logitech MX Master (Gen1)
Keyboard Logitech G15 (Gen2) w/ LCDSirReal applet
Software Windows 11 Home 64bit (v24H2, OSBuild 26100.2161), upgraded from Win10 to Win11 on Jan 2024
Based on the fact that no architecture is built for a single generation. And it's in the name: RDNA2.
Architectures change, evolve, get enhanced and modified... And we don't really know what AMD has done this round.

The ZEN3 architecture is still all ZEN... It started with ZEN >> ZEN+ >> ZEN2, continuously improving, and yet again they managed to enhance it on the exact same node and improve IPC and performance per watt all together. RDNA2 is just a step behind that (=ZEN2) and it will bring improvements. RDNA3 will probably be like the ZEN3 iteration.
 
Joined
Oct 6, 2020
Messages
35 (0.02/day)
You are presenting an argument from a point of view where the "strength" of a GPU architecture is apparently only a product of its FP32 compute prowess.
No. I never said that's the "only" factor. But it's very common to express the capability of such chips in FLOPS. AMD does it, Nvidia does it, every supercomputer does. You claimed I was off. And that's simply wrong. We should all know that actual performance depends on other factors as well, like workload or efficiency.

Put more simply: you are effectively saying "GCN was a good architecture, but bad at gaming"
No. I said what I said. I never categorized anything as good or bad. That was just you. But if you want to know my opinion, yes, GCN was a good general architecture for computing and gaming when it was released. You can see that it aged better than Kepler. But AMD didn't continue its development. Probably most resources went into the Zen development back then. I don't know. The first major update was Polaris. And that was ~4.5 years after the first GCN generation. Which simply was too late. At that time Nvidia had already made significant progress with Maxwell and Pascal. That's why I think splitting up the architecture into RDNA and CDNA was the right decision. It's a little bit like Skylake. Skylake was a really good architecture on release. But over the years there was no real improvement. Only higher clock speeds and higher power consumption. OTOH AMD made significant progress with every new full Zen generation.


Read back or on other topics, been over this at length already.
How about answering my question first? I'm still missing that one.
Okay, then how about some facts? You said RDNA 2 won't fight the 3080 with that bandwidth. Give us some facts about RDNA 2 that show why it won't happen. No opinions, no referring to old GCN stuff, just hard facts about RDNA.
 
Joined
Sep 17, 2014
Messages
22,452 (6.03/day)
Location
The Washing Machine
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling Thermalright Peerless Assassin
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
How about answering my question first? I'm still missing that one.

That was the answer to your question ;)

No. I said what I said. I never categorized anything as good or bad. That was just you. But if you want to know my opinion, yes, GCN was a good general architecture for computing and gaming when it was released. You can see that it aged better than Kepler. But AMD didn't continue its development. Probably most resources went into the Zen development back then. I don't know. The first major update was Polaris. And that was ~4.5 years after the first GCN generation. Which simply was too late. At that time Nvidia had already made significant progress with Maxwell and Pascal. That's why I think splitting up the architecture into RDNA and CDNA was the right decision. It's a little bit like Skylake. Skylake was a really good architecture on release. But over the years there was no real improvement. Only higher clock speeds and higher power consumption. OTOH AMD made significant progress with every new full Zen generation.

The first update was Tonga (the R9 285, was it?) and it failed miserably, then they tried the Fury X. Then came Polaris.

None of it was a serious move towards anything with a future; it was clearly grasping at straws, as Hawaii XT had already run into the limits of what GCN could push. They had a memory efficiency issue. Nvidia eclipsed that entirely with the release of Maxwell's delta compression tech, which AMD didn't have at the time. Polaris didn't either, so it's questionable what use that 'update' really was. All Polaris really was, was a shrink from 28 > 14nm and an attempt to get some semblance of a cost-effective GPU into the midrange. Other development was stalled and redirected to more compute (Vega) and pro markets because 'that's where the money is', while similarly the midrange 'is where the money is'. Then came mining... and it drove 90% of Polaris sales, I reckon. People still bought 1060s and 970s regardless, not least because those were actually available.

Current trend in GPUs... Jon Peddie Research reports steady (relative) year-over-year growth in high-end GPUs, and the average price is steadily rising. It's a strange question to ask me what undisclosed facts RDNA2 will bring to change the current state of things, but it's a bit of a stretch to 'assume' they will suddenly leap ahead as some predict. The supposed specs we DO have show about 500GB/s of bandwidth, and that is a pretty hard limit; apparently they also have some sort of cache system that does something for that, judging by the results. If the GPU we saw in AMD's benches was the 500GB/s one, the cache is good for another 20%. Nice. But it still won't eclipse a 3080. This means they will need a wider bus for anything bigger, and this will in turn take a toll on TDPs and efficiency.
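To put rough numbers on that bandwidth argument (purely illustrative: the 256-bit bus, 16 Gbps data rate and the 20% cache uplift are assumptions taken from the guesswork above, not confirmed RDNA2 specs):

Code:
# Back-of-envelope sketch of the bandwidth argument above.
# All figures are assumptions/placeholders, not confirmed RDNA2 specs.

def gddr6_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Raw memory bandwidth in GB/s for a given bus width and data rate."""
    return bus_width_bits / 8 * data_rate_gbps

raw = gddr6_bandwidth_gbs(bus_width_bits=256, data_rate_gbps=16.0)   # ~512 GB/s
cache_uplift = 0.20        # hypothetical effective gain from a large on-die cache
effective = raw * (1 + cache_uplift)

print(f"raw: {raw:.0f} GB/s, effective with cache: {effective:.0f} GB/s")
# For comparison, a 320-bit / 19 Gbps card (3080-class) sits around:
print(f"320-bit @ 19 Gbps: {gddr6_bandwidth_gbs(320, 19.0):.0f} GB/s")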

The first numbers are in and we've already seen about a 10% deficit to the 3080 with whatever that was supposed to be. There is probably some tier above it, but I reckon the gap will be minor, like the 3090 above the 3080. As for right decisions... yes, retargeting the high end is a good decision, it's the ONLY decision really, and I hope they can make it happen, but the track record for RDNA so far isn't spotless, if not outright plagued with problems very similar to what GCN had, up until now.

@gruffi sry, big ninja edit, I think you deserved it for pressing the question after all :)
 
Joined
Jul 16, 2014
Messages
8,198 (2.16/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Arctic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse Steelseries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
No. I never said that's the "only" factor. But it's very common to express the capability of such chips in FLOPS. AMD does it, Nvidia does it, every supercomputer does. You claimed I was off. And that's simply wrong. We should all know that actual performance depends on other factors as well, like workload or efficiency.
And camera manufacturers still market megapixels as if it's a meaningful indication of image quality. Should we accept misleading marketing terms just because they are common? Obviously not. The problems with using teraflops as an indicator of consumer GPU performance have been discussed at length both in forums like these and the media. As for supercomputers: that's one of the relatively few cases where teraflops actually matter, as supercomputers run complex compute workloads. Though arguably FP64 is likely more important to them than FP32. But for anyone outside of a datacenter? There are far more important metrics than the base teraflops of FP32 that a GPU can deliver.
No. I said what I said. I never categorized anything as good or bad. That was just you.
Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the only reasonable interpretation of that word in this context.
But if you want to know my opinion, yes, GCN was a good general architecture for computing and gaming when it was released. You can see that it aged better than Kepler. But AMD didn't continue its development. Probably most resources went into the Zen development back then. I don't know. The first major update was Polaris. And that was ~4.5 years after the first GCN generation. Which simply was too late. At that time Nvidia had already made significant progress with Maxwell and Pascal. That's why I think splitting up the architecture into RDNA and CDNA was the right decision. It's a little bit like Skylake. Skylake was a really good architecture on release. But over the years there was no real improvement. Only higher clock speeds and higher power consumption. OTOH AMD made significant progress with every new full Zen generation.
I agree that GCN was good when it launched. In 2012. It was entirely surpassed by Maxwell in 2014. As for "the first major update [being] Polaris", that is just plain wrong. Polaris was the fourth revision of GCN. It's obvious that the development of GCN was hurt by AMD's financial situation and lack of R&D money, but the fact that their only solution to this was to move to an entirely new architecture once they got their act together tells us that it was ultimately a relatively poor architecture overall. It could be said to be a good architecture for compute, hence its use as the basis for CDNA, but for more general workloads it simply scales poorly.

stop spreading false rumors!! :p :rolleyes:
Sorry, my bad. I should have said "This is a forum for RGB enthusiasts."
 
Joined
May 15, 2020
Messages
697 (0.42/day)
Location
France
System Name Home
Processor Ryzen 3600X
Motherboard MSI Tomahawk 450 MAX
Cooling Noctua NH-U14S
Memory 16GB Crucial Ballistix 3600 MHz DDR4 CAS 16
Video Card(s) MSI RX 5700XT EVOKE OC
Storage Samsung 970 PRO 512 GB
Display(s) ASUS VA326HR + MSI Optix G24C4
Case MSI - MAG Forge 100M
Power Supply Aerocool Lux RGB M 650W
Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the only reasonable interpretation of that word in this context.
Come on, you're exaggerating and you can do better than this. In the context of this discussion, strong is closer to "raw performance" than to "good".
Better to spend this energy on more meaningful discussions.

Also, bandwidth and TFlops are the best objective measures to express the potential performance of graphics cards, and they're fine if they're understood as what they are.

Just an aside: the only time I see TFlops as truly misleading is with Ampere, because those dual-purpose CUs will never attain their maximum theoretical throughput, since they also have to do integer computations (which amount to about 30% of computations in gaming, according to Nvidia themselves).
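To show the arithmetic behind that aside, here is a deliberately simplified toy model (one FP32-only pipe plus one shared FP32/INT32 pipe with idealised scheduling - an assumption for illustration, not Nvidia's actual design):

Code:
# Toy model: one pipe does FP32 only, the other does FP32 *or* INT32,
# and ~30% of shader instructions are integer (the figure cited above).
# This illustrates the argument; it is not Nvidia's real scheduler.

def effective_fp32_fraction(int_share: float) -> float:
    """Fraction of the theoretical FP32 peak usable when int_share of the
    instruction mix is INT32 and can only run on the shared pipe."""
    fp_share = 1.0 - int_share
    if int_share <= fp_share:
        # Both pipes stay busy; FP32 ops are fp_share of the 2 ops/clock total.
        return fp_share
    # INT-bound case: the FP32-only pipe idles part of the time.
    return 0.5 * fp_share / int_share

print(f"{effective_fp32_fraction(0.30):.0%} of the headline TFLOPs")   # 70%
# Versus Turing's 1 FP32 + 1 INT32 layout (FP32 rate 1.0/clock), that 1.4/clock
# would be roughly a +40% FP32 uplift rather than +100% - in the same ballpark
# as the ~25-35% gaming gains discussed later in the thread.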
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Come on, you're exaggerating and you can do better than this. In the context of this discussion, strong is closer to "raw performance" than to "good".
Better to spend this energy on more meaningful discussions.

Also, bandwidth and TFlops are the best objective measures to express the potential performance of graphics cards, and they're fine if they're understood as what they are.

Just an aside: the only time I see TFlops as truly misleading is with Ampere, because those dual-purpose CUs will never attain their maximum theoretical throughput, since they also have to do integer computations (which amount to about 30% of computations in gaming, according to Nvidia themselves).
Sorry, I might be pedantic, but I can't agree with this. Firstly, the meaning of "strong" is obviously dependent on context, and in this context (consumer gaming GPUs) the major relevant form of "strength" is gaming performance. Attributing FP32 compute performance as a more relevant reading of "strong" in a consumer GPU lineup needs some actual arguments to back it up. I have so far not seen a single one.

Your second statement is the worst type of misleading: something that is technically true, but is presented in a way that vastly understates the importance of context, rendering its truthfulness moot. "They're fine if they're understood as what they are" is entirely the point here: FP32 is in no way whatsoever a meaningful measure of consumer GPU performance across architectures. Is it a reasonable point of comparison within the same architecture? Kind of! For non-consumer uses, where pure FP32 compute is actually relevant? Sure (though it is still highly dependent on the workload). But for the vast majority of end users, let alone the people on these forums, FP32 as a measure of the performance of a GPU is very, very misleading.

Just as an example, here's a selection of GPUs and their game performance/Tflop in TPU's test suite at 1440p from the 3090 Strix OC review:

Ampere:
3090 (Strix OC) 100% 39TF = 2.56 perf/TF
3080 90% 29.8TF = 3 perf/TF

Turing:
2080 Ti 72% 13.45TF = 5.35 perf/TF
2070S 55% 9TF = 6.1 perf/TF
2060 41% 6.5TF = 6.3 perf/TF

RDNA:
RX 5700 XT 51% 9.8TF = 5.2 perf/TF
RX 5600 XT 40% 7.2TF = 5.6 perf/TF
RX 5500 XT 27% 5.2TF = 5.2 perf/TF

GCN
Radeon VII 53% 13.4 TF = 4 perf/TF
Vega 64 41% 12.7TF = 3.2 perf/TF
RX 590 29% 7.1TF = 4.1 perf/TF

Pascal:
1080 Ti 53% 11.3TF = 4.7 perf/TF
1070 34% 6.5TF = 5.2 perf/TF

This is of course at just one resolution, and the numbers would change at other resolutions. The point still shines through: even within the same architectures, using the same memory technology, gaming performance per teraflop of FP32 compute can vary by 25% or more. Across architectures we see more than 100% variance. Which demonstrates that for the average user, FP32 is an utterly meaningless metric. Going by these numbers, a 20TF GPU might beat the 3090 (if it matched the 2060 in performance/TF) or it might lag dramatically (like the VII or Ampere).
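For anyone who wants to reproduce the arithmetic, the list above is nothing more than relative performance divided by spec TFLOPs; a quick sketch in Python, with the values copied from the post rather than re-measured:

Code:
# Relative 1440p performance (TPU 3090 Strix OC review) divided by spec FP32 TFLOPs.
cards = {                       # name: (relative perf %, spec TFLOPs)
    "RTX 3090 Strix OC": (100, 39.0),
    "RTX 3080":          (90, 29.8),
    "RTX 2080 Ti":       (72, 13.45),
    "RTX 2060":          (41, 6.5),
    "RX 5700 XT":        (51, 9.8),
    "Radeon VII":        (53, 13.4),
    "GTX 1080 Ti":       (53, 11.3),
}

ratios = {name: perf / tf for name, (perf, tf) in cards.items()}
for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} {r:.2f} perf/TF")

spread = max(ratios.values()) / min(ratios.values()) - 1
print(f"spread across these cards: {spread:.0%}")   # well over 100%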

Unless you are a server admin or researcher or whatever else running workloads that are mostly FP32, using FP32 as a meaningful measure of performance is very misleading. Its use is very similar to how camera manufacturers have used (and partially still do) megapixels as a stand-in tech spec to represent image quality. There is some relation between the two, but it is wildly complex and inherently non-linear, making the one meaningless as a metric for the other.
 
Joined
Oct 6, 2020
Messages
35 (0.02/day)
Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the only reasonable interpretation of that word in this context.
I think that's the whole point of your misunderstanding. You interpreted. And you interpreted in the wrong way. So let me be clear once and for all. As I said, with "strong" I was referring to raw performance. And raw performance is usually measured in FLOPS. I didn't draw any conclusions about whether that makes an architecture good or bad. That is usually defined by metrics like performance/watt and performance/mm².

I agree that GCN was good when it launched. In 2012. It was entirely surpassed by Maxwell in 2014. As for "the first major update [being] Polaris", that is just plain wrong. Polaris was the fourth revision of GCN.
You said it yourself: just revisions. Hawaii, Tonga, Fiji. They mostly got only ISA updates and more execution units. One exception was HBM for Fiji. But even that didn't change the architecture at all. Polaris was the first generation after Tahiti that had some real architectural improvements to increase IPC and efficiency.

It's obvious that the development of GCN was hurt by AMD's financial situation and lack of R&D money, but the fact that their only solution to this was to move to an entirely new architecture once they got their act together tells us that it was ultimately a relatively poor architecture overall.
I wouldn't say that. The question is what's your goal. Obviously AMD's primary goal was a strong computing architecture to counter Fermi's successors. Maybe AMD didn't expect Nvidia to go the exact opposite way. Kepler and Maxwell were gaming architectures. They were quite poor at computing, especially Kepler. Back then, with enough resources, I think AMD could have done with GCN what they are doing now with RDNA. RDNA is no entirely new architecture from scratch like Zen. It's still based on GCN. So, it seems GCN was a good architecture after all. At least better than what some people try to claim. The lack of progress and the general purpose nature just made GCN look worse for gamers over time. Two separate developments for computing and gaming was the logical consequence. Nvidia might face the same problem. Ampere is somehow their GCN moment. Many shaders, apparently good computing performance, but way worse shader efficiency than Turing for gaming.

That was the answer to your question ;)
Okay. Then we can agree that it could be possible to be competitive, or at least very close in performance, even with less memory bandwidth? ;)
 
Joined
Sep 17, 2014
Messages
22,452 (6.03/day)
Location
The Washing Machine
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling Thermalright Peerless Assassin
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Okay. Then we can agree that it could be possible to be competitive, or at least very close in performance, even with less memory bandwidth? ;)

Could as in highly unlikely, yes.
 
Joined
Feb 3, 2017
Messages
3,756 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Ampere:
3090 (Strix OC) 100% 39TF = 2.56 perf/TF
3080 90% 29.8TF = 3 perf/TF

Turing:
2080 Ti 72% 13.45TF = 5.35 perf/TF
2070S 55% 9TF = 6.1 perf/TF
2060 41% 6.5TF = 6.3 perf/TF

RDNA:
RX 5700 XT 51% 9.8TF = 5.2 perf/TF
RX 5600 XT 40% 7.2TF = 5.6 perf/TF
RX 5500 XT 27% 5.2TF = 5.2 perf/TF

GCN
Radeon VII 53% 13.4 TF = 4 perf/TF
Vega 64 41% 12.7TF = 3.2 perf/TF
RX 590 29% 7.1TF = 4.1 perf/TF

Pascal:
1080 Ti 53% 11.3TF = 4.7 perf/TF
1070 34% 6.5TF = 5.2 perf/TF
At 1440p Ampere probably takes more of a hit than it should.
But more importantly, especially for Nvidia cards, spec TFLOPs are misleading. Just check the average clock speeds in the respective reviews.
At the same time, RDNA has a Boost clock that is not quite what the card actually achieves.

Ampere:
3090 (Strix OC, 100%): 1860 > 1921MHz - 39 > 40.3TF (2.48 %/TF)
3080 (90%): 1710 > 1931MHz - 29.8 > 33.6TF (2.68 %/TF)

Turing:
2080Ti (72%): 1545 > 1824MHz - 13.45 > 15.9TF (4.53 %/TF)
2070S (55%): 1770 > 1879MHz - 9 > 9.2TF (5.98 %/TF)
2060 (41%): 1680 > 1865MHz - 6.5 > 7.1TF (5.77 %/TF)

RDNA:
RX 5700 XT (51%): 1755 (1905) > 1887MHz - 9.0 (9.8) > 9.66TF (5.28 %/TF)
RX 5600 XT (40%): 1750 > 1730MHz - 8.1 > 8.0TF (5.00 %/TF) - this one is a mess with specs and clocks, but the ASUS TUF seems closest to the newer reference spec, and it is not quite the right comparison really
RX 5500 XT (27%): 1845 > 1822MHz - 5.2 > 5.1TF (5.29 %/TF) - all reviews are of AIB cards but the two closest to reference specs got 1822MHz

GCN:
Radeon VII (53%): 1750 > 1775MHz - 13.4 > 13.6TF (3.90 %/TF)
Vega 64 (41%): 1546MHz - 12.7TF (3.23 %/TF) - let's assume it ran at 1546MHz in the review; I doubt it, because my card struggled heavily to reach spec clocks
RX 590 (29%): 1545MHz - 7.1TF (4.08 %/TF)

Pascal
1080Ti (53%): 1582 > 1777MHz - 11.3 > 12.7TF (4.17 %/TF)
1070 (34%): 1683 > 1797MHz - 6.5 > 6.9TF (4.93 %/TF)

I actually think 4K might be a better comparison for faster cards, perhaps down to the Radeon VII. So instead of the unreadable mess above, here is a table with GPUs, their actual TFLOPs numbers and relative performance (from the same referenced 3090 Strix review), as well as performance per TFLOP, both at 1440p and 2160p.
* means the average clock speed is probably overstated, so fewer TFLOPs in reality and a better %/TF.
Code:
GPU        TFLOP 1440p %/TF  2160p %/TF
3090       40.3  100%  2.48  100%  2.48
3080       33.6   90%  2.68   84%  2.5

2080Ti     15.9   72%  4.53   64%  4.02
2070S       9.2   55%  5.98   46%  5.00
2060        7.1   41%  5.77   34%  4.79

RX5700XT    9.66  51%  5.28   42%  4.35
RX5600XT*   9.0   40%  5.00   33%  4.12
RX5500XT    5.1   27%  5.29   19%  3.72

Radeon VII 13.6   53%  3.90   46%  3.38
Vega64*    12.7   41%  3.23   34%  2.68
RX590       7.1   29%  4.08   24%  3.38

1080Ti     12.7   53%  4.17   45%  3.54
1070        6.9   34%  4.93   28%  4.06

- Pascal, Turing and Navi/RDNA are fairly even on perf/TF.
- Polaris is a little worse than Pascal but not too bad.
- Vega struggles a little.
- 1080Ti low result is somewhat surprising.
- 2080Ti and Amperes are inefficient at 1440p and do better at 2160p.

As for what Ampere does, there is something we are missing about the double FP32 claim. Scheduling limitations are the obvious one, but a ~35% actual performance boost from doubled units sounds like something is very heavily restricting performance. And that is the optimistic number - in the table/review it was 25% at 1440p and 31% at 2160p from the 2080 Ti to the 3080, which are largely identical except for the doubled FP32 units. Since productivity workloads do get twice the performance, is it really the complexity and variability of gaming workloads causing the scheduling to cough blood?
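The clock adjustment itself is trivial - FP32 TFLOPs scale linearly with clock - so for anyone wanting to redo it, a minimal sketch using the 3080 figures from above (treat them as illustrative):

Code:
# Rescale spec TFLOPs by the average clock a review actually measured.
def tflops_at_clock(spec_tflops: float, spec_clock_mhz: float,
                    measured_clock_mhz: float) -> float:
    return spec_tflops * measured_clock_mhz / spec_clock_mhz

print(f"{tflops_at_clock(29.8, 1710, 1931):.1f} TF")    # RTX 3080: ~33.7, vs 33.6 in the table

# Or from first principles: 2 FLOPs per FMA, per shader, per clock.
shaders, avg_clock_ghz = 8704, 1.931                     # RTX 3080 shader count, avg clock
print(f"{2 * shaders * avg_clock_ghz / 1000:.1f} TF")    # ~33.6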
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
I think that's the whole point of your misunderstanding. You interpreted. And you interpreted in the wrong way. So let me be clear once and for all. As I said, with "strong" I was referring to raw performance. And raw performance is usually measured in FLOPS. I didn't draw any conclusions about whether that makes an architecture good or bad. That is usually defined by metrics like performance/watt and performance/mm².
Skipping the hilarity of (unintentionally, I presume) suggesting that reading without interpretation is possible, I have explained the reasons for my interpretation at length, and why, to me, it is a much more reasonable reading of what a "strong" GPU architecture means in the consumer space. You clearly worded your statement vaguely and ended up saying something different from what you meant. (For the record: even calling FP32 "raw performance" is a stretch - it's the main performance metric of modern GPUs, but still one among at least a couple dozen relevant ones, all of which affect various workloads in different ways. Hence me arguing for why it alone is a poor indication of anything except performance in pure FP32 workloads. It's kind of like discussing which minivan is the best solely based on engine horsepower, while ignoring the number and quality of seats, doors, build quality, reliability, ride comfort, etc.) You're welcome to disagree with this, but so far your arguments for your side of this discussion have been unconvincing at best.
You said it yourself: just revisions. Hawaii, Tonga, Fiji. They mostly got only ISA updates and more execution units. One exception was HBM for Fiji. But even that didn't change the architecture at all. Polaris was the first generation after Tahiti that had some real architectural improvements to increase IPC and efficiency.
Uhm ... updating the ISA is a change to the architecture. Beyond that, AMD kept talking about various low-level architectural changes to GCN for each revision - beyond what is published; after all, published information doesn't really go beyond block diagram levels - but these never really materialized as performance or efficiency improvements. You're right that the move to HBM didn't change the architecture, as the memory controllers generally aren't seen as part of the GPU architecture. Of course the main bottleneck for GCN was its 64 CU limit, which forced AMD to release the V64 at idiotic clocks to even remotely compete in absolute performance, but made the architecture look terrible for efficiency at the same time. A low-clocked Vega 64 is actually quite efficient, after all, and shows that if AMD could have made a medium-clocked 80CU Vega card, they could have been in a much better competitive position (though at some cost due to the large die). That limitation alone is likely both the main reason for AMD's GPU woes and their choice of replacing GCN entirely - they had no other choice. But even with limited resources, they had more than half a decade to improve GCN architecturally, and managed pretty much nothing. Luckily with RDNA they've both removed the 64 CU limit and improved perf/TF dramatically, with promises of more to come.

I wouldn't say that. The question is what's your goal. Obviously AMD's primary goal was a strong computing architecture to counter Fermi's successors. Maybe AMD didn't expect Nvidia to go the exact opposite way. Kepler and Maxwell were gaming architectures. They were quite poor at computing, especially Kepler. Back then, with enough resources, I think AMD could have done with GCN what they are doing now with RDNA. RDNA is no entirely new architecture from scratch like Zen. It's still based on GCN. So, it seems GCN was a good architecture after all. At least better than what some people try to claim. The lack of progress and the general purpose nature just made GCN look worse for gamers over time. Two separate developments for computing and gaming was the logical consequence. Nvidia might face the same problem. Ampere is somehow their GCN moment. Many shaders, apparently good computing performance, but way worse shader efficiency than Turing for gaming.
That's possible, but unlikely. The enterprise compute market is of course massively lucrative, but AMD didn't design GCN as a datacenter compute-first core. It was a graphics core design meant to replace VLIW, but it also happened to be very good at pure FP32. Call it a lucky side effect. At the time it was designed datacenter GPU compute barely existed at all (datacenters and supercomputers at that time were mostly CPU-based), and the market when it emerged was nearly 100% CUDA, leaving AMD on the outside looking in. AMD tried to get into this with OpenCL and similar compute-oriented initiatives, but those came long after GCN hit the market. RDNA is clearly a gaming-oriented architecture, with CDNA being split off (and reportedly being much closer to GCN in design) for compute work, but that doesn't mean that GCN wasn't initially designed for gaming.

At 1440p Ampere probably takes more of a hit than it should.
But more importantly, especially for Nvidia cards, spec TFLOPs are misleading. Just check the average clock speeds in the respective reviews.
At the same time, RDNA has a Boost clock that is not quite what the card actually achieves.

Ampere:
3090 (Strix OC, 100%): 1860 > 1921MHz - 39 > 40.3TF (2.48 %/TF)
3080 (90%): 1710 > 1931MHz - 29.8 > 33.6TF (2.68 %/TF)

Turing:
2080Ti (72%): 1545 > 1824MHz - 13.45 > 15.9TF (4.53 %/TF)
2070S (55%): 1770 > 1879MHz - 9 > 9.2TF (5.98 %/TF)
2060 (41%): 1680 > 1865MHz - 6.5 > 7.1TF (5.77 %/TF)

RDNA:
RX 5700 XT (51%): 1755 (1905) > 1887MHz - 9.0 (9.8) > 9.66TF (5.28 %/TF)
RX 5600 XT (40%): 1750 > 1730MHz - 8.1 > 8.0TF (5.00 %/TF) - this one is a mess with specs and clocks, but the ASUS TUF seems closest to the newer reference spec, and it is not quite the right comparison really
RX 5500 XT (27%): 1845 > 1822MHz - 5.2 > 5.1TF (5.29 %/TF) - all reviews are of AIB cards but the two closest to reference specs got 1822MHz

GCN:
Radeon VII (53%): 1750 > 1775MHz - 13.4 > 13.6TF (3.90 %/TF)
Vega 64 (41%): 1546MHz - 12.7TF (3.23 %/TF) - let's assume it ran at 1546MHz in the review; I doubt it, because my card struggled heavily to reach spec clocks
RX 590 (29%): 1545MHz - 7.1TF (4.08 %/TF)

Pascal
1080Ti (53%): 1582 > 1777MHz - 11.3 > 12.7TF (4.17 %/TF)
1070 (34%): 1683 > 1797MHz - 6.5 > 6.9TF (4.93 %/TF)

I actually think 4K might be a better comparison for faster cards, perhaps down to the Radeon VII. So instead of the unreadable mess above, here is a table with GPUs, their actual TFLOPs numbers and relative performance (from the same referenced 3090 Strix review), as well as performance per TFLOP, both at 1440p and 2160p.
* means the average clock speed is probably overstated, so fewer TFLOPs in reality and a better %/TF.
Code:
GPU        TFLOP 1440p %/TF  2160p %/TF
3090       40.3  100%  2.48  100%  2.48
3080       33.6   90%  2.68   84%  2.5

2080Ti     15.9   72%  4.53   64%  4.02
2070S       9.2   55%  5.98   46%  5.00
2060        7.1   41%  5.77   34%  4.79

RX5700XT    9.66  51%  5.28   42%  4.35
RX5600XT*   9.0   40%  5.00   33%  4.12
RX5500XT    5.1   27%  5.29   19%  3.72

Radeon VII 13.6   53%  3.90   46%  3.38
Vega64*    12.7   41%  3.23   34%  2.68
RX590       7.1   29%  4.08   24%  3.38

1080Ti     12.7   53%  4.17   45%  3.54
1070        6.9   34%  4.93   28%  4.06

- Pascal, Turing and Navi/RDNA are fairly even on perf/TF.
- Polaris is a little worse than Pascal but not too bad.
- Vega struggles a little.
- 1080Ti low result is somewhat surprising.
- 2080Ti and Amperes are inefficient at 1440p and do better at 2160p.

As for what Ampere does, there is something we are missing about the double FP32 claim. Scheduling limitations are the obvious one, but a ~35% actual performance boost from doubled units sounds like something is very heavily restricting performance. And that is the optimistic number - in the table/review it was 25% at 1440p and 31% at 2160p from the 2080 Ti to the 3080, which are largely identical except for the doubled FP32 units. Since productivity workloads do get twice the performance, is it really the complexity and variability of gaming workloads causing the scheduling to cough blood?
I entirely agree that Ampere makes calculations like this even more of a mess than what they already were, but my point still stands after all - there are still massive variations even within the same architectures, let alone between different ones. I'm also well aware that boost clocks severely mess this up and that choosing one resolution limits its usefulness - I just didn't want the ten minutes I spent on that to become 45, looking up every boost speed and calculating my own FP32 numbers.
 
Joined
Feb 3, 2017
Messages
3,756 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
I entirely agree that Ampere makes calculations like this even more of a mess than what they already were, but my point still stands after all - there are still massive variations even within the same architectures, let alone between different ones. I'm also well aware that boost clocks severely mess this up and that choosing one resolution limits its usefulness - I just didn't want the ten minutes I spent on that to become 45, looking up every boost speed and calculating my own FP32 numbers.
Variations are probably down to the relative amounts of the cards' other resources - memory bandwidth, TMUs, ROPs. I'm trying not to go down that rabbit hole right now. It didn't take me quite 45 minutes to put that one together, but it wasn't too far off :D
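As a side note, two of those "other aspects" have equally simple textbook formulas - pixel and texture fill rate - sketched here with reference RTX 3080 numbers (96 ROPs, 272 TMUs, 1710 MHz boost) purely as an illustration:

Code:
# Standard fill-rate formulas: ROPs/TMUs times clock.
def pixel_fill_gpixels(rops: int, clock_mhz: float) -> float:
    return rops * clock_mhz / 1000            # GPixels/s

def texture_fill_gtexels(tmus: int, clock_mhz: float) -> float:
    return tmus * clock_mhz / 1000            # GTexels/s

print(f"{pixel_fill_gpixels(96, 1710):.0f} GPixel/s")      # ~164
print(f"{texture_fill_gtexels(272, 1710):.0f} GTexel/s")   # ~465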
 
Joined
Oct 6, 2020
Messages
35 (0.02/day)
You clearly worded your statement vaguely
I was very clear that I was just talking about raw performance as a simple fact, and not about whether something should be considered good or bad based on that. Maybe next time, if you are unsure about the meaning of someone's words, ask first to clear things up. ;) Your aggressive and bossy answers, putting words in my mouth that I never said, are a very impolite and immature way of having a conversation.

updating the ISA is a change to the architecture.
But it doesn't make the architecture faster or more efficient in terms of general performance. That's the important point. Or do you think adding AVX-512 to Comet Lake would make it better for your daily tasks? Not at all.

AMD didn't design GCN as a datacenter compute-first core. It was a graphics core design
In fact it was not purely a graphics core design. Look up the press material that AMD published back then. You can read statements like "efficient and scalable architecture optimized for graphics and parallel compute" or "cutting-edge gaming and compute performance". GCN was clearly designed as a hybrid, an architecture meant to be equally good at gaming and compute. But I think the focus was more on improving compute performance, because that was the philosophy of the AMD staff at that time. They wanted to be more competitive in professional markets. Bulldozer was designed with the same focus in mind. Lisa Su changed that. Nowadays AMD focuses more on client markets again.

but it also happened to be very good at pure FP32. Call it a lucky side effect.
That naivety is almost funny. Nothing happens as a side effect during years of development. It was on purpose.

At the time it was designed datacenter GPU compute barely existed at all (datacenters and supercomputers at that time were mostly CPU-based), and the market when it emerged was nearly 100% CUDA, leaving AMD on the outside looking in. AMD tried to get into this with OpenCL and similar compute-oriented initiatives, but those came long after GCN hit the market.
AMD had GPGPU software solutions before OpenCL and even before CUDA. The first was the CTM (Close To The Metal) interface. It was later replaced by the Stream SDK. All those developments happened in the late 2000s, likely when GCN was in its design phase. It's obvious that AMD wanted a performant compute architecture to be prepared for future GPGPU environments. That doesn't mean GCN was a compute-only architecture. Again, I didn't say that. But compute performance seems to have been at least as important as graphics performance.
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
Perhaps this is part of why AMD wants to buy Xilinx for its FPGAs. Even if they lose 5-10% in either workload, be it compute or graphics performance, if they have the possibility of gaining better parity between the two with an FPGA approach, rather than fixed hardware with a one-size-fits-all approach that can't serve both efficiently and well at the same time, it's still a better overall approach. In fact, over time I would have to say the gap between the two with fixed hardware must be widening, if anything, which only complicates things.
 
Joined
Feb 3, 2017
Messages
3,756 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Perhaps this is part of why AMD wants to buy Xilinx for its FPGAs. Even if they lose 5-10% in either workload, be it compute or graphics performance, if they have the possibility of gaining better parity between the two with an FPGA approach, rather than fixed hardware with a one-size-fits-all approach that can't serve both efficiently and well at the same time, it's still a better overall approach. In fact, over time I would have to say the gap between the two with fixed hardware must be widening, if anything, which only complicates things.
FPGAs are incredibly inefficient compared to fixed-function hardware.
 
Joined
Oct 21, 2009
Messages
102 (0.02/day)
Location
Netherlands
System Name LazyOldMan
Processor 9900KF
Motherboard Asrock z390 Taichi Ultimate
Cooling Corsair custom Watercooled
Memory 64 Gb
Video Card(s) RX 6800 XT
Storage Too much to mention in all 980 TB
Display(s) 2 x Dell 4K @ 60 hz
Case Crap case
Audio Device(s) Realtek + Bayer Dynamics 990 Pro headset
Power Supply 1300 watt
Mouse Corsair cord mouse
Keyboard Corsair red lighter cabled keyboard ages old ;)
I actually do not care at all about all the hyped news.
For me the most important thing is that the card seems to be power efficient, and that's more important than the power-hungry, home-heater Nvidia solution. Imagine living in Spain or Italy with temps above 40°C and then not being able to play a game because your silly machine gets overheated by your oh-so-precious Nvidia card :)

Ok, who the hell calls Navi2 "Big Navi"?
Big Navi was a pipe dream of AMD loyalists left wanting for a first gen Navi high-end card.
This quote " Something Big is coming is not a lie" because its going to be a big card, they have not said anything about performance the only thing they talk about is a more efficient product. That most people translate that to faster than nvidia is their own vision.
But if it does beat the 3070 then i will consider buying it even though its not such a big step upwards from my current 5700XT which runs darn well.

I really wish they would introduce the AMD Quantum mini PC that was shown at E3 2015, with current hardware or something similar.
I want my systems to be smaller without having to limit performance too much, and I am pretty sure current hardware would be more than capable of creating such a mini PC by now with enough performance.
 