Tuesday, October 6th 2020
AMD Big Navi GPU Features Infinity Cache?
As we near the launch of AMD's highly hyped, next-generation RDNA 2 GPU codenamed "Big Navi", more details are emerging. Rumors suggest that the card will supposedly be called AMD Radeon RX 6900 and that it will be AMD's top offering. Using a 256-bit bus with 16 GB of GDDR6 memory, the GPU will not use any type of HBM memory, which has historically been rather pricey. Instead, it looks like AMD will compensate for the narrower bus with a new technology it has developed. Thanks to new findings on the Justia Trademarks website by @momomo_us, we have information about the alleged "Infinity Cache" technology the new GPU uses.
VideoCardz reports that the internal name for this technology is not Infinity Cache; however, it seems that AMD could have changed it recently. What exactly does it do, you might wonder? Well, that is a bit of a mystery for now. It could be a new cache technology that allows L1 cache sharing across the GPU's cores, or some interconnect between the caches found across the whole GPU. This information should be taken with a grain of salt, as we are yet to see what this technology does and how it works when AMD announces its new GPU on October 28th.
Source:
VideoCardz
141 Comments on AMD Big Navi GPU Features Infinity Cache?
None of it was a serious move towards anything with a future; it was clearly grasping at straws, as Hawaii XT had already run into the limits of what GCN could push. They had a memory efficiency issue. Nvidia eclipsed that entirely with the release of Maxwell's delta color compression tech, which AMD didn't have at the time. Polaris didn't either, so it's questionable what use that 'update' really was. All Polaris really was was a shrink from 28 to 14 nm and an attempt to get some semblance of a cost-effective GPU into the midrange. Other development was stalled and redirected towards compute (Vega) and pro markets because 'that's where the money is', while similarly the midrange 'is where the money is'. Then came mining... and it drove 90% of Polaris sales, I reckon. People still bought 1060s and 970s regardless, not least because those were actually available.
As for the current trend in GPUs: Jon Peddie Research reports steady (relative) year-over-year growth in high-end GPUs, and the average price is steadily rising. It's a strange question to ask me what undisclosed facts RDNA 2 will bring to change the current state of things, but it's a bit of a stretch to 'assume' they will suddenly leap ahead as some predict. The supposed specs we DO have show about 500 GB/s of bandwidth, and that is a pretty hard limit; apparently they also have some sort of cache system that does something about that, judging by the results. If the GPU we saw in AMD's benches was the 500 GB/s one, the cache is good for another 20%. Nice. But it still won't eclipse a 3080. This means they will need a wider bus for anything bigger, and that will in turn take a toll on TDPs and efficiency.
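That ~500 GB/s figure is simple arithmetic. A quick sketch, assuming 16 Gbps GDDR6 modules on the rumored 256-bit bus (the actual module speed is unconfirmed, so treat the inputs as placeholders):

```python
# Peak GDDR6 bandwidth estimate. Assumption: 16 Gbps modules on a 256-bit
# bus for Big Navi (unconfirmed rumor at time of writing).

def gddr6_bandwidth_gbps(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return (bus_width_bits / 8) * data_rate_gbps

big_navi = gddr6_bandwidth_gbps(256, 16.0)   # 512.0 GB/s, close to the ~500 GB/s cited
rtx_3080 = gddr6_bandwidth_gbps(320, 19.0)   # 760.0 GB/s (GDDR6X), for comparison
print(big_navi, rtx_3080)
```

Which is why a smaller bus needs either faster modules or, apparently, a big cache to close the gap.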
The first numbers are in, and we've already seen about a 10% deficit to the 3080 with whatever that was supposed to be. There is probably some tier above it, but I reckon the gap will be minor, like the 3090 above the 3080. As for right decisions... yes, retargeting the high end is a good decision; it's the ONLY decision, really, and I hope they can make it happen. But the track record for RDNA so far isn't spotless, if not outright plagued with problems very similar to what GCN had, up until now.
@gruffi sry, big ninja edit, I think you deserved it for pressing the question after all :)
Better to spend this energy on more meaningful discussions.
Also, bandwidth and TFLOPs are the best objective measures to express the potential performance of graphics cards, and they're fine if they're understood as what they are.
Just an aside: the only time I see TFLOPs as truly misleading is with Ampere, because those dual-purpose CUs will never attain their maximum theoretical throughput, since they also have to do integer computations (which amount to about 30% of computations in gaming, according to Nvidia themselves).
Your second statement is the worst type of misleading: something that is technically true, but is presented in a way that vastly understates the importance of context, rendering its truthfulness moot. "They're fine if they're understood as what they are" is entirely the point here: FP32 is in no way whatsoever a meaningful measure of consumer GPU performance across architectures. Is it a reasonable point of comparison within the same architecture? Kind of! For non-consumer uses, where pure FP32 compute is actually relevant? Sure (though it is still highly dependent on the workload). But for the vast majority of end users, let alone the people on these forums, FP32 as a measure of the performance of a GPU is very, very misleading.
Just as an example, here's a selection of GPUs and their game performance/Tflop in TPU's test suite at 1440p from the 3090 Strix OC review:
Ampere:
3090 (Strix OC) 100% 39TF = 2.56 perf/TF
3080 90% 29.8TF = 3 perf/TF
Turing:
2080 Ti 72% 13.45TF = 5.35 perf/TF
2070S 55% 9TF = 6.1 perf/TF
2060 41% 6.5TF = 6.3 perf/TF
RDNA:
RX 5700 XT 51% 9.8TF = 5.2 perf/TF
RX 5600 XT 40% 7.2TF = 5.6 perf/TF
RX 5500 XT 27% 5.2TF = 5.2 perf/TF
GCN:
Radeon VII 53% 13.4 TF = 4 perf/TF
Vega 64 41% 12.7TF = 3.2 perf/TF
RX 590 29% 7.1TF = 4.1 perf/TF
Pascal:
1080 Ti 53% 11.3TF = 4.7 perf/TF
1070 34% 6.5TF = 5.2 perf/TF
This is of course at just one resolution, and the numbers would change at other resolutions. The point still shines through: even within the same architectures, using the same memory technology, gaming performance per teraflop of FP32 compute can vary by 25% or more. Across architectures we see more than 100% variance. Which demonstrates that for the average user, FP32 is an utterly meaningless metric. Going by these numbers, a 20TF GPU might beat the 3090 (if it matched the 2060 in performance/TF) or it might lag dramatically (like the VII or Ampere).
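The perf/TF figures in the list are just relative performance divided by spec FP32 TFLOPs. A minimal sketch reproducing a few of them (numbers copied straight from the list above):

```python
# Performance per teraflop: relative 1440p performance (%) from the 3090 Strix
# review divided by spec FP32 TFLOPs. Values copied from the list above.

cards = {
    "RTX 3090 Strix OC": (100, 39.0),
    "RTX 3080":          (90, 29.8),
    "RTX 2080 Ti":       (72, 13.45),
    "RX 5700 XT":        (51, 9.8),
    "Radeon VII":        (53, 13.4),
}

for name, (rel_perf, tflops) in cards.items():
    print(f"{name}: {rel_perf / tflops:.2f} perf/TF")
```

The spread in the output (roughly 2.6 to 5.4) is the whole argument: identical TFLOPs do not mean identical gaming performance.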
Unless you are a server admin or researcher or whatever else running workloads that are mostly FP32, using FP32 as a meaningful measure of performance is very misleading. Its use is very similar to how camera manufacturers have used (and partially still do) megapixels as a stand-in tech spec to represent image quality. There is some relation between the two, but it is wildly complex and inherently non-linear, making the one meaningless as a metric for the other.
But more importantly, spec TFLOPs are misleading, especially for Nvidia cards. Just check the average clock speeds in the respective reviews.
At the same time, RDNA has a Boost clock that is not quite what the card actually achieves.
Ampere:
3090 (Strix OC, 100%): 1860 > 1921MHz - 39 > 40.3TF (2.48 %/TF)
3080 (90%): 1710 > 1931MHz - 29.8 > 33.6TF (2.68 %/TF)
Turing:
2080Ti (72%): 1545 > 1824MHz - 13.45 > 15.9TF (4.53 %/TF)
2070S (55%): 1770 > 1879MHz - 9 > 9.2TF (5.98 %/TF)
2060 (41%): 1680 > 1865MHz - 6.5 > 7.1TF (5.77 %/TF)
RDNA:
RX 5700 XT (51%): 1755 (1905) > 1887MHz - 9.0 (9.8) > 9.66TF (5.28 %/TF)
RX 5600 XT (40%): 1750 > 1730MHz - 8.1 > 8.0TF (5.00 %/TF) - this one is a mess with specs and clocks, but the ASUS TUF seems closest to the newer reference spec, and it is not really the right comparison
RX 5500 XT (27%): 1845 > 1822MHz - 5.2 > 5.1TF (5.29 %/TF) - all reviews are of AIB cards but the two closest to reference specs got 1822MHz
GCN:
Radeon VII (53%): 1750 > 1775MHz - 13.4 > 13.6TF (3.90 %/TF)
Vega 64 (41%): 1546MHz - 12.7TF (3.23 %/TF) - let's assume it ran at 1546MHz in the review; I doubt it, because my card struggled heavily to reach spec clocks
RX 590 (29%): 1545MHz - 7.1TF (4.08 %/TF)
Pascal:
1080Ti (53%): 1582 > 1777MHz - 11.3 > 12.7TF (4.17 %/TF)
1070 (34%): 1683 > 1797MHz - 6.5 > 6.9TF (4.93 %/TF)
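The TFLOPs figures above follow from shader count × clock × 2 (one fused multiply-add counts as two FLOPs). A sketch showing how much the spec-vs-measured clock gap moves the number, using the RTX 3080's public shader count:

```python
# FP32 TFLOPs = shaders x clock (GHz) x 2 (one FMA = 2 FLOPs).

def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    return shaders * (clock_mhz / 1000) * 2 / 1000

# RTX 3080, 8704 shaders: spec boost clock vs. measured average clock.
spec   = fp32_tflops(8704, 1710)  # ~29.8 TF (the number on the box)
actual = fp32_tflops(8704, 1931)  # ~33.6 TF (at the review's average clock)
print(spec, actual)
```

That ~13% gap is exactly why dividing relative performance by spec TFLOPs and by actual TFLOPs gives different %/TF figures in the two lists.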
I actually think 4K might be a better comparison for the faster cards, perhaps down to the Radeon VII. So instead of the unreadable mess above, here is a table with the GPUs, their actual TFLOPs numbers and relative performance (from the same referenced 3090 Strix review), as well as performance per TFLOPs, both at 1440p and 2160p.
* means the average clock speed is probably overstated, so fewer TFLOPs in reality and a better %/TF.
- Pascal, Turing and Navi/RDNA are fairly even on perf/TF.
- Polaris is a little worse than Pascal but not too bad.
- Vega struggles a little.
- The 1080 Ti's low result is somewhat surprising.
- 2080Ti and Amperes are inefficient at 1440p and do better at 2160p.
As for what Ampere does, there is something we are missing about the double-FP32 claim. Scheduling limitations are the obvious candidate, but a ~35% actual performance boost from doubled units sounds like something is restricting performance very heavily. And that is the optimistic figure: in the table/review it was 25% at 1440p and 31% at 2160p from the 2080 Ti to the 3080, which are largely identical except for the doubled FP32 units. Since productivity workloads do get twice the performance, is it really the complexity and variability of gaming workloads making the scheduler cough blood?
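One way to picture the shared-datapath limit: half of Ampere's FP32 lanes share a path with INT32, so integer work eats directly into peak FP32 throughput. A toy model of that bottleneck (my own assumption about the mechanism, not a confirmed explanation of the observed gap):

```python
# Toy issue model, per SM partition pair. Assumption: Ampere has two 64-lane
# datapaths, one FP32-only and one shared FP32/INT32; Turing has one 64-lane
# FP32 path plus a separate 64-lane INT32 path.

def ampere_fp32_per_cycle(int_fraction: float) -> float:
    fp, integer = 1 - int_fraction, int_fraction
    # Issue is limited by total lanes (128/clk) or by the shared INT path (64/clk).
    cycles = max((fp + integer) / 128, integer / 64)
    return fp / cycles  # effective FP32 ops per cycle (peak 128)

def turing_fp32_per_cycle(int_fraction: float) -> float:
    fp, integer = 1 - int_fraction, int_fraction
    cycles = max(fp / 64, integer / 64)  # FP and INT issue concurrently
    return fp / cycles  # effective FP32 ops per cycle (peak 64)

# With ~30% integer instructions (Nvidia's gaming figure):
ratio = ampere_fp32_per_cycle(0.3) / turing_fp32_per_cycle(0.3)
print(ratio)  # ~1.4x, not 2x
```

Even this crude model lands at roughly +40% rather than +100%, so a scheduling/datapath-sharing story at least points in the right direction, if not all the way down to the measured 25-35%.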
For me the most important thing is that the card seems to be power efficient, and that's more important than Nvidia's power-hungry, super-heater-for-your-home solution. Imagine living in Spain or Italy with temperatures above 40°C and then not being able to play a game because your silly machine gets overheated by your oh-so-precious Nvidia card :) The quote "something big is coming" is not a lie, because it's going to be a big card; they have not said anything about performance. The only thing they talk about is a more efficient product. That most people translate that into "faster than Nvidia" is their own vision.
But if it does beat the 3070, then I will consider buying it, even though it's not such a big step up from my current 5700 XT, which runs darn well.
I really wish they would introduce the AMD Quantum mini PC that was shown at E3 2015 with current hardware, or something similar.
I want my systems to be smaller without having to limit performance too much, and I am pretty sure current hardware would by now be more than capable of powering such a mini PC with enough performance.