Tuesday, October 6th 2020
AMD Big Navi GPU Features Infinity Cache?
As we near the launch of AMD's highly anticipated next-generation RDNA 2 GPU, codenamed "Big Navi", more details keep trickling out. Rumors already suggest that the card will be called the AMD Radeon RX 6900 and that it will be AMD's top offering. With a 256-bit bus and 16 GB of GDDR6 memory, the GPU will not use any type of HBM, which has historically been rather pricey. Instead, it looks like AMD will compensate for the narrower bus with a new technology it has developed. Thanks to new findings on the Justia Trademarks website by @momomo_us, we have information about the alleged "Infinity Cache" technology the new GPU uses.
VideoCardz reports that the internal name for this technology is not Infinity Cache; however, AMD could have changed it recently. What does it do exactly, you might wonder? Well, for now it is a bit of a mystery. It could be a new cache technology that allows L1 cache sharing across the GPU's cores, or some interconnect between the caches found across the whole GPU. This information should be taken with a grain of salt, as we have yet to see what the technology does and how it works when AMD announces its new GPU on October 28th.
Source:
VideoCardz
141 Comments on AMD Big Navi GPU Features Infinity Cache?
RT, where is it?
As for inflammatory... stupid... time will tell, won't it ;) Many times, today's flame in many beholders' eyes is tomorrow's reality. Overhyping AMD's next best thing is not new, and it has never EVER paid off.
I'm sure you've watched DF's 5700XT vs X1X video, right?
We are both aware that the X1X has a very similar GPU to the RX 580. As you can see in their like-for-like comparison, in a GPU-limited scenario the 5700 XT system performs 80 to 100% better than the console, in line with how a 5700 XT performs compared to a desktop RX 580.
Now, I'm not saying we can compare them directly and extrapolate exact numbers, but we can get a decent idea.
What you said about the Series X being at GCN-level IPC when running back-compat games is honestly laughable (no offense).
You can't run a game natively on an entirely different architecture and not benefit from its very low-level IPC improvements; those will benefit performance regardless of any extra architectural enhancements.
By saying the back-compat games don't benefit from RDNA 2's extra architectural benefits, they didn't mean those games don't benefit from the low-level architectural improvements, just that RDNA 2's extra features (such as Variable Rate Shading) aren't utilized.
If the Series X were actually at GCN-level IPC, there would be no way the XSX could straight-up double the X1X's performance, since a 12 TF GCN GPU like the Vega 64 barely performs 60% better than an RX 580.
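To make the per-TFLOP argument concrete, here is a rough back-of-the-envelope sketch. The TFLOPS figures are the commonly quoted specs, and the performance ratios are the approximate numbers cited in this thread, not measurements:

```python
# Rough performance-per-TFLOP sketch for the argument above.
# TFLOPS are the commonly quoted specs; the relative-performance
# ratios are the approximate figures cited in this thread.

gpus = {
    # name: (TFLOPS, performance relative to the RX 580)
    "RX 580 (GCN)":   (6.2,  1.0),
    "Vega 64 (GCN)":  (12.7, 1.6),   # ~60% faster despite ~2x the TFLOPS
    "5700 XT (RDNA)": (9.75, 1.9),   # ~80-100% faster on ~1.6x the TFLOPS
}

base_tflops, base_perf = gpus["RX 580 (GCN)"]
for name, (tflops, perf) in gpus.items():
    per_tflop = (perf / base_perf) / (tflops / base_tflops)
    print(f"{name}: {per_tflop:.2f}x performance per TFLOP vs RX 580")

# GCN scales poorly with raw TFLOPS while RDNA extracts more per TFLOP,
# which is why the XSX doubling the X1X at roughly double the TFLOPS
# would be implausible if it really ran at GCN-level IPC.
```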
All I have is that Navi 2 is twice as big as the 5700 XT. Considering they are built on the same manufacturing process, I have a hard time imagining where everything you listed would fit, with RTRT added on top.
As for the back-compat mode working as if it were GCN: AMD literally presented this when they introduced RDNA 1. It is by no means a console-exclusive feature; it simply comes down to how the GPU handles instructions. It's likely not entirely 1:1, as some low-level changes might carry over, but what AMD presented was essentially a mode in which the GPU operates as if it were a GCN GPU. There's no reason to expect RDNA 2 in consoles to behave differently, and DF's review underscores this about as explicitly as you can get: compatibility mode essentially nullifies the IPC (or "performance per TFLOP") improvements of RDNA compared to GCN. That 25% improvement MS is talking about is the IPC improvement of RDNA vs. GCN.
In any case, one thing I am pretty sure AMD will not do is pair a 526 mm² RDNA 2 die with a memory-bandwidth-starved configuration similar to that of the 5700 XT; that would be definitely stupid, even by average TPU-forumite standards. Rumors say there is no dedicated hardware for RT, and there are solid indications that the node is N7+.
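For context, the bandwidth math behind the "starved" concern is simple to sketch. The data rates below are illustrative assumptions based on the rumored specs, not confirmed numbers:

```python
# Peak GDDR6 bandwidth = bus width (bits) / 8 * per-pin data rate (Gbps).
# Data rates are illustrative assumptions, not confirmed specs.

def bandwidth_gb_s(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_bits / 8 * data_rate_gbps

print(bandwidth_gb_s(256, 14))  # 448.0 GB/s -- the 5700 XT's configuration
print(bandwidth_gb_s(256, 16))  # 512.0 GB/s -- rumored Big Navi, 16 Gbps GDDR6
print(bandwidth_gb_s(320, 19))  # 760.0 GB/s -- RTX 3080 (GDDR6X), for comparison

# A 256-bit bus tops out well below the 3080, which is why a large
# on-die cache would be needed to keep a much bigger GPU fed.
```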
Before you dismiss Coreteks' speculation: yes, I agree his speculations are more miss than hit, but that video is a leak, not speculation.
Anyway, the new cards will be out soon enough, and we'll have a better idea of how much of an improvement RDNA 2 brings in terms of IPC. It will be most obvious when comparing the rumored 40 CU Navi 22 to the 5700 XT at the same clocks, since higher clocks directly increase the bandwidth of the caches.
In any case, trying to discuss IPC based on approximate die sizes is not something I'll argue about, since it is a complex issue, but I would bet it is perfectly possible to increase IPC without adding transistors. I'm not arguing that is what will happen here.
IF there is a huge cache, it should increase IPC a lot, because there would be far fewer cache misses, i.e., less time in which the processing units are just requesting, waiting for, and storing data from VRAM into the cache. Remember that VRAM latency is pretty bad. On the other hand, a huge cache would also take up a huge chunk of the die. But speculating about these things at this point seems like a bit of a futile exercise to me; there are too many unknowns.
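The effect of the miss rate on effective latency can be sketched with the standard average-memory-access-time formula. The latency figures here are illustrative assumptions, not real RDNA 2 numbers:

```python
# Average Memory Access Time: AMAT = hit latency + miss rate * miss penalty.
# The latencies below are illustrative assumptions, not RDNA 2 figures.

def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    return hit_ns + miss_rate * miss_penalty_ns

# Assume a ~20 ns on-die cache hit and a ~300 ns round trip to GDDR6.
print(amat_ns(20, 0.50, 300))  # 170.0 ns -- small cache, 50% miss rate
print(amat_ns(20, 0.10, 300))  #  50.0 ns -- huge cache, 10% miss rate

# Cutting the miss rate from 50% to 10% more than triples the effective
# speed of the memory subsystem without touching the VRAM at all.
```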
The thing with cache is that more is not always better. A larger cache can increase latency, and sometimes doubling the cache does not mean a significant gain in hit rate. That would end up being just wasted silicon.
So the fact that they are implementing a new way to handle the L1 cache is, to me, much more promising than if they had just doubled the L2 or something like that.
Note that big gains in performance will come from a better cache and memory subsystem. We are starting to hit a wall there, and getting data out of fast memory costs more and more power. If your data travels less, you save a lot of energy. Doing the actual computations doesn't require that much power; it's really moving the data around that drives up power consumption. So if you want an efficient architecture, you need your data to travel as short a distance as possible.
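The scale of that effect can be sketched with the ballpark energy figures often cited in computer-architecture talks. The exact numbers vary a lot by process node, so treat these as illustrative assumptions, not RDNA 2 data:

```python
# Ballpark energy cost per 64-bit operation, in picojoules. These are
# rough figures often quoted in architecture talks; they vary widely by
# process node and are illustrative only, not RDNA 2 data.

energy_pj = {
    "64-bit floating-point op":        20,
    "read 64 bits from on-die cache":  50,
    "read 64 bits from off-chip DRAM": 2000,
}

flop_pj = energy_pj["64-bit floating-point op"]
for op, pj in energy_pj.items():
    print(f"{op}: {pj} pJ (~{pj / flop_pj:.0f}x the cost of a FLOP)")

# Going off-chip costs roughly two orders of magnitude more energy than
# the computation itself, which is why a large on-die cache can pay for
# its silicon in power savings.
```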
But is it enough to fight the 3080? Rumors say yes, but we will see. Many times in the past there have been architectures that had less bandwidth yet still performed better, because they had a better memory subsystem. This might happen again.
If that doesn't happen, the good news is that making a 256-bit card with a 250 W TDP costs much less than making a 350 W TDP card with a larger bus. Even if AMD can't compete on pure performance, they will be able to be very competitive on pricing.
And in the end, that is what matters. I don't care if people buying a 3090 spend too much; the card is there exactly for that. But I will be very happy if the next-gen AMD cards improve performance per dollar in the $250-500 range.
Historically, any video card with a memory bus wider than 256 bits has been expensive (not talking HBM here); that is what made 256 bits the standard for so many generations. 320 bits requires too complicated a PCB, and even more so 384 or 512 bits.
Even if it's GDDR6 with a small bus, big gains could be had by going low-latency GDDR6. If I recall correctly, applying the Ubermix 3.1 timings on Polaris (which is basically the 1666 MHz strap's timings applied to 2000 MHz memory) yielded better results than simply overclocking the memory.
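The logic behind tighter timings can be sketched: a timing strap specifies latencies in clock cycles, so absolute latency in nanoseconds is cycles divided by frequency. The cycle counts below are made-up illustrative values, not real Polaris straps:

```python
# Memory straps specify latencies in clock cycles, so real-time latency
# in ns = cycles / frequency (MHz) * 1000. Running a slower strap's
# tighter cycle counts at a higher clock shortens absolute latency.
# Cycle counts here are made-up illustrations, not real Polaris straps.

def latency_ns(cycles: int, freq_mhz: float) -> float:
    return cycles / freq_mhz * 1000

cas_1666_strap = 20  # hypothetical CAS cycles from the 1666 MHz strap
cas_2000_strap = 24  # hypothetical looser CAS cycles of the 2000 MHz strap

print(latency_ns(cas_2000_strap, 2000))  # 12.0 ns -- stock strap at 2000 MHz
print(latency_ns(cas_1666_strap, 2000))  # 10.0 ns -- tight strap at 2000 MHz

# Same clock, ~17% lower access latency -- which can beat a plain
# memory overclock when the workload is latency-bound.
```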
It's all speculation; what matters is whether the card lands in 3080 territory or above. If it does, AMD has a winner. Simple as that.