# AMD Big Navi GPU Features Infinity Cache?



## AleksandarK (Oct 6, 2020)

As we near the launch of AMD's much-hyped next-generation RDNA 2 GPU, codenamed "Big Navi", more details are emerging and crawling their way to us. Rumors suggest that this card will be called the AMD Radeon RX 6900 and that it will be AMD's top offering. Using a 256-bit bus with 16 GB of GDDR6 memory, the GPU will not use any type of HBM memory, which has historically been rather pricey. Instead, it looks like AMD will compensate for the smaller bus with a new technology it has developed. Thanks to new findings by @momomo_us on the Justia Trademarks website, we have information about the alleged "Infinity Cache" technology the new GPU uses.

VideoCardz reports that the internal name for this technology is not Infinity Cache; however, it seems that AMD could have changed it recently. What does it do exactly, you might wonder? Well, that is a bit of a mystery for now. It could be a new cache technology that allows L1 cache sharing across the GPU cores, or some connection between the caches found across the whole GPU. This information should be taken with a grain of salt, as we have yet to see what this technology does and how it works when AMD announces its new GPU on October 28th.





*View at TechPowerUp Main Site*


----------



## JAB Creations (Oct 6, 2020)

I've been stuck on a 290X for a few years now and I can't wait to get the 6900 XT, or the 6900 XTX if they make a liquid-cooled version. Now that AMD has beaten back the anti-capitalist crony Intel and made enough money to really push R&D:

- The drivers are rumored to be solid for this release.
- There will actually be stock because, unlike Nvidia, they're not trying to artificially drive up prices.
- It's not going to be a watt-sucking, heat-producing beast.
- I'll finally stop running out of video memory (browsers use GPU memory).


----------



## okbuddy (Oct 6, 2020)

1gb cache = from 512bit to 128bit bw

wow

how about a 6 GB cache, then we wouldn't need a bus at all


----------



## Vayra86 (Oct 6, 2020)

Good comedy, this

Fans desperately searching for some argument to say 256 bit GDDR6 will do anything more than hopefully get even with a 2080ti.

History repeats.

Bandwidth is bandwidth and cache is not new. Also... elephant in the room.... Nvidia needed expanded L2 Cache since Turing to cater for their new shader setup with RT/tensor in them...yeah, I really wonder what magic Navi is going to have with a similar change in cache sizes... surely they won't copy over what Nvidia has done before them like they always have right?! Surely this isn't history repeating, right? Right?!







JAB Creations said:


> I've been stuck on a 290X for a few years now and I can't wait to get the 6900XT or if they make the liquid cooled version 6900XTX. Now that AMD has beaten back the anti-capitalist crony Intel and made enough money to really push R&D:
> 
> The drivers are rumored to be solid for this release.
> There will actually be stock because unlike Nvidia they're not trying to artificially drive up prices.
> ...



Let's revisit those assumptions post launch. That'll be fun, too. I'll take a bet... drivers will need hotfixing, which will likely come pretty late or create new issues along the way (note: Nvidia has fallen prey to this just as well, which alone should say enough); things will be out of stock shortly after launch; it's going to pull an easy 250-300 W just as well; and yes, you do get 16 GB on the top model.

If I'm wrong, I'll buy it


----------



## robb (Oct 6, 2020)

Vayra86 said:


> Good comedy, this
> 
> Fans desperately searching for some argument to say 256 bit GDDR6 will do anything more than hopefully get even with a 2080ti.
> 
> ...


You have to be a special kind of stupid to think their top card will only match the 2080 Ti, considering the 2080 Ti is 50% faster than the 5700 XT. It does not take a genius to realize that doubling the cores of the 5700 XT, increasing IPC, and running higher clocks would result in a MUCH higher gain than 50%. FFS, even the Xbox Series X has a GPU as fast as or faster than the 2080 Super, and the 6900 XT will be a hell of a lot bigger GPU.


----------



## Frick (Oct 6, 2020)

robb said:


> You have to be a special kind of stupid to think their top card will only match the 2080ti considering the 2080ti is 50% faster than the 5700xt. It does not take a genius to realize that doubling the cores of the 5700xt, increasing IPC, and running higher clocks would result in a MUCH higher gain than 50%. FFS even the XBOX series X has a gpu as fast or faster than the 2080 super and the 6900xt will be a hell of a lot bigger gpu.



It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.


----------



## john_ (Oct 6, 2020)

I don't think cache can replace bandwidth, especially when games ask for more and more VRAM. I might be looking at it the wrong way and the next example could be wrong, but hybrid HDDs NEVER performed like real SSDs.

I am keeping my expectations really low after reading about that 256bit data bus.


----------



## Valantar (Oct 6, 2020)

Regardless of the veracity of this, there is definitely something _weird _about the rumored specifications for these GPUs. 256-bit and 192-bit bus widths for a high-end GPU in 2020 with no new tricks to counteract this would be a significant bottleneck. And AMD _obviously_ knows this. They do, after all, design GPUs for a living. They have the resources to, say, make a 512-bit test chip + PCB and benchmark it with varying numbers of memory controllers enabled, identifying when and how bottlenecks appear. And while 512-bit buses aren't really commercially viable (huge, hot, expensive, and at that point HBM is a better alternative at likely the same price), 384-bit buses are. So if they've chosen to go 256-bit for their highest end GPU, there has to be _some_ reason for it.


----------



## nguyen (Oct 6, 2020)

robb said:


> You have to be a special kind of stupid to think their top card will only match the 2080ti considering the 2080ti is 50% faster than the 5700xt. It does not take a genius to realize that doubling the cores of the 5700xt, increasing IPC, and running higher clocks would result in a MUCH higher gain than 50%. FFS even the XBOX series X has a gpu as fast or faster than the 2080 super and the 6900xt will be a hell of a lot bigger gpu.



Let's say the 6900 XT is 20-30% faster than the 2080 Ti in "specific" rasterization workloads that don't require massive bandwidth, but slower than the 2080 Ti in ray tracing workloads; does that make the 6900 XT a faster GPU?
"But you don't need ray tracing" is not an excuse for a >500 USD GPU.
Before you say there are other API alternatives for ray tracing: not having dedicated RT cores will just hammer performance. Just look at Crysis Remastered as an example (the game can leverage the RT cores).


----------



## Vya Domus (Oct 6, 2020)

Vayra86 said:


> Fans desperately searching for some argument to say 256 bit GDDR6 will do anything more than hopefully get even with a 2080ti.



I've noticed you are quite dead set on saying some pretty inflammatory and, to be honest, quite stupid things as of late. What's the matter?

A 2080 Ti has 134% the performance of a 5700 XT. The new flagship is said to have twice the shaders, likely higher clock speeds, and improved IPC. Only a pretty avid fanboy of a certain color would think that such a GPU could muster only some 30% higher performance with all that. GPUs scale very well; you can expect it to be between 170-190% the performance of a 5700 XT.



Vayra86 said:


> Bandwidth is bandwidth and cache is not new.



Caches aren't new; caches as big as the ones rumored are. I should also point out that bandwidth and the memory hierarchy are completely hidden from the GPU cores. In other words, whether it's reading at 100 GB/s from DRAM or at 1 TB/s from a cache, the core doesn't care; as far as it's concerned, it's just operating on memory at an address.

Rendering is also an iterative process where you need to go over the same data many times a second; if you can keep, for example, megabytes of vertex data in some fast memory close to the cores, that's a massive win.

GPUs hide memory bottlenecks very well by scheduling hundreds of threads. Another thing you might have missed is that, over time, the ratio of GB/s of DRAM bandwidth per GPU core has been getting lower and lower. And somehow performance keeps increasing; how the hell does that work if "bandwidth is bandwidth"?

Clearly, there are ways of increasing the efficiency of these GPUs so that they need less DRAM bandwidth to achieve the same performance, and this is another one of those ways. By your logic, we should have had GPUs with tens of TB/s by now, because otherwise performance couldn't have gone up.
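The bandwidth argument above can be put into napkin math: if some fraction of memory traffic is served from a fast on-die cache, the cores see far more effective bandwidth than the DRAM bus alone provides. Every number below is an illustrative assumption, not an AMD spec; a minimal sketch only:

```python
# Napkin math: effective bandwidth when part of the traffic hits a fast cache.
# All figures are illustrative assumptions, not real AMD specifications.

def effective_bandwidth(dram_gbps: float, cache_gbps: float, hit_rate: float) -> float:
    """Average bandwidth seen by the cores when a fraction `hit_rate` of
    accesses is served from cache and the rest goes out to DRAM."""
    return hit_rate * cache_gbps + (1.0 - hit_rate) * dram_gbps

dram = 512.0    # GB/s: roughly a 256-bit GDDR6 bus at 16 Gbps
cache = 2000.0  # GB/s: hypothetical on-die cache bandwidth

for hr in (0.0, 0.5, 0.8):
    print(f"hit rate {hr:.0%}: {effective_bandwidth(dram, cache, hr):.0f} GB/s effective")
```

Under these made-up numbers, an 80% hit rate makes a 256-bit bus behave like a much wider one, which is the kind of trade the rumored cache would be making.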



JAB Creations said:


> There will actually be stock because unlike Nvidia they're not trying to artificially drive up prices.



They won't have much stock; most wafers are going to consoles.



JAB Creations said:


> It's not going to be a watt-sucking heat-producing beast.



While performance/watt must have increased massively, perhaps even over Ampere, the highest end card will still be north of 250W.


----------



## fynxer (Oct 6, 2020)

john_ said:


> I don't think cache can replace bandwidth. Especially when games ask for more and more VRAM. I might be looking at it the wrong way and the next example could be wrong, but, Hybrid HDDs NEVER performed as real SSDs.
> 
> I am keeping my expectations really low after reading about that 256bit data bus.



Why do you think we have caches in CPUs, GPUs, SSDs, and more?

Because it works, and it does replace bandwidth: information that the GPU uses repeatedly is stored in and fetched from cache, and thus does not have to travel over the memory bus each time. The memory bandwidth saved by using cache can instead be used for other data. So a 256-bit bus with a large, very effective cache equals MORE effective MEMORY BANDWIDTH. Nvidia already uses this approach on all their cards.


----------



## zlobby (Oct 6, 2020)

Infinitiii Big Bang!


----------



## londiste (Oct 6, 2020)

Vya Domus said:


> A 2080ti has 134% the performance of a 5700XT.


At 1080p. At 1440p it's 142%, and at 2160p it's 152%.
More notably, though, the 3080 is twice as fast.


----------



## Vya Domus (Oct 6, 2020)

londiste said:


> At 1080p. At 1440p, its 142% and at 2160p its 152%.



You're probably right; I went off the comparison-tool thingy you get when browsing different GPUs, and that one says the 2080 Ti has 134% the performance of a 5700 XT.

_Based on TPU review data: "Performance Summary" at 1920x1080, 4K for 2080 Ti and faster._


----------



## ZoneDymo (Oct 6, 2020)

It always pains me to see people overhyping products; it can pretty much only lead to disappointment.
That said, let's not forget this GPU was pretty much made with the help of Sony and Microsoft, because their consoles use RDNA 2. That is a lot of (smart) people working on a product, so I do have faith that it will be good.

And personally, I care little about "beating" Nvidia in "performance".
If it delivers good frames while going easy on power consumption and while costing, finally again, a reasonable amount of money rather than the obscene prices being asked of late, it's a winner in my book.

Heck, I would REALLY love it if we had a new RX 460/470/480 moment, where all games could be lifted up and everyone could upgrade and get with the times.

This would also be really good for the evolution/implementation of ray tracing; the industry can only really make use of it if the world can use it.


----------



## Sithaer (Oct 6, 2020)

ZoneDymo said:


> And personally I care little for "beating" Nvidia in "performance".
> If it delivers good frames, while going ez on the powerconsumption and while costing, finally again, a reasonable amount of money and not the obscene prices being asked as of late, its a winner in my book.
> 
> Heck I would REALLY love it if we had a new RX460/470/480 moment, where all games could be lifted up, where everyone could upgrade and get with the times.
> ...



Yup, this is what I would also love to see and what I mainly care about when upgrading.

Those RX cards were a godsend for me, it was a solid upgrade from my previous card w/o breaking the bank/my wallet.

Looking at the prices lately, most likely my only option will be the second hand market again if I want the same performance uplift as last time. _'went from a GTX 950 to RX 570'_


----------



## M2B (Oct 6, 2020)

For the sake of comparison, the RTX 2080 Ti has exactly twice as many shaders as the RTX 2060 Super with very similar real-world clocks, and performs about 63.3% better at 4K according to TPU's average framerate in 20+ games.
Based on Xbox Series X performance scaling over the X1X, it doesn't seem like RDNA2 has much in the way of IPC improvements over RDNA.
So at similar clocks I'd expect the top-end 80 CU RDNA2 part to be 55-65% faster than the 5700 XT, depending on the resolution (assuming there is no bandwidth bottleneck).
But as we all know, RDNA2 will have noticeably higher clocks than RDNA1. I expect the average clocks of the 80 CU part to be in the 2-2.1 GHz range, a decent 10-13% above the 5700 XT; assuming semi-linear scaling, this clock boost alone puts RDNA2 10-12% above RDNA1. Add that massive shader-count increase, and it's probably reasonable to expect the top-end RDNA2 to be 75-85% faster than the 5700 XT, as Vya Domus predicted.
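That napkin math can be written out explicitly. This is a hypothetical estimator: the shader-scaling efficiency (~0.63) is back-derived from the 2080 Ti vs. 2060 Super figure quoted above, and the clocks are assumed, not confirmed:

```python
# Hypothetical speedup estimator: shader count scales with some efficiency,
# clocks scale roughly linearly. All inputs are rumored/assumed figures.

def estimated_speedup(cu_ratio: float, clock_ratio: float, shader_eff: float) -> float:
    """Estimated speedup over the baseline GPU (here, the 5700 XT)."""
    return (1.0 + (cu_ratio - 1.0) * shader_eff) * clock_ratio

cu_ratio = 80 / 40         # rumored 80 CUs vs. the 5700 XT's 40
clock_ratio = 2.05 / 1.85  # assumed ~2.05 GHz average vs. ~1.85 GHz
shader_eff = 0.63          # 2080 Ti vs. 2060S: 2x shaders -> ~63% gain

gain = (estimated_speedup(cu_ratio, clock_ratio, shader_eff) - 1.0) * 100
print(f"~{gain:.0f}% faster than the 5700 XT")  # lands in the 75-85% ballpark
```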

Expecting flagship RDNA2 to be only as fast as a 3070/2080Ti is not realistic, as it will probably beat them both comfortably.


----------



## R0H1T (Oct 6, 2020)

When did the X1X have an RDNA-based GPU?

Also, don't extrapolate RDNA2 performance based on console numbers. They're not exactly comparable, it's more like comparing cashews to figs.


----------



## M2B (Oct 6, 2020)

R0H1T said:


> When did X1X have RDNA based GPU



Nobody said it had.


----------



## R0H1T (Oct 6, 2020)

M2B said:


> Based on* Xbox Series X performance scaling over the X1X* it doesn't seem like *RDNA2 has much in the way of IPC improvements over RDNA*.


You said this, how can it be interpreted any differently?


----------



## M2B (Oct 6, 2020)

R0H1T said:


> You said this, how can it be interpreted any differently?



I agree, that part of my comment was a bit confusing, but I didn't mean the X1X has RDNA; just that the real-world performance increase didn't suggest higher IPC than RDNA1 to me, based on how RDNA performs in comparison to the console.


----------



## Calmmo (Oct 6, 2020)

You say Infinity Cache, I hear "we have chiplets on GPUs now".


----------



## laszlo (Oct 6, 2020)

me love to read comments!


----------



## delshay (Oct 6, 2020)

So no Nano card this time around as you need HBM for that.


----------



## bug (Oct 6, 2020)

Ok, who the hell calls Navi2 "Big Navi"?
Big Navi was a pipe dream of AMD loyalists left wanting for a first gen Navi high-end card.


----------



## Valantar (Oct 6, 2020)

M2B said:


> I agree, that part of my comment was a bit confusing but I didn't mean The X1X has RDNA.
> just the Real-World performance increase didn't suggest higher IPC than RDNA1 to me, based on how RDNA performs in comparison to the console.


That comparison is nonetheless deeply flawed. You're comparing a GCN-based console (with a crap Jaguar CPU) to a PC with an RDNA-based GPU (unknown CPU, assuming it's not Jaguar-based though) and then that again (?) to a yet to be released console with an RDNA 2 GPU and Zen2 CPU. As there are no XSX titles out yet, the only performance data we have for the latter is while running in backwards compatibility mode, which bypasses most of the architectural improvements even in RDNA 1 and delivers IPC on par with GCN. The increased CPU performance also helps many CPU-limited XOX games perform better on the XSX. In other words, you're not even comparing apples to oranges, you're comparing an apple to an orange to a genetically modified pear that tastes like an apple but only exists in a secret laboratory.

Not to mention the issues with cross-platform benchmarking due to most console titles being very locked down in terms of settings etc. Digital Foundry does an excellent job of this, but their recent XSX back compat video went to great lengths to document how and why their comparisons were problematic.


----------



## Vayra86 (Oct 6, 2020)

Vya Domus said:


> I've noticed you are quite dead set on saying some pretty inflammatory and quite stupid things to be honest as of late. What's the matter ?
> 
> A 2080ti has 134% the performance of a 5700XT. The new flagship is said to have twice the shaders, likely higher clock speeds and improved IPC. Only a pretty avid fanboy of a certain color would think that such a GPU could only muster some 30% higher performance with all that. GPUs scale very well, you can expect it to be between 170-190% the performance of a 5700XT.
> 
> ...



Cache replaces bandwidth, yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.

RT. Where is it?

As for inflammatory... stupid... time will tell, won't it. Many times, today's flame in many beholders' eyes is tomorrow's reality. Overhyping AMD's next best thing is not new, and it never EVER paid off.


----------



## M2B (Oct 6, 2020)

Valantar said:


> That comparison is nonetheless deeply flawed. You're comparing a GCN-based console (with a crap Jaguar CPU) to a PC with an RDNA-based GPU (unknown CPU, assuming it's not Jaguar-based though) and then that again (?) to a yet to be released console with an RDNA 2 GPU and Zen2 CPU. As there are no XSX titles out yet, the only performance data we have for the latter is while running in backwards compatibility mode, which bypasses most of the architectural improvements even in RDNA 1 and delivers IPC on par with GCN. The increased CPU performance also helps many CPU-limited XOX games perform better on the XSX. In other words, you're not even comparing apples to oranges, you're comparing an apple to an orange to a genetically modified pear that tastes like an apple but only exists in a secret laboratory.
> 
> Not to mention the issues with cross-platform benchmarking due to most console titles being very locked down in terms of settings etc. Digital Foundry does an excellent job of this, but their recent XSX back compat video went to great lengths to document how and why their comparisons were problematic.



Most of what you said makes sense but it's not THAT unrealistic to compare these things.
I'm sure you've watched DF's 5700XT vs X1X video, right?









We are both aware that the X1X has a very similar GPU to the RX 580. As you can see in their comparison, in a like-for-like, GPU-limited scenario the 5700 XT system performs 80 to 100% better than the console, in line with how a 5700 XT performs compared to a desktop RX 580.

Now I'm not saying we can compare them exactly and extrapolate exact numbers; but we can get a decent idea.

What you said about the Series X being at GCN-level IPC when running back-compat games is honestly laughable (no offense).
You can't run a game natively on an entirely different architecture and not benefit from its extremely low-level IPC improvements; those will benefit performance regardless of extra architectural enhancements.

By saying the back-compat games don't benefit from RDNA2's extra architectural benefits, they didn't mean those games don't benefit from low-level architectural improvements, just that extra features of RDNA2 (such as Variable Rate Shading) aren't utilized.
If the Series X were actually at GCN-level IPC, there is no way the XSX could straight-up double X1X performance, as a 12 TF GCN GPU like the Vega 64 barely performs 60% better than an RX 580.
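The Vega 64 comparison is easy to sanity-check numerically. The TFLOPs figures are approximate public board specs and the 60% number is the post's own claim; this only shows how far GCN's real scaling fell short of its paper TFLOPs:

```python
# Sanity check: raw TFLOPs vs. delivered performance on GCN.
# TFLOPs are approximate board specs; the perf ratio is taken from the post.

vega64_tf, rx580_tf = 12.7, 6.2  # peak FP32 TFLOPs (approximate)
perf_ratio = 1.6                 # Vega 64 ~60% faster than the RX 580

tf_ratio = vega64_tf / rx580_tf
scaling = perf_ratio / tf_ratio  # fraction of the on-paper scaling realized

print(f"TFLOPs ratio: {tf_ratio:.2f}x, real performance: {perf_ratio:.2f}x")
print(f"-> GCN delivered only {scaling:.0%} of what its TFLOPs suggest")
```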


----------



## sergionography (Oct 6, 2020)

Frick said:


> It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.



It only burned people who for some reason think AMD needs the fastest single-GPU card on the market to compete. The reality is that most people buy GPUs that cost less than $500. If I were AMD right now, I'd take advantage of Nvidia's desperate attempts to hold that artificial fastest-card-on-the-market branding: I'd clock RDNA2 to maximize power efficiency and trash Nvidia for being a power hog. Ampere is worse than Fermi when it comes to being a power hog.


----------



## bug (Oct 6, 2020)

Vya Domus said:


> ... The new flagship is said to have twice the shaders, likely higher clock speeds and improved IPC...



Got a source for that?
All I have is that Navi2 is twice as big as the 5700 XT. Considering they're built on the same manufacturing process, I have a hard time imagining where everything you listed would fit, with RTRT added on top.


----------



## Valantar (Oct 6, 2020)

M2B said:


> Most of what you said makes sense but it's not THAT unrealistic to compare these things.
> I'm sure you've watched DF's 5700XT vs X1X video, right?
> 
> 
> ...


A big part of the reason the XSX dramatically outperforms the XOX is the CPU performance improvement. You seem to be ignoring that completely.

As for the back-compat mode working as if it were GCN: AMD literally presented this when they presented RDNA1. It is by no means a console-exclusive feature; it is simply down to how the GPU handles instructions. It's likely not _entirely_ 1:1, as some low-level changes might carry over, but what AMD presented was essentially a mode where the GPU operates as if it were a GCN GPU. There's no reason to expect RDNA2 in consoles to behave differently. DF's review underscores this:


			
Digital Foundry said:

> There may be the some consternation that Series X back-compat isn't a cure-all to all performance issues on all games, but again, this is the GPU running in compatibility mode, where it emulates the behaviour of the last generation Xbox - you aren't seeing the architectural improvements to performance from RDNA 2, which Microsoft says is 25 per cent to the better, teraflop to teraflop.


That is about as explicit as you get it: compatibility mode essentially nullifies the IPC (or "performance per TFlop") improvements of RDNA compared to GCN. That 25% improvement MS is talking about is the IPC improvement of RDNA vs GCN.


----------



## BoboOOZ (Oct 6, 2020)

Vayra86 said:


> Cache replaces bandwidth yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.


We have no idea of that, really. I'm still half expecting to find out that there is HBM or that the bus width is in fact 384-bit.

In any case, one thing I am pretty sure AMD will not do is pair a 526 mm² RDNA2 die with a memory-bandwidth-starved configuration similar to that of the 5700 XT; that would definitely be stupid, even by the average TPU forumite's standards.



bug said:


> Got a source for that?
> All I have is that Navi2 is twice as big as 5700XT. Considering they built using the same manufacturing process, I have a hard time imagining where everything you listed would fit. With RTRT added on top.


Rumors are that there is no dedicated hardware for RT. Also, there are solid indications that the node is 7N+.
Before you dismiss Coreteks' speculation: yes, I agree his speculation is more miss than hit, but this video is a leak, not speculation.


----------



## londiste (Oct 6, 2020)

Vayra86 said:


> Cache replaces bandwidth yes.


Honest question: does it? Cache obviously helps with most compute uses, but how bandwidth-limited are, for example, textures in gaming? IIRC, textures are excluded from caches on GPUs (for obvious reasons).


----------



## M2B (Oct 6, 2020)

Valantar said:


> That 25% improvement MS is talking about is the IPC improvement of RDNA vs GCN.



Isn't that 25% number the exact same IPC improvement AMD stated for RDNA1 over GCN? If so, doesn't that validate my point that RDNA2 isn't much of an improvement over RDNA in terms of IPC?

Anyway, the new cards will be out soon enough, and we'll have a better idea of how much of an IPC improvement RDNA2 brings. It will be most obvious when comparing the rumored 40 CU Navi 22 to the 5700 XT at the same clocks.


----------



## Dazzm8 (Oct 6, 2020)

RedGamingTech was the first to bring this up, btw, not VideoCardz.


----------



## bug (Oct 6, 2020)

BoboOOZ said:


> Rumors are that there is no dedicated hardware for the RT. Also, there are solid indications that the node is 7N+.


Assuming by 7N+ you mean 7FF+, the math still doesn't work out. 7FF+ brings less than 20% more density; not enough to double the CU count _and_ add IPC improvements, even if RTRT takes zero space. Unless AMD has found a way to improve IPC using fewer transistors.
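The transistor-budget arithmetic behind this skepticism fits in a couple of lines. Both inputs are the thread's own rumored numbers (the "twice as big" die and the sub-20% density uplift), so treat the result as rough at best:

```python
# Napkin math: transistor budget of a rumored Navi 21 relative to Navi 10.
# Both inputs are rumors quoted in this thread, not confirmed specs.

die_ratio = 2.0      # "twice as big as 5700 XT"
density_gain = 1.20  # "less than 20% more density" for 7FF+, taken at the limit

budget = die_ratio * density_gain
print(f"~{budget:.1f}x Navi 10's transistor budget")
```

Whether ~2.4x the transistors is "enough" for double the CUs plus IPC work and RT hardware is exactly the open question being argued here.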


----------



## laszlo (Oct 6, 2020)

I didn't want to jump in sooner as I had to digest a lot of info... my 2c: a large cache can drastically improve communication between the GPU and RAM even if bandwidth is 100% utilized; it all depends on how it is used and what is processed in the end. If the GPU can digest it all without a bottleneck, all is OK, and we may see higher performance from a new type of interconnect.


----------



## M2B (Oct 6, 2020)

This topic is probably beyond the understanding of us enthusiasts, but I think extra cache can help reduce memory bandwidth requirements. It'll probably be application-dependent and less effective at higher resolutions, where sheer throughput might matter more, but we've already seen higher-clocked GPUs needing less bandwidth than an equally powerful GPU with lower clocks and more cores, as higher clocks directly increase the bandwidth of the caches.


----------



## BoboOOZ (Oct 6, 2020)

bug said:


> Assuming by 7N+ you mean 7FF+, the math still doesn't work out. 7FF+ brings less than 20% more density. Not enough to double the CU count _and _ add IPC improvements, even if RTRT takes zero space. Unless AMD has found a way to improve IPC using fewer transistors.


Well, that's a bit of napkin math, but basically some components on the GPU are the same size no matter the SKU. For instance, the memory controller would take the same space on a 5700 XT as on Navi 21 (still 256-bit).

But in any case, trying to discuss IPC based on approximate die sizes is not something I'll argue about, since it is a complex issue; I would bet it is perfectly possible to increase IPC without adding transistors, though I'm not arguing that is what will happen here.

IF there is a huge cache, that should increase IPC a lot, because there should be far fewer cache misses, i.e., less time in which processing units are just requesting, waiting on, or storing data from VRAM. Remember that VRAM latency is pretty bad. On the other hand, a huge cache would also take a huge chunk of the die. But trying to speculate about these things at this point seems to me a bit of a futile exercise; there are too many unknowns.


----------



## bug (Oct 6, 2020)

BoboOOZ said:


> Well, that's a bit of napkin math, but basically, some components on the GPU are the same size no matter what SKU. For instance, the memory controller would take the same space on a 5700XT or on Navi 21 (still 256 bit).
> 
> But in any case, trying to discuss IPC based on approximate dies sizes is not something I try to argue about, since it is a complex issue, but I would bet it is perfectly possible to increase IPC without adding transistors. Not arguing that is what will happen here.
> 
> IF there is a huge cache, that should increase IPC a lot, because there should be much fewer cache misses, ie, time in which processing units are just requesting/waiting/storing the data from the VRAM to the cache. Remember that VRAM latency is pretty bad. On the other side, a huge cache would also take a huge chunk of the die. But trying to speculate about these things at this point seems to me a bit of a futile exercise, there are too many unknowns.


Yeah, I wasn't stating any of that as fact. Just that the initial claims seem optimistic given the little we know so far.


----------



## Aquinus (Oct 6, 2020)

Vayra86 said:


> Cache replaces bandwidth yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.


Cache alone does not replace bandwidth, as you still have to read from system memory. More cache does mean the hit rate goes up, because more data is likely to be available, but larger caches usually also mean higher latency, so it's a balancing act. This is why the memory hierarchy is a thing and why cache levels are a thing; otherwise they'd just make an absolutely huge L1 cache for everything, but it doesn't work that way. So just saying "cache replaces bandwidth" is inaccurate. It augments memory bandwidth, but a system with a very fast or very large cache can still easily be crippled by slow memory. Just saying.
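The balancing act described above is the classic average memory access time (AMAT) trade-off. The cycle counts and miss rates below are made-up illustrative values, not measurements of any real GPU:

```python
# AMAT sketch: a bigger cache lowers the miss rate but can raise the hit time,
# and slow backing memory still dominates if the miss penalty is large.
# All cycle counts and rates here are illustrative assumptions.

def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time in cycles: hit cost plus the
    miss-rate-weighted cost of going out to the next level."""
    return hit_time + miss_rate * miss_penalty

small_cache = amat(hit_time=20, miss_rate=0.40, miss_penalty=300)
big_cache   = amat(hit_time=30, miss_rate=0.10, miss_penalty=300)
print(f"small cache: {small_cache:.0f} cycles, big cache: {big_cache:.0f} cycles")

# Even the big cache is crippled when the memory behind it is slow:
print(f"big cache + slow DRAM: {amat(30, 0.10, 900):.0f} cycles")
```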


----------



## R0H1T (Oct 6, 2020)

It's actually exactly that; you don't usually see major changes in cache structure, or indeed cache sizes, unless you've exhausted other avenues of increasing IPC. A fast cache hobbled by slow memory or a bad cache structure will decrease IPC; that's what happened with the *'Dozers*, IIRC. Bulldozer had a poor memory controller and really slow L1/L2 write speeds, again IIRC. That wasn't its only drawback vs. Phenom, but it was one of the major ones.


----------



## bug (Oct 6, 2020)

Not to mention all caches (big or small) can be thwarted by unfavorable memory access patterns.


----------



## hardcore_gamer (Oct 6, 2020)

Vayra86 said:


> Good comedy, this
> 
> Fans desperately searching for some argument to say 256 bit GDDR6 will do anything more than hopefully get even with a 2080ti.
> 
> ...



If only those hundreds of engineers at AMD had your qualifications and your level of intellect. Obviously, they don't know what they're doing. They even managed to convince engineers at Sony and Microsoft to adopt this architecture. These companies should fire their engineering teams and hire people from the TPU forums.


----------



## Punkenjoy (Oct 6, 2020)

What I like about this news is more about how the cache works than how large it is.

The thing with cache is that more is not always better. A larger cache can increase latency, and sometimes doubling the cache does not mean a significant gain in hit rate. That would just end up as wasted silicon.

So the fact that they are implementing a new way to handle the L1 cache is, to me, much more promising than if they had just doubled the L2 or something like that.

Note that big gains in performance will come from a better cache and memory subsystem. We are starting to hit a wall there, and getting data from fast memory costs more and more power. If your data travels less, you save a lot of energy. Doing the actual computations doesn't require that much power; it's really moving the data around that increases power consumption. So if you want an efficient architecture, you need your data to travel as little distance as possible.

But is it enough to fight the 3080? Rumors say yes, but we will see. Many times in the past there were architectures with less bandwidth that still performed better, because they had a better memory subsystem. This might happen again.

If that doesn't happen, the good news is that making a 256-bit, 250 W TDP card costs much less than making a 350 W TDP card with a larger bus. If AMD can't compete on pure performance, they will be able to be very competitive on pricing.

And in the end, that is what matters. I don't care if people buying a 3090 spend too much; the card is there for that. But I will be very happy if the next-gen AMD cards improve performance/cost in the $250-500 range.
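To illustrate the effective-bandwidth point, here is a toy two-level memory model. All numbers here are hypothetical, for illustration only, not RDNA 2 specs:

```python
# Toy model: a byte is served from on-die cache on a hit, from GDDR6 on a miss.
# Effective bandwidth is the harmonic blend of the two paths.
def effective_bandwidth(hit_rate, cache_bw, dram_bw):
    """All bandwidths in GB/s; hit_rate in [0, 1]."""
    return 1.0 / (hit_rate / cache_bw + (1.0 - hit_rate) / dram_bw)

DRAM_BW = 512.0    # GB/s, e.g. 256-bit GDDR6 at 16 Gbps
CACHE_BW = 2000.0  # GB/s, made-up figure for an on-die cache

for hit in (0.0, 0.5, 0.8):
    bw = effective_bandwidth(hit, CACHE_BW, DRAM_BW)
    print(f"hit rate {hit:.0%}: ~{bw:.0f} GB/s effective")
```

The point being: even a modest hit rate buys back a lot of the bandwidth a narrower bus gives up, and on-die hits also cost far less energy per byte than DRAM accesses.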


----------



## bug (Oct 6, 2020)

hardcore_gamer said:


> Only if those 100s of engineers at AMD had your qualifications and your level of intellect. Obviously, they don't know what they're doing. They even managed to convince engineers at Sony and Microsoft to adopt this architecture. These companies should fire their engineering teams and hire people from TPU forums.


Well, as an engineer myself, I can tell you my job is 100% about balancing compromises. When I pick a solution, it's not the fastest and (usually) not the cheapest. And it's almost never what I would like to pick. It's what meets the requirements and can be implemented within a given budget and time frame.

Historically, any video card with a memory bus wider than 256 bits has been expensive (not talking HBM here); that is what made 256 bits the standard for so many generations. 320 bits requires too complicated a PCB, and even more so 384 or 512 bits.


----------



## Jism (Oct 6, 2020)

john_ said:


> I don't think cache can replace bandwidth. Especially when games ask for more and more VRAM. I might be looking at it the wrong way and the next example could be wrong, but, Hybrid HDDs NEVER performed as real SSDs.
> 
> I am keeping my expectations really low after reading about that 256bit data bus.



Well, going hybrid has a few key advantages. Data that's accessed frequently will be delivered much faster, while data that's accessed infrequently and has to be fetched from memory obviously carries a small performance penalty. Second, with a cache like that you can actually save on memory bus width, and thus lower the power required compared to running a 320- or 512-bit-wide bus. And considering both consoles, the PS5 and the Xbox, carry Navi hardware, devs might finally learn how to properly extract the performance that's really in AMD's silicon.

Even if it's GDDR6 with a small bus, big gains could come from low-latency GDDR6. If I recall correctly, applying the Ubermix 3.1 timings to a Polaris card (which is basically the 1666 MHz strap/timings applied to 2000 MHz memory) yielded better results than simply overclocking the memory.

It's all speculation; what matters is the card being in 3080 territory or above, and AMD has a winner. Simple as that.


----------



## mechtech (Oct 6, 2020)

"Highly Hyped" ??  I must be living under a rock, I haven't seen much news on it.  I recall seeing more stuff on Ampere over the past several months compared to RDNA 2.


----------



## gruffi (Oct 6, 2020)

Frick said:


> It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.


Why do people like to poke around in the past? That should never be a valid argument. Things can always change for the good or the bad. Or did you expect the Ampere launch to be such a mess? Just concentrate on the facts and do the math. Big Navi will have twice the CUs of Navi 10 (80 vs 40), higher IPC per CU (10-15%?) and higher gaming clock speeds (1.75 vs >2 GHz). Even without perfect scaling it shouldn't be hard to see that Big Navi could be 80-100% faster than Navi 10. What about power consumption? Navi 10 has a TDP of 225W; Big Navi is rumored to have up to a 300W TDP. That's 33.33% more. Combined with AMD's claimed 50% power efficiency improvement for RDNA 2, that works out to roughly twice the performance (1.33 x 1.5 ≈ 2). To sum it up, Big Navi has everything it needs to be twice as fast as Navi 10, or at least close to that, 1.8-1.9x. And some people still think it will be only 2080 Ti level, which is ~40-50% faster than Navi 10.
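The arithmetic in that estimate is simple enough to write down. Every input is a rumor or a marketing claim from the post, not a confirmed spec:

```python
# Back-of-the-envelope scaling estimate using the post's assumed numbers.
navi10_tdp = 225           # W (RX 5700 XT)
big_navi_tdp = 300         # W (rumored)
perf_per_watt_gain = 1.5   # AMD's claimed +50% perf/watt for RDNA 2

power_ratio = big_navi_tdp / navi10_tdp        # ~1.33x more power
perf_ratio = power_ratio * perf_per_watt_gain  # ~2.0x the performance
print(f"Estimated Big Navi vs Navi 10: {perf_ratio:.2f}x")
```

Of course the whole estimate stands or falls with the 1.5x perf/watt figure actually materializing.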


----------



## BoboOOZ (Oct 6, 2020)

Punkenjoy said:


> But it is enough to fight the 3080? rumors say yes but we will see. But many time in the past, there were architecture that had less bandwidth while still performing better because they had a better memory subsystem. This might happen again.


There's raw performance and there's processing performance, and they're not the same thing. I don't know if anybody remembers the Kyro GPU; it was a while ago. It basically went toe to toe with Nvidia and ATI with less than half the bandwidth, by using HSR and tile-based rendering.



gruffi said:


> Why do people like to poke around in the past?



History is good science; the problem with most TPU users is that they only go back two generations, which is not much history if you ask me.


----------



## efikkan (Oct 6, 2020)

Guys, please, if you mean _performance per clock_ then say performance per clock. Don't use big words like "IPC" if you don't know what the technical term actually means. IPC is only relevant when comparing CPUs running the same ISA and workload for a single thread, while GPUs issue varying instructions across varying numbers of threads based on the GPU configuration, even within the same architecture.


----------



## BoboOOZ (Oct 6, 2020)

efikkan said:


> Guys, please, if you mean _performance per clock_ then say performance per clock. Don't use big words like "IPC" if you don't know what the technical term actually means. IPC is only relevant when comparing CPUs running the same ISA and workload for a single thread, while GPUs issues varying instructions across varying amounts of threads based on the GPU configuration, even within the same architecture.


I think it's safe to say that most people here talk about perf per clock, and this goes for the popular YouTubers, too.


----------



## Fluffmeister (Oct 6, 2020)

mechtech said:


> "Highly Hyped" ??  I must be living under a rock, I haven't seen much news on it.  I recall seeing more stuff on Ampere over the past several months compared to RDNA 2.



The hype for RDNA 2 is through the roof, which the lack of news helps. Nvidia are finally doomed and must play second fiddle for years to come.


----------



## TheoneandonlyMrK (Oct 6, 2020)

This is either the rumour that grew the biggest legs, or close to the truth; I'm not sure. I'm not sure if I like it either. That big Navi GPU had better not be starved for memory, but at this point who knows.


----------



## Nkd (Oct 6, 2020)

Frick said:


> It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.



The math doesn't add up. The only way it adds up is if the 6900 XT is 60 CU instead of 80 CU at 2 GHz. If it's 80 CU, it's going to be competing with the 3080 minimum. There are no ifs and buts about it. It's just simple math.


----------



## M2B (Oct 6, 2020)

BoboOOZ said:


> I think it's safe to say that most people here talk about perf per clock, and this goes for the popular YouTubers, too.







Even AMD uses the term IPC for their GPUs, though everybody here probably knows that IPC is mostly CPU terminology and we just use it for the sake of simplicity.


----------



## ShurikN (Oct 6, 2020)

Either Big Navi is not high end (hence 256-bit bus), and was never meant to compete with GA102,
OR
it is high end and has some sort of hidden mumbo-jumbo, in this case Infinity Cache (aka very large cache) to offset the bandwidth.

Do you ppl really think AMD (its engineers) went and made a 3080 competitor and then one day sat at a table and went "You know what this bad boy needs, a crippled memory bus. Let us go fuck this chip up so much that no one will ever buy it". And then everyone clapped and popped champagne bottles and ate caviar, confetti was flying, strippers came and everything.


----------



## Frick (Oct 6, 2020)

gruffi said:


> Why do people like to poke around in the past? That should never ever be a valid argument. Things can always change for the good or the bad. Or did you expect the Ampere launch to be such a mess? Just concentrate on the facts and do the math. Big Navi will have twice the CUs of Navi 10 (80 vs 40), higher IPC per CU (10-15% ?) and higher gaming clock speeds (1.75 vs >2 GHz). Even without perfect scaling it shouldn't be hard to see that Big Navi could be 80-100% faster than Navi 10. What about power consumption?  Navi 10 has a TDP of 225W, Big Navi is rumored to have up to 300W TDP. That's 33.33% more. With AMD's claimed 50% power efficiency improvement of RDNA 2 that means it can be twice as fast per watt. To sum it up, Big Navi has everything to be twice as fast as Navi 10. Or at least to be close to that, 1.8-1.9x. And some people still think it will be only 2080 Ti level. Which is ~40-50% faster than Navi 10.



It's just how people work. And if they expect it to be on 2080 Ti levels and it exceeds that, they'll be pleasantly surprised, as opposed to disappointed.


----------



## kingDR (Oct 6, 2020)

I personally don't think the new Big Navi will be a 256-bit-only card, it's just impossible.


----------



## Jism (Oct 6, 2020)

It's possible. Perhaps AMD has developed a new memory compression technique to squeeze more data through a smaller bus. Nvidia does it too. You can tell by the color difference (=image quality) both cards and drivers have to offer: Nvidia is basically a bit more blurry compared to ATI/AMD, and that might explain why Nvidia has an upper hand in some scenarios. The GPU has fewer pixels and particles to draw.


----------



## Valantar (Oct 6, 2020)

Jism said:


> It's possible. Perhaps AMD has developped a new memory compression technique to stamp more data through a smaller bus. Nvidia does it too. You can tell by the color difference (=image quality) both cards and drivers have to offer. Nvidia is basicly a bit more blurry compared to ATI/AMD, and that might explain why nvidia has a upperhand in some scenarios. The GPU has less pixels, particles to draw.


AMD has had memory compression for years, just like Nvidia. Nvidia's has historically been better, but the difference isn't major. Either way, this won't alleviate a tiny memory bus like this - it's for compressing color data after all, not for compressing texture assets and the like.


----------



## Bansaku (Oct 6, 2020)

Frick said:


> It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.



As an RX Vega 64 owner, I endorse this comment!


----------



## DeathtoGnomes (Oct 6, 2020)

So it seems AMD is playing the 2nd place _but affordable_ card once again. Why GDDR6 and not GDDR6X? What else could AMD have done to be a better match to a 3080?


----------



## Bansaku (Oct 6, 2020)

DeathtoGnomes said:


> So it seems AMD is playing the 2nd place _but affordable_ card once again. Why GDDR6 and not GDDR6X? What else could AMD have done to be a better match to a 3080?



Because GDDR6X is an NVIDIA exclusive. Google is your friend.


----------



## DeathtoGnomes (Oct 6, 2020)

Bansaku said:


> Because GDDR6X is an NVIDIA exclusive. Google is your friend.


you know where you can put your Google, right up your DuckDuckgo.


----------



## bug (Oct 6, 2020)

Bansaku said:


> Because GDDR6X is an NVIDIA exclusive. Google is your friend.


I don't think it's exclusive, as much as Nvidia offered to be the guinea pigs for GDDR6X and gobbled up all the available supply.
I'm also quite reluctant to put "affordable" next to a $500+ GPU.


----------



## Minus Infinity (Oct 6, 2020)

Frick said:


> It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.



No, it's about simple math. Top-tier Navi 21 will be 72-80 CUs at 2.1-2.2 GHz with a 20% IPC uplift, so it can easily double the 5700 XT. That would smash the 2080 Ti and be on par with the 3080 while using 50W+ less.


----------



## SIGSEGV (Oct 7, 2020)

Fluffmeister said:


> The hype for RDNA 2 is through the roof, which the lack of news helps. Nvidia are finally doomed and must play single fiddle for years to come.



Meh, AMD didn't do anything to create the hype. It's you. Yes, all of you, with your beyond-overpowered analysis.



> Nvidia are finally doomed and must play single fiddle for years to come


I have really high hope that your statement will come true.


----------



## Caring1 (Oct 7, 2020)

M2B said:


> Even AMD uses the term IPC for their GPUs, though everybody here probably knows that IPC is mostly CPU terminology and we just use it for the sake of simplicity.


Well that only proves AMD is confusing the issue by making up a similar phrase with the same abbreviation as IPC.
Which by the way is Instructions Per Clock, NOT Improved Performance per Clock.


----------



## InVasMani (Oct 7, 2020)

M2B said:


> Even AMD uses the term IPC for their GPUs, though everybody here probably knows that IPC is mostly CPU terminology and we just use it for the sake of simplicity.





A lot of attention will be paid to IPC and clock speed, but it seems like logic enhancements sit right between the two and could hopefully pay nice dividends as a result.



kingDR said:


> I personally don't think that the new Big Navi will be only 256-bit card, it's just impossible.


I was thinking 320-bit or 384-bit might make sense for the high-end card, but if it has 16 GB of VRAM it certainly stands to reason that it's 256-bit; it would be more surprising if it weren't. That said, maybe that isn't Big Navi, but rather the mid-range card that's 256-bit, unless AMD confirms otherwise.



DeathtoGnomes said:


> So it seems AMD is playing the 2nd place _but affordable_ card once again. Why GDDR6 and not GDDR6X? What else could AMD have done to be a better match to a 3080?


Cost could be a contributing factor, but it also stands to reason that Nvidia got the bulk of the GDDR6X supply. Either way, if AMD has come up with a cost-effective workaround in the form of Infinity Cache, both AMD and consumers should win; I see that being beneficial to both. I think AMD wanted to avoid another HBM-style price premium. I still wouldn't be shocked if the 256-bit part isn't the premium Big Navi card but rather the mid-range one. Or maybe it is, and they use GDDR6 on one model and HBM2 on another with the same memory bus; HBM2 has heaps of bandwidth in the first place, so they can get away with a smaller bus. I'm not sure that adds up entirely on its own, but factor in Infinity Cache and I think it could. Perhaps the lowest model uses GDDR6 without the cache, while the two tiers above it both use the cache, one with GDDR6 and the other swapping in HBM2. I'm not sure what differences they'd make to the cores between them, but that's neither here nor there.


----------



## R0H1T (Oct 7, 2020)

In other "not so expected" news/rumor ~

__ https://twitter.com/i/web/status/1313613983027466240

Nvidia did manage to surprise most with their pricing, that's for sure. Now we have to see if AMD manages to usurp some of these expectations with their *infinity cache*.


----------



## nguyen (Oct 7, 2020)

So the initial leaks suggesting the Navi 21 XT will sell for $550 make real sense now. The XTX flavor is probably reserved for Pro-series cards.

Well, at least AMD can earn a higher margin on these Navi 21 chips than on the ones for the XBX/PS5, so it's not a total loss. AMD has already contracted to purchase 30,000 wafers of TSMC 7nm; they have to make use of them.


----------



## Vayra86 (Oct 7, 2020)

gruffi said:


> Why do people like to poke around in the past? That should never ever be a valid argument. Things can always change for the good or the bad. Or did you expect the Ampere launch to be such a mess? Just concentrate on the facts and do the math. Big Navi will have twice the CUs of Navi 10 (80 vs 40), higher IPC per CU (10-15% ?) and higher gaming clock speeds (1.75 vs >2 GHz). Even without perfect scaling it shouldn't be hard to see that Big Navi could be 80-100% faster than Navi 10. What about power consumption?  Navi 10 has a TDP of 225W, Big Navi is rumored to have up to 300W TDP. That's 33.33% more. With AMD's claimed 50% power efficiency improvement of RDNA 2 that means it can be twice as fast per watt. To sum it up, Big Navi has everything to be twice as fast as Navi 10. Or at least to be close to that, 1.8-1.9x. And some people still think it will be only 2080 Ti level. Which is ~40-50% faster than Navi 10.



225 > 300W

That is not 80-90% in any sliver of reality I know of. Not even with a minor shrink and an IPC/efficiency bump. Because 50%... yeah. That is what they call https://en.wikipedia.org/wiki/Magical_thinking

The stars do align with what we know of Navi so far, and that is: +75~100W puts them at their peak TDP budget. 256-bit GDDR6 severely caps their bandwidth at around 500GB/s, and even a 2080 Ti already has a good 20% more on tap. So EVEN if they have some magical cache design that provides breathing room... let's say they gain 20% and get an effective 600GB/s throughput, or whatever performance equivalent that is for games. That'd be magical already.

So if they really did get 50% efficiency and really do get 300W TDP they have a grossly unbalanced GPU that will be memory starved half the time.

You have to be a really selective believer in rumors to get to your conclusion.
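The bandwidth figures in the post are easy to sanity-check: peak GDDR bandwidth is the bus width in bytes times the per-pin data rate. The 16 Gbps figure for Big Navi is a rumor; the 2080 Ti numbers are its known 352-bit, 14 Gbps configuration:

```python
# Peak theoretical bandwidth: (bus width / 8) bytes per transfer * Gbps per pin.
def gddr_bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin

big_navi = gddr_bandwidth_gbs(256, 16)    # rumored 256-bit GDDR6 at 16 Gbps
rtx_2080ti = gddr_bandwidth_gbs(352, 14)  # 352-bit GDDR6 at 14 Gbps

print(f"256-bit @ 16 Gbps: {big_navi:.0f} GB/s")    # 512 GB/s
print(f"2080 Ti:           {rtx_2080ti:.0f} GB/s")  # 616 GB/s, ~20% more
```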


----------



## bug (Oct 7, 2020)

SIGSEGV said:


> Meh, AMD didn't do anything to hype. It's you. Yes, all of you with your beyond overpowered analysis.



Oh but they did. By throwing us bits and pieces, they practically created the hype.

I mean look at Ampere, for comparison: we had good hints about a new power connector, some insane amount of VRAM, doubled RTRT performance. Of course, no product leaked entirely, but we had enough to set most expectations.


----------



## InVasMani (Oct 7, 2020)

I think a lot of what Nvidia did with Ampere was expected; you knew that with the die shrink they'd have a bit of a home-run opportunity on hand. I really am a bit doubtful AMD beats out Ampere at the high end, but I'd certainly welcome the competition if they did. I do think they can potentially win at the mid range and lower end and make some progress at being more competitive relative to Nvidia, but that still largely depends on AMD's ambition at this stage. Their follow-up to RDNA2, though, could really strike hard, especially with how well AMD has been doing financially lately as a company.

I think we'll see some really good increases in competition out of AMD the longer Intel struggles to regain its foothold in a convincing way, as opposed to "hey, I'm great at low-resolution, high-refresh-rate gaming and bad at security." On the plus side, Intel has at least become more convincing at multi-core performance. It sure is nice not to have four-core quads representing the high-end CPU market these days; in fact, they're pretty much the low end now outside of laptops, and that won't last either. I think 8-core CPUs will be the lower-mid-range minimum sooner rather than later the way things have been going, though we might see compromises in that region like big.LITTLE, which isn't too terrible for that end of the market, but less ideal at the other end.


----------



## SIGSEGV (Oct 7, 2020)

bug said:


> Oh but they did. By throwing us bits and pieces, they practically created the hype.



Show me! Give me sources to support your beyond-creative analysis about the upcoming Radeon lineups and their performance. The source must be officially from AMD.





> I mean look at Ampere, for comparison: we had good hints about a new power connector, some insane amount of VRAM, doubled RTRT performance. Of course, no product leaked entirely, but we had enough to set most expectations.



Aw, come on dude, don't change the topic. I know NVIDIA is impeccable to you.


----------



## bug (Oct 7, 2020)

SIGSEGV said:


> show me! give me your claim to support your beyond creative analysis about upcoming Radeon lineups and their performance. The source must be officially from AMD.
> 
> 
> 
> ...


No topic change. I was just giving you an objective comparison between the two launches.


----------



## medi01 (Oct 7, 2020)

It is a one-year-old patent; the 5700 series might have it.



nguyen said:


> "But you don't need Ray Tracing" is not an excuse for >500usd GPU.


But the "you 'need' RT because there's that handful of games (most of which are either sponsored by green, or outright developed by them)" argument doesn't fly too far.

Especially considering who reigns on the console throne.


----------



## gruffi (Oct 7, 2020)

BoboOOZ said:


> History is good science, the problem with most TPU users is that they only go back 2 generations, which is not much of history if you ask me.


True.




kingDR said:


> I personally don't think that the new Big Navi will be only 256-bit card, it's just impossible.


Logic says 256-bit isn't enough for such a card, especially at 4K. But there are also enough people who claim 10 GB of VRAM is enough for 4K. The needed bandwidth depends a lot on the cache system, data optimization and data compression. Let's see if AMD has done some "magic" with RDNA 2. If "Infinity Cache" is real and works as expected, why not 256-bit with GDDR6? It might still be a disadvantage in some cases at high resolutions, but it could give AMD an advantage in cost and power consumption that's worth it. Intel used an L4 cache on their Iris iGPUs before and it worked quite well.




DeathtoGnomes said:


> So it seems AMD is playing the 2nd place _but affordable_ card once again.


Such thoughts are for people who think 1st and 2nd place are determined by performance only. I think the one with better performance/watt and performance/mm² wins. Raw performance isn't everything. Or do you think Nvidia can always increase TDP by 100W just to keep the performance crown?




Vayra86 said:


> 225 > 300W
> 
> That is not 80-90% in any sliver of reality I know of. Not even with a minor shrink and IPC / efficiency bump. Because 50%... yeah.


As I said before, simple math.
225W to 300W = 1.33x
1.33 x 1.5 = 2

So, yes. Given the higher TDP and increased power efficiency, Big Navi could be twice as fast. Whether AMD really achieves 50% better power efficiency is another story. But I think they will be at least closer to it than what Nvidia promised: 1.9x on marketing slides, ~1.2x in reality.




R0H1T said:


> In other "not so expected" news/rumor ~
> 
> __ https://twitter.com/i/web/status/1313613983027466240


Rumors, rumors, rumors. There are also rumors about AMD giving false information to the AIBs and cards with locked BIOS. Nvidia gave false information to the AIBs too. For example, all AIBs had no clue about the real shader counts of Ampere before launch. The claim about AMD only targeting GA104 also makes no sense. AMD always said that Big Navi will target the high performance segment. Which definitely doesn't sound like a ...04 competitor.


----------



## BoboOOZ (Oct 7, 2020)

gruffi said:


> As I said before, simple math.
> 225W to 300W = 1.33x
> 1.33 x 1.5 = 2


Solid first posts, welcome to TPU.

Where did you say that before, btw?


----------



## nguyen (Oct 7, 2020)

gruffi said:


> Logic says, 256-bit isn't enough for such a card, especially at 4K. But there are also enough people who claim 10 GB VRAM is enough for 4K.  Needed bandwidth depends a lot on the cache system, data optimization and data compression. Let's see if AMD has done some "magic" with RDNA 2. If "Infinity Cache" is real and works as expected, why not 256-bit with GDDR6? It might still be a disadvantage in some cases at high resolutions. But it could give AMD an advantage with cost and power consumption that's worth it. Intel used an L4 cache on their Iris iGPUs before and it worked quite well.
> 
> Such thoughts are for people who think 1st and 2nd place are determined by performance only. I think the one with better performance/watt and performance/mm² wins. Raw performance isn't everything. Or do you think Nvidia can always increase TDP by 100W just to keep the performance crown.
> 
> ...



Here is simpler math: Navi 21 XT is 50% faster than the 5700 XT at the same power consumption, just like AMD promised (a 50% perf/watt improvement). The 5700 XT is 220W TDP, same as the 3070, which makes Navi 21 XT compete directly with the 3070. For the next month or so AMD will try to increase core clocks just so that Navi 21 XT is a tad faster than the 3070, using more power as a result.

Spec-wise, Navi 21 and the 3070 are just freakishly similar:
256bit bus - 448GBps bandwidth
20-22 TFLOPS FP32
220-230W TGP

Around 2080 Ti performance or +15%. Overall I think Navi 21 will be within spitting distance of the 3080 at 1080p/1440p gaming, but not so much at 4K or with ray tracing.


----------



## Crustybeaver (Oct 7, 2020)

Got to laugh at all the deluded types thinking they're going to get 3090 perf for considerably less than the price of a 3080. AMD fans really are a special type of special.


----------



## BoboOOZ (Oct 7, 2020)

Crustybeaver said:


> Got to laugh at all the deluded types thinking they're going to get 3090 perf for considerably less than the price of a 3080. AMD fans really are a special type of special.


Where did you get that? Nobody knows anything about the pricing and that is decided at the last moment, or even after that sometimes.
This thread is about a supposedly innovative memory architecture that would allow AMD to compete in the high end with only a 256bit memory bus and GDDR6. This should allow AMD to build these cards cheaper. Whether they will sell them cheaper, and by how much, is a completely different story.


----------



## Valantar (Oct 7, 2020)

nguyen said:


> Here is a simpler math, Navi21 XT is 50% faster than 5700XT at the same power consumption just like AMD promised (so 50% perf/watt improvement). 5700XT is 220W TDP, same as 3070. Which make Navi21 XT compete directly with 3070. For the next month or so AMD will try to increase core clocks just so that Navi21 XT is a tad faster than 3070, using more power as a result.
> 
> Spec wise Navi21 and 3070 are just freakishly similar
> 256bit bus - 448GBps bandwidth
> ...


The problem with that is that GPU TFlops _cannot whatsoever_ be compared across GPU architectures for any other use than pure compute. Gaming performance/Tflop is _vastly_ different between architectures. This is doubly true with Ampere and its doubled (but only some times) FP32 count. An example: The 2080 performs at 66% of the 3080 at 1440p in TPU's test suite. The 3080 delivers 29.77 TFlops FP32 vs. 10.07 for the 2080. 100 / 29.77 = 3,36. 66 / 10.07 = 6,55. In other words, the 2080 delivers _twice_ the performance per teraflop of FP32 compute of the 3080. Similarly, when comparing to the 5700 XT, that calculation becomes 57 / 9.754 = 5.84, or 74% higher perf/tflop than the 3080. So _please_, for the love of all rational thinking, stop using FP32 Tflops as a way to estimate gaming performance across architectures. It is _only_ somewhat valid within the same architecture.
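Valantar's arithmetic can be reproduced directly from the numbers given in the post (TPU's 1440p relative-performance index and each card's rated FP32 TFLOPS):

```python
# Gaming performance per TFLOP varies wildly across GPU architectures.
cards = {
    "RTX 3080":   {"rel_perf": 100.0, "tflops": 29.77},  # Ampere
    "RTX 2080":   {"rel_perf": 66.0,  "tflops": 10.07},  # Turing
    "RX 5700 XT": {"rel_perf": 57.0,  "tflops": 9.754},  # RDNA 1
}

# Performance points delivered per teraflop of rated FP32 compute.
ppt = {name: c["rel_perf"] / c["tflops"] for name, c in cards.items()}
for name, v in ppt.items():
    print(f"{name}: {v:.2f} perf points per TFLOP")

# The 5700 XT extracts ~74% more gaming perf per TFLOP than the 3080.
print(f"{ppt['RX 5700 XT'] / ppt['RTX 3080'] - 1:.0%}")
```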


----------



## BoboOOZ (Oct 7, 2020)

Valantar said:


> The problem with that is that GPU TFlops _cannot whatsoever_ be compared across GPU architectures for any other use than pure compute. Gaming performance/Tflop is _vastly_ different between architectures. This is doubly true with Ampere and its doubled (but only some times) FP32 count. An example: The 2080 performs at 66% of the 3080 at 1440p in TPU's test suite. The 3080 delivers 29.77 TFlops FP32 vs. 10.07 for the 2080. 100 / 29.77 = 3,36. 66 / 10.07 = 6,55. In other words, the 2080 delivers _twice_ the performance per teraflop of FP32 compute of the 3080. Similarly, when comparing to the 5700 XT, that calculation becomes 57 / 9.754 = 5.84, or 74% higher perf/tflop than the 3080. So _please_, for the love of all rational thinking, stop using FP32 Tflops as a way to estimate gaming performance across architectures. It is _only_ somewhat valid within the same architecture.


True. Going by Nvidia's promotional marketing (probably over-optimistic), the 3070 is equivalent in performance to the 2080 Ti, which means that 20 Ampere TFlops are roughly equivalent in gaming to 14 Turing TFlops.


----------



## londiste (Oct 7, 2020)

Valantar said:


> The problem with that is that GPU TFlops _cannot whatsoever_ be compared across GPU architectures for any other use than pure compute. Gaming performance/Tflop is _vastly_ different between architectures. This is doubly true with Ampere and its doubled (but only some times) FP32 count. An example: The 2080 performs at 66% of the 3080 at 1440p in TPU's test suite. The 3080 delivers 29.77 TFlops FP32 vs. 10.07 for the 2080. 100 / 29.77 = 3,36. 66 / 10.07 = 6,55. In other words, the 2080 delivers _twice_ the performance per teraflop of FP32 compute of the 3080. Similarly, when comparing to the 5700 XT, that calculation becomes 57 / 9.754 = 5.84, or 74% higher perf/tflop than the 3080. So _please_, for the love of all rational thinking, stop using FP32 Tflops as a way to estimate gaming performance across architectures. It is _only_ somewhat valid within the same architecture.


You would be better off comparing the 2080 Ti with the 3080. Other than the 2xSP and faster VRAM they are very similar; based on TPU reviews, even the clock speeds are close enough.

TFLOPs can be compared across architectures as long as they are similar enough. Granted, that is never quite so easy. GCN had a problem with utilization, Turing has the FP32+INT thing, and Ampere throws a huge wrench into the comparison with its 2xFP32 scheme.


----------



## InVasMani (Oct 7, 2020)

Here's the way I'm seeing it from an AMD angle: perhaps they've taken pretty much everything they gleaned from the I/O die and CCX latency issues with Ryzen, applied it to their GPU architecture, and gone a few steps further. Latency can make a pronounced difference depending on where in the memory hierarchy it lands, and L1 is the end of the spectrum where improvements are most beneficial. If this Infinity Cache is indeed part of RDNA2, there will probably be certain GPU workloads where it does exceedingly well.


----------



## gruffi (Oct 8, 2020)

nguyen said:


> Spec wise Navi21 and 3070 are just freakishly similar
> 256bit bus - 448GBps bandwidth
> 20-22 TFLOPS FP32
> 220-230W TGP


Actually, they are not. I've heard rumors of a TDP between 250W and 300W, with the latter being more likely; I never heard rumors of 220W. Maybe that's reasonable for a cut-down Navi 21, but not the full Navi 21. You also cannot directly compare TFLOPS. If you could, the 3070 should be waayyy faster than the 2080 Ti (~20.3 vs ~13.5 FP32 TFLOPS), but both are expected to have similar performance. Ampere's TFLOPS scale much worse than Turing's. The main reason is the changed shader architecture: FP and integer execution units were unified, and only one operation can be retired per clock, FP or integer. That means a lot of FP resources sit unused much of the time because they are reserved for integer operations. RDNA's and Turing's TFLOPS are much more comparable, and I don't think that will change much with RDNA 2. Which means Navi 21's rumored ~20.5 FP32 TFLOPS might be more like ~30 FP32 TFLOPS of Ampere, which is in fact about the 3080's FP32 TFLOPS. The number of TMUs is also more comparable between Navi 21 and GA102; GA104 has far fewer TMUs.


----------



## sergionography (Oct 8, 2020)

Dazzm8 said:


> RedGamingTech were the first to bring this up btw, not VideoCardz.


Yeah, they've been saying this for months now.


----------



## Vayra86 (Oct 8, 2020)

gruffi said:


> True.
> 
> 
> 
> ...



Agreed on all points really, I'm just more of a pessimist when it comes to 'magic' because really... there's never been any. As for what AMD said for Big Navi... lol. They said that for everything they released at the top of the stack, and since Fury X none of it really worked out well. Literally nothing. It was always too late, too little, and just hot and noisy. RDNA1 was just GCN with a new logo. This time it'll be different, really? They're just going to make the next iterative step towards 'recovery'. It won't be a major leap; if they knew how to make one, they'd have done it years ago.

But the last bit of your post hits home with me - AMD wasn't jebaited. They've had this all along and it's all they've got; it is completely plausible, as expected, and not Youtuber-madness territory, as I've been saying all along ever since RDNA2 became a term. They're trailing 1.5 gens, still, as they have ever since Polaris and probably always will. The only improvement now is probably the time to market. Which is already a big thing - as you say, they don't NEED to fight the 3090. But they really did/should fight the 3080. That card is fast enough that most people can make do with 'a little less', but that really does relegate AMD to a competition over the 500-dollar price point, not the 700-dollar one. Which means they're effectively busting the 3070 at best.

I'm also entirely with you when you say the best GPU isn't necessarily the fastest one. Yes. If AMD can pull off a more efficient, smaller die that performs 'just as well' but not in the top end, that is just fine. But that has yet to happen. They have a node advantage, but they're still not at featureset parity and already not as efficient as Nvidia's architecture. Now they need to add RT.

I'm not all that excited because all stars have already aligned long ago. You just gotta filter out the clutter, and those efficiency figures are among them. Reality shines through in the cold hard facts: limited to GDDR6, not as power efficient if they'd have to go for anything bigger than 256 bit, and certainly also not as small as they'd want, plus they've been aiming the development at a console performance target and we know where that is.

I reckon that Videocardz twitter blurb is pretty accurate.


----------



## BoboOOZ (Oct 8, 2020)

Vayra86 said:


> Which means they're effectively busting the 3070 at best.


I have no idea how you can come to the conclusion that AMD will only compete with the 3070 with a 536mm² die. It's their biggest die ever, with pretty much the same amount of transistors as the 3080, only to compete with the GA104? If that were true, I think Radeon should just forget desktop graphics altogether for the future, but I'm pretty certain it's not, you should listen to Tom and Paul more.


----------



## Vayra86 (Oct 8, 2020)

BoboOOZ said:


> I have no idea how you can come to the conclusion that AMD will only compete with the 3070 with a 536mm² die. It's their biggest die ever, with pretty much the same amount of transistors as the 3080, only to compete with the GA104? If that were true, I think Radeon should just forget desktop graphics altogether for the future, but I'm pretty certain it's not, you should listen to Tom and Paul more.



They'll end up between 3070 and 3080, but they won't fight the 3080 with that bandwidth. Just not happening.


----------



## BoboOOZ (Oct 8, 2020)

Vayra86 said:


> They'll end up between 3070 and 3080, but they won't fight the 3080 with that bandwidth. Just not happening.


I'm marking up this post, you already said you'll buy this card if you are proved wrong, right?


----------



## Vayra86 (Oct 8, 2020)

BoboOOZ said:


> I'm marking up this post, you already said you'll buy this card if you are proved wrong, right?



You can mark all of my posts, I've been saying this all along bud


----------



## Valantar (Oct 8, 2020)

Vayra86 said:


> Agreed on all points really, I'm just more of a pessimist when it comes to 'magic' because really... there's never been any. As for what AMD said for Big Navi... lol. They said that for everything they released on the top of the stack and since Fury X none of it really worked out well. Literally nothing. It was always too late, too little, and just hot and noisy. RDNA1 was just GCN with a new logo. This time it'll be different, really? They're just going to make the next iterative step towards 'recovery'. It won't be a major leap, if they knew how to, they'd have done it years ago.


Sorry, I get that being skeptical is good, but what kind of alternate reality have you been living in? While the Fury X definitely wasn't a massive improvement over previous GCN GPUs beyond just being bigger, it certainly wasn't noisy - except for some early units with whiny pumps, it's still one of the quietest reference GPUs ever released. As for perf/W, it mostly kept pace with the 980 Ti above 1080p, but it did consume a bit more power doing so, and had no overclocking headroom - the latter being the only area where AMD explicitly overpromised anything for the Fury X. Overall it was nonetheless competitive.

And as for RDNA1 being "just GCN with a new logo"? How, then, does it manage to dramatically outperform all incarnations of GCN? Heck, the 7nm, 40CU, 256-bit GDDR6, 225W 5700 XT essentially matches the 7nm, 60CU, 4096-bit HBM2, 295W Radeon VII. At essentially the same clock speeds (~100MHz more stock vs. stock). So "essentially the same" to you means a 50% performance/CU uplift on a much narrower memory bus, at a lower power draw despite a more power hungry memory technology, on the same node? Now, obviously the VII is by no means a perfect GPU - far from it! - but equating that development to being "just GCN with a new logo" is downright ridiculous. RDNA1 is - from AMD's own presentation of it at launch, no less - a stepping stone between GCN and a fully new architecture, but with the most important changes included. And that is reflected in both performance and efficiency.
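That per-CU figure is straightforward arithmetic (a sketch assuming, as reviews showed, roughly equal overall performance between the two cards):

```python
# If a 40-CU RX 5700 XT matches a 60-CU Radeon VII at similar clocks,
# per-CU throughput scales inversely with the CU count.
cu_navi10, cu_vega20 = 40, 60
per_cu_uplift = cu_vega20 / cu_navi10  # equal total perf from 40 CUs vs 60 CUs
print(per_cu_uplift)  # 1.5 -> a ~50% performance-per-CU gain for RDNA
```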

There have been _tons_ of situations where AMD have overpromised and underdelivered (Vega is probably the most egregious example), but the two you've picked out are arguably not that.



Vayra86 said:


> I'm not all that excited because all stars have already aligned long ago. You just gotta filter out the clutter, and those efficiency figures are among them. Reality shines through in the cold hard facts: limited to GDDR6, not as power efficient if they'd have to go for anything bigger than 256 bit, and certainly also not as small as they'd want, plus they've been aiming the development at a console performance target and we know where that is.


I'm not going to go into speculation on these specific rumors just because there's too much contradictory stuff flying around at the moment, and no(!) concrete leaks despite launch being a few weeks away. (The same applies for Zen 3 btw, which suggests that AMD is in full lockdown mode pre-launch.) But again, your arguments here don't stand up. How have they been "aiming the development at a console performance target"? Yes, RDNA 2 is obviously developed in close collaboration with both major console makers, but how does that translate to their biggest PC GPU being "aimed at a console performance target"? The Xbox Series X has 52 CUs. Are you actually arguing that AMD made that design, then went and said "You know what we'll do for the PC? We'll deliver the same performance with a 50% larger die and 33% more CUs! That makes sense!"? Because if that's what you're arguing, you've gone _way_ past reasonable skepticism.

You're also ignoring the indisputable fact that AMD's GPUs in recent years have been made on shoestring R&D budgets, with RDNA 1 being the first generation to even partially benefit from the Zen cash infusion. RDNA 2 is built almost entirely in the post-Zen period of "hey, we've suddenly got the money to pay for R&D!" at AMD. If you're arguing that having more R&D resources has _no_ effect on the outcome of said R&D, you're effectively arguing that _everyone_ at AMD is grossly incompetent. Which, again, is way past reasonable skepticism.

There are a few _plausible_ explanations here:
- AMD is going wide and slow with Navi 21, aiming for efficiency rather than absolute performance. (This strongly implies there will be a bigger die later, though there obviously is no guarantee for that.)
- AMD has figured out some sort of mitigation for the bandwidth issue (though the degree of efficiency of such a mitigation is obviously entirely up in the air, as that would be an entirely new thing.)
- The bandwidth leaks are purposefully misleading.

The _less plausible_ explanation that you're arguing:
- AMD has collectively lost its marbles and is making a huge, expensive die to compete with their own consoles despite those having much lower CU counts.
- Everyone at AMD is grossly incompetent, and can't make a high performance GPU no matter the resources.

If you ask me, there's reason to be wary of all three of the first points, but _much more_ reason to disbelieve the latter two.


----------



## DeathtoGnomes (Oct 8, 2020)

gruffi said:


> Such thoughts are for people who think 1st and 2nd place are determined by performance only. I think the one with better performance/watt and performance/mm² wins. Raw performance isn't everything. Or do you think Nvidia can always increase TDP by 100W just to keep the performance crown.


I agree with you; however, I was pointing out a historical pattern by AMD, and yes, Nvidia will do just about anything to keep its profits up.

TBH, I really don't care who's on first.


----------



## Zach_01 (Oct 8, 2020)

Vayra86 said:


> RDNA1 was just GCN with a new logo. This time it'll be different, really?


Oh Really? From what part of you are you pulling this? If it’s your brain, it’s sad...


----------



## nguyen (Oct 8, 2020)

BoboOOZ said:


> I have no idea how you can come to the conclusion that AMD will only compete with the 3070 with a 536mm² die. It's their biggest die ever, with pretty much the same amount of transistors as the 3080, only to compete with the GA104? If that were true, I think Radeon should just forget desktop graphics altogether for the future, but I'm pretty certain it's not, you should listen to Tom and Paul more.



I have to remind you that:
Vega 64 vs GTX 1080
Vega 10 vs GP104
495mm² vs 314mm²
484GBps vs 320GBps
300W vs 180W TDP

And Vega 64 still lost to the GTX 1080. Yeah, Pascal kinda devastated AMD for the past 4 years; the 1080 Ti (and also the Titan XP) still has no worthy competition from AMD. Ampere is here so that the massive base of Pascal owners can upgrade.


----------



## R0H1T (Oct 8, 2020)

Right, so you're basically saying Ampere is for the last 0.1% fps chasing public


----------



## Vayra86 (Oct 8, 2020)

Zach_01 said:


> Oh Really? From what part of you are you pulling this? If it’s your brain, it’s sad...



It's just another update to GCN, a good one, I won't deny that... but it's no different from Maxwell > Pascal, for example, and everyone agrees that's not a totally new arch either. They moved bits around, etc.

Unless you want to argue that this [block diagram attachment] is radically different from this [block diagram attachment]

R0H1T said:


> Right, so you're basically saying Ampere is for the last 0.1% fps chasing public



Kinda? Maybe they scaled the 'demand' expectations on that as well


----------



## Valantar (Oct 8, 2020)

Vayra86 said:


> Its just another update to GCN, a good one, I won't deny that... but its no different from Maxwell > Pascal for example, and everyone agrees that is not a totally new arch either. They moved bits around, etc.
> 
> Unless you want to argue that this
> View attachment 171199
> ...


I don't think "new architecture" necessarily has to mean "we invented new functional blocks" - if that's the requirement, there has barely been a new GPU architecture since the introduction of unified shaders...

If we're going by those block diagrams - ignoring the fact that block diagrams are themselves extremely simplified representations of something far more complex, and assuming that they accurately represent the silicon layout - we see quite a few changes. Starting from the right, the L1 cache is now an L0 Vector cache (which begs the question of what is now L1, and where it is), the local data share is moved next to the texturing units rather than between the SPs, SPs and Vector Registers are in groups twice as large, the scheduler is dramatically shrunk, split up and distributed closer to the banks of SPs, the number of scalar units and registers is doubled, there are two entirely new caches in between the banks of SPs, also seemingly shared between the two CUs in the new Work Group Processor unit, and lastly there's no longer a branch & message unit in the diagram at all.

Sure, these look superficially similar, but expecting a complete ground-up redesign is unrealistic (there are only so many ways to make a GPU compatible with modern APIs, after all), and there are quite drastic changes even to the block layout here, let alone the actual makeup of the different parts of the diagram. These look the same only if you look from a distance and squint. Similar? Sure. But definitely not the same. I would think the change from Kepler to Maxwell is a much more fitting comparison than Maxwell to Pascal.



nguyen said:


> I have to remind you that
> Vega 64 vs GTX 1080
> Vega10 vs GP104
> 495mm2 vs 314mm2
> ...


That's true. But then you have
RX 5700 XT vs RTX 2070
Navi 10 vs TU106
251mm² vs 445mm²
448GBps vs. 448GBps
225W vs. 175W TDP

Of course this generation AMD has a node advantage, and the 5700 XT still loses out significantly in terms of efficiency in this comparison (though not at all if looking at versions of the same chip clocked more conservatively, like the 5600 XT, which beats every single RTX 20xx GPU in perf/W).

Ampere represents a significant density improvement for Nvidia, but it's nowhere near bringing them back to the advantage they had with Pascal vs. Vega.


----------



## Zach_01 (Oct 8, 2020)

Vayra86 said:


> Its just another update to GCN, a good one, I won't deny that... but its no different from Maxwell > Pascal for example, and everyone agrees that is not a totally new arch either. They moved bits around, etc.
> 
> Unless you want to argue that this
> View attachment 171199
> ...


So basically a better-than-Turing-to-Ampere situation... with a Jensen 2x perf uplift that is in reality more like 1.2x.

Ampere only looks good because Turing was so bad relative to Pascal, both performance- and price-wise.


----------



## nguyen (Oct 8, 2020)

Valantar said:


> That's true. But then you have
> RX 5700 XT vs RTX 2070
> Navi 10 vs TU106
> 251mm² vs 445mm²
> ...



The 5700XT doesn't have any RT/Tensor cores, which makes the comparison between the 5700XT and 2070 a bit moot. The 2070 is like a car with a turbocharger that people disable just because it would be unfair to the non-turbocharged cars.
Here is a fair comparison: Crysis Remastered, with vendor-agnostic RT that can leverage the RT cores on Turing









The 2070 Super is like 3x the performance of the 5700XT when there are a lot of RT effects (the 2070 and 2070 Super are ~15% apart).

Right now against Ampere, the node advantage that Big Navi has is so tiny that it's not strange that the Navi21 XT at 530mm² is competing against GA104 at 394mm². Also, Navi21 XT is a cut-down version, much like the 3080. The full-fat Navi21 XTX will be reserved for the Pro version, where AMD has better margins.


----------



## Zach_01 (Oct 8, 2020)

nguyen said:


> Right now against Ampere, the node advantage that Big Navi has is so tiny that it's not strange that* Navi21 XT 530mm2 is competing against GA104 394mm2*.


Where is this info? Because I can say that Navi 21 536mm2 is competing against GA102 628mm2


----------



## Vayra86 (Oct 8, 2020)

Zach_01 said:


> So basically a better than Turing to Ampere situation... with a Jensen x2 perf uplift, that is in reality x1.2
> 
> Ampere is looking good only because Turing was so bad, over Pascal. Perf and price wise.



I agree on that completely actually  but I don't think that was the topic, was it?

The problem, however, is that AMD has yet to even reach Turing's peak performance level, and not just by a few % either. You can afford 'a Turing' when you're ahead of the game; otherwise it just sets you back further. Let's be real about it: RDNA2, as it is dreamt to be, should've been here 1.5 years ago at the latest. The fact that they're launching it now is still good progress, though... like I said earlier, time to market seems to have improved. If they can also get closer on absolute performance, I'll be cheering just as much as you.

The problem with Navi so far is that I have absolutely no reason for cautious optimism. AMD has been silent about it other than some vague percentages that really say as much as Ampere's very generously communicated 1.9x performance boost. And as much as that number is far from credible... why would this one suddenly be the truth? These claims have always been, and will always be, heavily inflated best cases. Other than that, we do know AMD has severe limitations to work with, most notably on memory. Anyway... this has all been said before, but that's where I'm coming from here. Not an anti-AMD crusade... just realism and the history of progress. I really want them to catch up, but I don't feel like the stars have aligned yet.


----------



## nguyen (Oct 8, 2020)

Zach_01 said:


> Where is this info? Because I can say that Navi 21 536mm2 is competing against GA102 628mm2



Yeah sure, maybe in AOTS benchmark.


----------



## Vayra86 (Oct 8, 2020)

nguyen said:


> 5700XT doesn't have any RT/Tensor cores, that make comparison between 5700XT to 2070 a bit moot. 2070 is like a car with Turbo Charger that people just disable it because that would make it unfair to other non Turbo Charged car.
> Here is a fair comparison: Crysis Remastered with vendor agnostic RT that can leverage RT cores on Turing
> 
> 
> ...



Please remove that ugly fart of a benchmark video, because Crysis Remastered runs like shit regardless of GPU. It's not doing you, or anyone else, any favors as a comparison.

You're almost literally looking at a PS3 engine here. Single-threaded.


----------



## Zach_01 (Oct 8, 2020)

nguyen said:


> Yeah sure, maybe in AOTS benchmark.


You still haven't answered anything. Where exactly is this poor assumption that RDNA2 Navi will only compete with GA104 coming from? The 256-bit bus?



Vayra86 said:


> I agree on that completely actually  but I don't think that was the topic, was it?
> 
> The problem is however, that AMD has yet to even reach Turing's peak performance level, and not just by a few % either. You can afford 'a Turing' when you're ahead of the game, otherwise it just sets you back further. Let's be real about it: RDNA2 as it is dreamt to be, should've been here 1,5 year ago at the latest. The fact they're launching it now though is still good progress... like I said earlier. Time to market seems to have improved. If they can also get closer on absolute performance, I'll be cheering just as much as you.
> 
> The problem with Navi so far is that I have absolutely no reason for cautious optimism. AMD has been silent about it other than some vague percentages that really say as much as Ampere's very generously communicated 1.9x performance boost. As much as that number is far from credible.... why would this one suddenly be the truth? These claims have and will always be heavily inflated and best-case. Other than that, we do know AMD has severe limitations to work with, most notably on memory. Anyway... this has all been said before, but that's where I'm coming from here. Not an anti-AMD crusade... just realism and history of progress. I really want them to catch up, but I don't feel like the stars have aligned yet.


It's the excessive pessimism and the doomed future being painted for AMD's graphics division that makes me part of this discussion.
My thoughts are different. I can almost see a repeat of "history", but not the one that the negative-on-RDNA people see.
This could be another Zen case, with RDNA1 as Zen and RDNA2 as Zen 2/3.

I guess we will see in 20 days


----------



## InVasMani (Oct 8, 2020)

Here are some of my personal takes and thoughts on RDNA2 from back on Wednesday, June 12th, 2019.

*"The increased R&D budget should help bolster AMD's graphics division for what comes after Navi. The transition down to 7nm or 7nm+ will be a nice jump in performance for Nvidia at the same time, though. What AMD has planned for what follows Navi is somewhat critical. They can't let their foot off the gas; they need to accelerate their plans a bit and be more aggressive.

AMD should probably aim for:

- 3X more instruction rate over Navi for its successor
- 3X to 4X further lossless compression
- increase ROPs from 64 to 80
- improve the texture filter units by 0.5X
- improve texture mapping units by 0.5X to 1.5X (allowing for a better ratio of TFUs to TMUs)
- 3-CU resource pooling
- 7nm+ or a node shrink
- more GDDR capacity; hopefully by the time a successor arrives we could see more per-chip GDDR6 capacity or a price reduction
- higher-clocked GDDR

Bottom line: I think AMD should really try to be more aggressive, further optimize the efficiency of its design, and hopefully bump up frequency a bit as well. I don't think they need more stream processors right now; rather, they need to improve the overall efficiency as a whole to get more out of them. They should also aim to offer a few more GPU SKUs to consumers at different price targets. If they do that, I tend to think they might even be able to cut down chips to offer some good 2X or even 3X dual/triple GPUs based on PCIe 4.0, which could be good. I think if they could make the ROPs scale 44/64/80 it would work well for just that type of thing, allowing better yields and binning options for AMD to offer to consumers.

Those are my optimistic, aggressive expectations of what AMD should aim toward for Navi's successor, if the R&D budget allows for it. They should really make some attempt to leapfrog ahead a bit further, especially as Nvidia will be shrinking down to a lower node for whatever comes after Turing or "SUPER" anyway, since that sounds like more of a simple refresh and rebadge with a new, bigger high-end Super Titan SKU added (because what else would they name it instead, 2080 Ti Super? why!?!?).

Nvidia's GPUs are in general more granular in terms of workload management, and thus power and efficiency. AMD needs to step it up; it's not that AMD GPUs can't be efficient, but in order for a GPU like Vega 56/64 to compete with Nvidia's higher-end and more diverse offerings, they have to stray further outside their power/efficiency sweet spot, so they end up looking less efficient and more power-hungry than they could be under more ideal circumstances, with a better budget to design more complex and granular GPUs as Nvidia offers. It boils down to price segments and where the cards are marketed by both companies, but it's a more uphill battle for AMD given the R&D budget. The transition to 7nm was a smart call for AMD, since it'll get cheaper over time along with yield and binning improvements. It should make for an easier transition to 7nm+ as well. Finer power gating would probably help AMD a fair amount at improving TDP under load and at idle, and it will become more important anyway at lower nodes to reduce voltages and waste heat; plus it's important for mobile, which is an area with big room for growth for the company."*



It was mostly just goalposts to aim toward, but it should be intriguing to see where AMD went with the design; obviously they had a bit of a basis with RDNA and, by extension, GCN and everything before them. Looking at it today, a 64/80-ROP card seems plausible, while 44 ROPs is highly doubtful; even then, with an mGPU scenario, it would be rather unlikely. Still, who knows, maybe they surprise us. Though if they were doing an mGPU solution, I'd think they'd incorporate the Infinity Fabric bridge that the Radeon Pro workstation cards introduced, as well as the rumored infinity cache; the combination would probably do a lot to reduce and minimize the latency/micro-stutter issue.

Looking at it now, I tend to think a 72-ROP and possibly an 88-ROP card is plausible. If they can carve out a 64/72/80-ROP segmentation, that would probably be decent, having 4 cards in total. Given the number of SKUs Nvidia often offers, that could be good for AMD. The interesting part about an 88-ROP SKU is that they could bin those chips, set them aside, and hold out to use them with GDDR6X a little further down the road when pricing for it normalizes and/or capacity increases. If they segment the ROPs that way, I could see 224-bit/256-bit/288-bit/320-bit memory bus options being plausible, with infinity cache balancing them out further.

To me it looks like they are sticking to a single-chip solution, so they probably bolstered the ROPs a fair bit along with some of those other areas mentioned. I think the highest-end SKU will end up with either 80 or 88 ROPs; it might initially be 80, with some premium 88-ROP SKUs being binned and tucked away for a rainy day, though it's tough to say. What rabbits will AMD pull out of its hat of mysteries!? Who knows; it's certainly fun to speculate, though. I do hope they took some of those bullet points into consideration.

Improving the compression would be a big deal; I think they've fallen behind in that area relative to Nvidia. Some of the other stuff, I felt, would offer some synchronization benefits along with improved performance and/or efficiency. I think I looked at the Vega and RDNA block diagrams to get a basic idea of how they might take things a few steps further for RDNA2, based on the changes made from Vega to RDNA plus some of my own personal injections. To me it was quite obvious they were trailing Nvidia and needed to make a big push with RDNA2, given that Nvidia was already going to get a nice performance aid from its die shrink.

I feel like over-engineering RDNA2 is the only practical way AMD can claw its way back ahead of Nvidia, especially this refresh round, since they were already at 7nm; though I don't know if RDNA2 will be 7nm+ or not, which would help and be welcome, naturally. Nvidia will likely have higher priority on GDDR6X as well for some time. In a sense, with AMD probably knowing Nvidia would end up with higher priority on that newer memory type, that lends some credibility to the possibility of an infinity cache to offset it, especially if they combine it with a slightly wider memory bus. To me a big key is how well they segment the different GPU SKUs. On the memory side there are different scenarios at play: do some SKUs get more ROPs and a wider memory bus with GDDR6X or HBM2? Is the infinity cache scaled depending on SKU, how big is it, and is it available on all SKUs? Lots of possibilities. And what are they doing with compression? I sure hope they are making inroads there.


----------



## Vayra86 (Oct 8, 2020)

Zach_01 said:


> You still haven't answered anything. Where exactly is this poor assumption that RDNA2 Navi will only compete with GA104 coming from? The 256-bit bus?
> 
> 
> It's the excessive pessimism and the doomed future being painted for AMD's graphics division that makes me part of this discussion.
> ...



We will indeed and yes... AMD's predicted doom is like the predicted demise of PC gaming. It never happens


----------



## nguyen (Oct 8, 2020)

Zach_01 said:


> You still haven't answered anything. Where exactly is this poor assumption that RDNA2 Navi will only compete with GA104 coming from? The 256-bit bus?



For the past decade AMD has never once produced a card that has less memory bandwidth yet outperforms Nvidia's counterpart:
HD 7970 vs GTX 680 (264GBps vs 192GBps)
R9 290X vs GTX 780 Ti (320GBps vs 336GBps)
Fury X vs 980 Ti (512GBps vs 336 GBps)
Vega64 vs 1080 Ti (484GBps vs 484GBps)
RadeonVII vs 2080Ti (1024GBps vs 616GBps)
5700XT vs 2080 (448GBps vs 448GBps)

And now you think AMD can just make a card with 448GBps bandwidth that can compete with a 760GBps card from Nvidia. Keep on dreaming buddy, or play AoTS.

AMD was really hoping Nvidia would name the GA104 as the 3080 just like they did the 2080, but nope Nvidia is serious about burying the next Gen consoles this time around.
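For context, the bandwidth figures quoted in this thread all follow from bus width times effective memory data rate (a quick sketch; 14 Gbps GDDR6 and the 3080's 19 Gbps GDDR6X are the commonly cited per-pin rates):

```python
def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bytes) x per-pin data rate."""
    return (bus_bits / 8) * data_rate_gbps

print(bandwidth_gbs(256, 14.0))  # 448.0 GB/s (5700 XT, rumored Navi 21)
print(bandwidth_gbs(320, 19.0))  # 760.0 GB/s (RTX 3080)
```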


----------



## bug (Oct 8, 2020)

Zach_01 said:


> You still haven't answered anything. Where exactly is this poor assumption that RDNA2 Navi will only compete with GA104 coming from? The 256-bit bus?
> 
> 
> It's the excessive pessimism and the doomed future being painted for AMD's graphics division that makes me part of this discussion.
> ...


Well, if your best argument is Zen, I should point out Zen was a completely new design. RDNA2 is a refresh of something that was good, better than expected even, but ultimately fell short. And it fell short while the competition was severely overpriced.


----------



## gruffi (Oct 9, 2020)

Vayra86 said:


> They'll end up between 3070 and 3080, but they won't fight the 3080 with that bandwidth. Just not happening.


HD 7970: 384 bit, 264 GB/s
GTX 680: 256 bit, 192.3 GB/s

With ~27% less memory bandwidth GTX 680 could compete with HD 7970.

R9 390X: 512 bit, 384.0 GB/s
GTX 980: 256 bit, 224.4 GB/s

With ~42% less memory bandwidth GTX 980 could more than just compete with the R9 390X.

RX Vega 64: 2048 bit, 483.8 GB/s
GTX 1080: 256 bit, 320.3 GB/s

With ~34% less memory bandwidth GTX 1080 could compete with RX Vega 64.


You see, bandwidth alone doesn't say much. You have to get the whole picture. RTX 3080 has a bandwidth of 760.3 GB/s. Big Navi is expected to have a bandwidth of >500 GB/s. Which might be something like 30-35% less bandwidth than 3080. But as you can see, you can compete with even such a deficit if your architecture is well optimized.
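The deficits quoted above check out (a quick sketch using the GB/s figures from this post):

```python
def deficit_pct(amd_gbs: float, nv_gbs: float) -> int:
    """How much less bandwidth the Nvidia card had, as a rounded percentage."""
    return round((1 - nv_gbs / amd_gbs) * 100)

print(deficit_pct(264.0, 192.3))  # 27  (HD 7970 vs GTX 680)
print(deficit_pct(384.0, 224.4))  # 42  (R9 390X vs GTX 980)
print(deficit_pct(483.8, 320.3))  # 34  (RX Vega 64 vs GTX 1080)
```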


----------



## Zach_01 (Oct 9, 2020)

bug said:


> Well, if your best argument is Zen, I should point out Zen was a completely new design. *RDNA2 is a refresh* of something that was good, better than expected even, but ultimately fell short. And it fell short while the competition was severely overpriced.


So now you know what kind of architecture they cooked up this round... and that is your argument? Based on what? ...on the bad choices that AMD made in the past?
Well, tell us more and give us the spoilers! This is indeed entertaining.


----------



## Vayra86 (Oct 9, 2020)

gruffi said:


> HD 7970: 384 bit, 264 GB/s
> GTX 680: 256 bit, 192.3 GB/s
> 
> With ~27% less memory bandwidth GTX 680 could compete with HD 7970.
> ...



Did you read the news yet? 

Besides, what you're saying is true, but AMD didn't have Nvidia's head start in delta compression (980, 1080) at any point in time. It's not optimization; it's the feature set that made that possible.

Also, the 7970 aged a whole lot better than the 680, as in several more years of actual practical use. Not directly due to bandwidth, but capacity. In none of the three examples is bandwidth the true factor making the difference, really. Nvidia just had a much stronger architecture across the board from Maxwell onwards.


----------



## gruffi (Oct 10, 2020)

Vayra86 said:


> Besides, what you're saying is true but AMD doesn't have the Nvidia headstart of better delta compression (980, 1080) at any point in time. Its not optimization, its feature set that made that possible.


And you know the feature set of RDNA 2? Do you? No, you don't. That's why it's pointless to say it's impossible.



Vayra86 said:


> In none of the three examples is bandwidth the true factor making the difference, really. Nvidia just had a much stronger architecture across the board from Maxwell onwards.


No. It wasn't "a much stronger architecture". In fact AMD had a much stronger architecture until Pascal. At least if we talk about raw performance. Nvidia's architecture was just more optimized for gaming. But it's funny. You say in the examples bandwidth doesn't make a difference. But you already know it will make a difference with RDNA 2. Makes sense.


----------



## Valantar (Oct 10, 2020)

gruffi said:


> And you know the feature set of RDNA 2? Do you? No, you don't. That's why it's pointless to say it's impossible.
> 
> 
> No. It wasn't "a much stronger architecture". In fact AMD had a much stronger architecture until Pascal. At least if we talk about raw performance. Nvidia's architecture was just more optimized for gaming. But it's funny. You say in the examples bandwidth doesn't make a difference. But you already know it will make a difference with RDNA 2. Makes sense.


Sorry, but you're quite off here. Yes, GCN (up to and including Vega) was very strong for pure compute workloads, but it was not very good at translating that into gaming performance. If your main use for a GPU is compute, then that's great, though RDNA will obviously disappoint you as the focus there is on improving gaming performance rather than compute (that's what CDNA is for). You can't _both_ argue that Nvidia didn't have a huge advantage because compute is as important as gaming, and then argue that RDNA improving its gaming performance means it's now better than Nvidia. That's what we call a double standard. Besides, efficiency is also a (major!) factor in the quality of an architecture, and with Maxwell Nvidia took a major step in front of AMD there - and has held that position since. RDNA 1 in combination with TSMC 7nm brought AMD back to rough parity, so it'll be _very_ interesting to see how improved 7nm RDNA 2 vs. 8nm Ampere plays out.

As to your first point, AMD might very well have improved their delta color compression so much that it beats Nvidia's, but if so that wouldn't negate the fact that Nvidia has had the advantage there for four+ generations. That would of course make overtaking them all the more impressive, but your argument has fundamental logical flaws.


----------



## Vayra86 (Oct 10, 2020)

gruffi said:


> And you know the feature set of RDNA 2? Do you? No, you don't. That's why it's pointless to say it's impossible.
> 
> 
> No. It wasn't "a much stronger architecture". In fact AMD had a much stronger architecture until Pascal. At least if we talk about raw performance. Nvidia's architecture was just more optimized for gaming. But it's funny. You say in the examples bandwidth doesn't make a difference. But you already know it will make a difference with RDNA 2. Makes sense.



Nah, reading comprehension, buddy; try again. I'm being very specific in my response to your examples; no need to pull it out of context. We speak of gaming performance here.

Raw compute is pretty pointless when a competitor dominates the market with optimized CUDA workloads anyway. So even outside of gaming, wtf are you even on about? Spec sheets don't get work done, last I checked.


----------



## gruffi (Oct 10, 2020)

Valantar said:


> Sorry, but you're quite off here. Yes, GCN (up to and including Vega) was very strong for pure compute workloads, but it was not very good at translating that into gaming performance.


Isn't that exactly what I said?



Valantar said:


> If your main use for a GPU is compute, then that's great, though RDNA will obviously disappoint you as the focus there is on improving gaming performance rather than compute (that's what CDNA is for). You can't _both_ argue that Nvidia didn't have a huge advantage because compute is as important as gaming, and then argue that RDNA improving its gaming performance means it's now better than Nvidia.


I never argued anything like that.



Vayra86 said:


> Im being very specific in my response to your examples; no need to pull it out of context. We speak of gaming performance here.


I was speaking about gaming performance too. You just weren't very specific. Being "strong" can mean anything. I was specific and clarified what was strong and what was not.

Okay, then how about some facts. You said RDNA 2 won't fight the 3080 with that bandwidth. Give us some facts about RDNA 2 why it won't happen. No opinions, no referring to old GCN stuff, just hard facts about RDNA.


----------



## Valantar (Oct 10, 2020)

gruffi said:


> Isn't that exactly what I said?


No. What you said was


gruffi said:


> In fact AMD had a much stronger architecture until Pascal. At least if we talk about raw performance. Nvidia's architecture was just more optimized for gaming.


That is, quite literally, turning what I said on its head. This is a forum for computer enthusiasts. While there are of course quite a few enthusiasts who have a lot of use for pure compute in what they use their computers for, the _vast_ majority need their GPUs for gaming. Consumer/enthusiast GPUs are also explicitly designed around gaming, not compute. As such, saying that AMD had the better architecture because they delivered more FP32 even if they were lagging in gaming performance is turning things very much on their head. It might be that this doesn't apply to you, but from what I've seen you haven't stated as much, so I have to base my interpretations on what is generally true on forums like these.

Besides that, you're still ignoring efficiency. Let's start back in 2013:
Radeon 290X: $550, 5.6TFlops, 438mm² (12.8Gflops/mm²), 290W (19.3Gflops/W), 100% gaming performance.
Geforce GTX 780 Ti: $699, 5.3Tflops, 561mm² (9.4Gflops/mm²), 250W (21.2Gflops/W), 104% gaming performance

Radeon Fury X: $699, 8.6TFlops, 596mm² (14.4Gf/mm²), 275W (31.3Gflops/W), 131% gaming performance
Geforce GTX 980 Ti: $699, 6.1Tflops, 601mm² (10.1Gf/mm²), 250W (24.4Gflops/W), 133% gaming performance

Radeon Vega 64: $499, 12.7Tflops, 495mm² (25.7Gf/mm²), 295W (43.1Gflops/W), 173% gaming performance
Geforce GTX 1080 Ti: $699, 11.3Tflops, 471mm² (24Gf/mm²), 250W (45.2Gflops/W), 223% gaming performance.
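The per-area and per-watt figures in the comparison above are just the spec-sheet numbers divided out. A quick sketch reproducing them (specs copied from the list above):

```python
# Reproduce Gflops/mm² and Gflops/W from the spec numbers quoted in the post.
cards = {
    # name:         (TFlops, die area mm², board power W)
    "290X":         (5.6,  438, 290),
    "GTX 780 Ti":   (5.3,  561, 250),
    "Fury X":       (8.6,  596, 275),
    "GTX 980 Ti":   (6.1,  601, 250),
    "Vega 64":      (12.7, 495, 295),
    "GTX 1080 Ti":  (11.3, 471, 250),
}

for name, (tflops, area, watts) in cards.items():
    gflops = tflops * 1000
    print(f"{name:12s} {gflops / area:5.1f} Gflops/mm²  {gflops / watts:5.1f} Gflops/W")
```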

So, what was AMD good at? Delivering FP32 compute for cheap (compared to Nvidia). For some generations they kept pace in terms of gaming performance too, but always at the cost of higher power, and in the Fury X (still using mine!) and onwards that's partially thanks to exotic and expensive memory that's dramatically more efficient than GDDR. They also delivered quite good compute per die area. In gaming they kept up at best, lagged behind dramatically at worst (though then also at a lower price).

What can we extrapolate from this? That GCN was a good architecture _for compute_. It was very clearly a worse architecture than what Nvidia had to offer overall, as compute is _not_ the major relevant use case for any consumer GPU. So, in any perspective other than that of someone running a render farm, AMD's architecture was clearly worse than Nvidia's.

This is very clearly demonstrated by RDNA: The 5700 XT matches the Radeon VII in gaming performance despite a significant drop in compute performance. It also dramatically increases gaming performance/W, though compute/W is _down_ from the VII.



gruffi said:


> I never argued anything like that.


But you did. You said AMD had a "much stronger architecture" until Pascal. Which means that you're arguing that compute is more important than gaming performance, as that is the only metric in which they were better. Yet you're in a discussion about whether RDNA 2 can match or beat Ampere _in gaming performance_ based on rumored memory bandwidths, arguing against someone skeptical of this. While we know that RDNA is a _worse_ architecture for compute than GCN, watt for watt on the same node, and you're arguing for RDNA 2 likely being very good - which implies that more gaming performance = better. So whether you meant there to be or not, there is a distinct reference point shift between those two parts of your arguments.


----------



## Nkd (Oct 11, 2020)

ShurikN said:


> Either Big Navi is not high end (hence 256-bit bus), and was never meant to compete with GA102,
> OR
> it is high end and has some sort of hidden mumbo-jumbo, in this case Infinity Cache (aka very large cache) to offset the bandwidth.
> 
> Do you ppl really think AMD (it's engineers) went and made a 3080 competitor and then one day sat at a table and went "You know what this bad boy needs, a crippled memory bus. Let us go fuck this chip up so much that no one will ever buy it". And then everyone clapped and popped champagne bottles and ate caviar, confetti was flying, strippers came and everything.



Well, the rumor is they did test Big Navi both with a 384-bit bus and with the cache plus a 256-bit bus. Looks like the difference wasn't enough to justify the 384-bit bus, which would add complexity and make the card more expensive. So what they have must be sufficient.


----------



## gruffi (Oct 11, 2020)

Valantar said:


> No. What you said was


Which in fact is absolutely the same statement. I really don't know what you are reading here.



Valantar said:


> As such, saying that AMD had the better architecture because they delivered more FP32 ...


Again, I never said anything like that. Please read what I said; I don't make things up. Where did I write anything about a "better architecture"? The topic was a "stronger architecture". And in terms of *raw performance* AMD had a stronger architecture until Pascal. That's what I said; you just repeated it. I never said AMD's architecture was stronger (or better) at gaming.



Valantar said:


> Besides that, you're still ignoring efficiency.


No, I'm not ignoring it. It just wasn't the topic.



Valantar said:


> You said AMD had a "much stronger architecture" until Pascal.


Yes. But you should read my whole statement and not just one sentence. I said "if we talk about raw performance". And that's true. I never said AMD had a "much stronger architecture for gaming". That's just what you read. But I didn't say it. So, please accept your mistake and don't make it up even more.

Let me sum it up for you again. That's what I said:


> AMD had a much stronger architecture ... if we talk about raw performance ... Nvidia's architecture was just more optimized for gaming


And that's what you said


> GCN ... was very strong for pure compute workloads, but it was not very good at translating that into gaming performance.


Which is the very same statement. Just expressed in other words.


----------



## Valantar (Oct 11, 2020)

gruffi said:


> Which in fact is absolutely the same statement. I really don't know what you are reading here.
> 
> 
> Again, I never said anything like that. Please read what I said I don't make up things. Where did I write something about "better architecture"? The topic was "stronger architecture". And in terms of *raw performance* AMD had a stronger architecture until Pascal. That's what I said. You just repeated it. I never said AMD's architecture was stronger (or better) at gaming.
> ...


This is getting repetitive, but again: no.

You are presenting an argument from a point of view where the "strength" of a GPU architecture is apparently _only_ a product of its FP32 compute prowess. I am presenting a counterargument saying that this is a meaningless measure for home/enthusiast uses, both due to your argument ignoring efficiency (which is _always_ relevant when discussing an architecture, as better efficiency = more performance in a given power envelope) and due to FP32 compute being of relatively low importance to this user group. You are also for some reason equating FP32 compute to "raw performance", which is a stretch given the many tasks a GPU can perform. FP32 is of course one of the more important ones, but it alone is a poor measure of the performance of a GPU, particularly outside of enterprise use cases.

Put more simply: you are effectively saying "GCN was a good architecture, but bad at gaming" while I am saying "GCN was a mediocre architecture, but good at compute." The point of reference and meaning put into what amounts to a good architecture in those two statements are dramatically different. As for saying "strong" rather than "good" or whatever else: these are generic terms without specific meanings in this context. Trying to add a post-hoc definition doesn't make the argument any more convincing.


----------



## bug (Oct 11, 2020)

Zach_01 said:


> So now you know what kind of architecture the cooked up this round... and that is your argument. Based on what? ..on the bad choices that AMD made in the past?
> Well, tell us more and give us the spoils! This is indeed entertaining.


Based on the fact that no architecture is built for a single generation. And it's in the name RDNA*2*.


----------



## Vayra86 (Oct 11, 2020)

gruffi said:


> Isn't that exactly what I said?
> 
> 
> I never argued anything like that.
> ...



Read back or on other topics, been over this at length already.


----------



## Zach_01 (Oct 11, 2020)

bug said:


> Based on the fact that no architecture is built for a single generation. And it's in the name RDNA*2*.


Architectures change, evolve, get enhanced and modified... And we don't really know what AMD has done this round.

The ZEN3 architecture is still all ZEN... It started with ZEN >> ZEN+ >> ZEN2, continuously improving, and yet again they managed to enhance it on the exact same node and improve IPC and performance per watt all together. RDNA2 is just such a step (a ZEN2 equivalent) and it will bring improvements. RDNA3 will probably be like a ZEN3 iteration.


----------



## gruffi (Oct 11, 2020)

Valantar said:


> You are presenting an argument from a point of view where the "strength" of a GPU architecture is apparently _only_ a product of its FP32 compute prowess.


No. I never said that's the "only" factor. But it's very common to express the capability of such chips in FLOPS. AMD does it, Nvidia does it, every supercomputer does. You claimed I was off. And that's simply wrong. We should all know that actual performance depends on other factors as well, like workload or efficiency.



Valantar said:


> Put more simply: you are effectively saying "GCN was a good architecture, but bad at gaming"


No. I said what I said. I never categorized anything as good or bad. That was just you. But if you want to know my opinion, yes, GCN was a good general architecture for computing and gaming when it was released. You can see that it aged better than Kepler. But AMD didn't continue its development. Probably most resources went into the Zen development back then, I don't know. The first major update was Polaris, and that was ~4.5 years after the first GCN generation, which simply was too late. At that time Nvidia had already made significant progress with Maxwell and Pascal. That's why I think splitting up the architecture into RDNA and CDNA was the right decision. It's a little bit like Skylake. Skylake was a really good architecture on release, but over the years there was no real improvement, only higher clock speeds and higher power consumption. OTOH AMD made significant progress with every new full Zen generation.




Vayra86 said:


> Read back or on other topics, been over this at length already.


How about answering my question first? I'm still missing that one.


gruffi said:


> Okay, then how about some facts. You said RDNA 2 won't fight the 3080 with that bandwidth. Give us some facts about RDNA 2 why it won't happen. No opinions, no referring to old GCN stuff, just hard facts about RDNA.


----------



## Vayra86 (Oct 11, 2020)

gruffi said:


> How about answering my question first? I'm still missing that one.



That was the answer to your question 



gruffi said:


> No. I said what I said. I never categorized anything as good or bad. That was just you. But if you want to know my opinion, yes, GCN was a good general architecture for computing and gaming when it was released. You can see that it aged better than Kepler. But AMD didn't continue its development. Probably most resources went into the Zen development back then, I don't know. The first major update was Polaris, and that was ~4.5 years after the first GCN generation, which simply was too late. At that time Nvidia had already made significant progress with Maxwell and Pascal. That's why I think splitting up the architecture into RDNA and CDNA was the right decision. It's a little bit like Skylake. Skylake was a really good architecture on release, but over the years there was no real improvement, only higher clock speeds and higher power consumption. OTOH AMD made significant progress with every new full Zen generation.



First update was Tonga (R9 285 was it?) and it failed miserably, then they tried Fury X. Then came Polaris.

None of it was a serious move towards anything with a future; it was clearly grasping at straws, as Hawaii XT already ran into the limits of what GCN could push ahead. They had a memory efficiency issue. Nvidia eclipsed that entirely with the release of Maxwell's delta compression tech, which AMD at the time didn't have. Polaris didn't either, so it's questionable what use that 'update' really was. All Polaris really was, was a shrink from 28 > 14nm and an attempt to get some semblance of a cost-effective GPU in the midrange. Other development was stalled and redirected to compute (Vega) and pro markets because 'that's where the money is', while similarly the midrange 'is where the money is'. Then came mining... and it drove 90% of Polaris sales, I reckon. People still bought 1060s and 970s regardless, not least because those were actually available.

Current trend in GPUs... Jon Peddie reports year-over-year steady (relative) growth in high-end GPUs, and the average price is steadily rising. It's a strange question to ask me what undisclosed facts RDNA2 will bring to change the current state of things, but it's a bit of a stretch to 'assume' they will suddenly leap ahead as some predict. The supposed specs we DO have show about 500GB/s of bandwidth, and that is a pretty hard limit; apparently they do have some sort of cache system that does something for that as well, seeing the results. If the GPU we saw in AMD's benches was the 500GB/s one, the cache is good for another 20%. Nice. But it still won't eclipse a 3080. This means they will need a wider bus for anything bigger, and that will in turn take a toll on TDPs and efficiency.
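Nobody outside AMD knows yet how Infinity Cache actually works, but the "good for another 20%" guess maps onto a simple hit-rate model (my own back-of-envelope, not anything AMD has disclosed): every request served from an on-die cache is a request DRAM never sees, so effective bandwidth is amplified by 1/(1 - hit rate).

```python
# Speculative sketch: effective bandwidth when a fraction `hit_rate` of memory
# traffic is absorbed by an on-die cache. Model and numbers are guesses.
def effective_bandwidth(dram_gbps: float, hit_rate: float) -> float:
    """DRAM only sees (1 - hit_rate) of the traffic, amplifying bandwidth."""
    return dram_gbps / (1.0 - hit_rate)

dram = 512.0  # GB/s, assumed 256-bit GDDR6 figure (the post says ~500 GB/s)
# Even a ~17% hit rate would deliver the rumored "+20% effective bandwidth":
print(effective_bandwidth(dram, 0.167))  # ~614 GB/s
```

Under this toy model the cache wouldn't need to be anywhere near perfect to close a chunk of the gap to the 3080's 760 GB/s.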

The first numbers are in and we've already seen about a 10% deficit to the 3080 with whatever that was supposed to be. There is probably some tier above it, but I reckon it will be minor like the 3090 above a 3080 is. As for right decisions... yes, retargeting the high end is a good decision, its the ONLY decision really and I hope they can make it happen, but the track record for RDNA so far isn't spotless, if not just plagued with very similar problems to what GCN had, up until now.

@gruffi sry, big ninja edit, I think you deserved it for pressing the question after all


----------



## DeathtoGnomes (Oct 12, 2020)

Valantar said:


> This is a forum* for* computer enthusiasts.


stop spreading false rumors!!


----------



## InVasMani (Oct 12, 2020)

Now here I thought this was a forum to learn about basket weaving 101 OOP!


----------



## Valantar (Oct 12, 2020)

gruffi said:


> No. I never said that's the "only" factor. But it's very common to express the capability of such chips in FLOPS. AMD does it, Nvidia does it, every supercomputer does. You claimed I was off. And that's simply wrong. We should all know that actual performance depends on other factors as well, like workload or efficiency.


And camera manufacturers _still_ market megapixels as if it's a meaningful indication of image quality. Should we accept misleading marketing terms just because they are common? Obviously not. The problems with using teraflops as an indicator of consumer GPU performance have been discussed at length both in forums like these and the media. As for supercomputers: that's one of the relatively few cases where teraflops actually matter, as supercomputers run complex _compute_ workloads. Though arguably FP64 is likely more important to them than FP32. But for anyone outside of a datacenter? There are far more important metrics than the base teraflops of FP32 that a GPU can deliver.


gruffi said:


> No. I said what I said. I never categorized anything as good or bad. That was just you.


Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the _only_ reasonable interpretation of that word in this context.


gruffi said:


> But if you want to know my opinion, yes, GCN was a good general architecture for computing and gaming when it was released. You can see that it aged better than Kepler. But AMD didn't continue its development. Probably most resources went into the Zen development back then. I don't know. The first major update was Polaris. And that was ~4.5 years after the first GCN generation. Which simply was too late. At that time Nvidia already had been made significant progress with Maxwell and Polaris. That's why I think splitting up the architecture into RDNA and CDNA was the right decision. It's a little bit like Skylake. Skylake was a really good architecture on release. But over the years there was no real improvement. Only higher clock speed and higher power consupmtion. OTOH AMD mode significant progress with every new full Zen generation.


I agree that GCN was good when it launched. In 2012. It was entirely surpassed by Maxwell in 2014. As for "the first major update [being] Polaris", that is just plain wrong. Polaris was the fourth revision of GCN. It's obvious that the development of GCN was hurt by AMD's financial situation and lack of R&D money, but the fact that their only solution to this was to move to an entirely new architecture once they got their act together tells us that it was ultimately a relatively poor architecture overall. It could be said to be a good architecture for compute, hence its use as the basis for CDNA, but for more general workloads it simply scales poorly.



DeathtoGnomes said:


> stop spreading false rumors!!


Sorry, my bad. I should have said "This is a forum for RGB enthusiasts."


----------



## BoboOOZ (Oct 12, 2020)

Valantar said:


> Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the _only_ reasonable interpretation of that word in this context.


Come on, you're exaggerating, and you can do better than this. In the context of this discussion, "strong" is closer to "raw performance" than to "good".
Better to spend this energy on more meaningful discussions.

Also, bandwidth and TFlops are the best objective measures to express the potential performance of graphics cards, and they're fine if they're understood as what they are.

Just an aside: the only time I see TFlops as truly misleading is with Ampere, because those dual-purpose CUs will never attain their maximum theoretical throughput, since they also have to do integer computations (which amount to about 30% of computations in gaming, according to Nvidia themselves).


----------



## Valantar (Oct 12, 2020)

BoboOOZ said:


> Come on, you're exaggerating and you can do better than this. In the context of this discussion, strong is closer to "raw performance" than to "good".
> Better waste this energy with more meaningful discussions.
> 
> Also, bandwidth and TFlops are the best objective measures to express the potential performance of graphic cards, and they're fine if they're understood as what they are. To
> ...


Sorry, I might be pedantic, but I can't agree with this. Firstly, the meaning of "strong" is obviously dependent on context, and in this context (consumer gaming GPUs) the major relevant form of "strength" is gaming performance. Attributing FP32 compute performance as a more relevant reading of "strong" in a consumer GPU lineup needs some actual arguments to back it up. I have so far not seen a single one.

Your second statement is the worst type of misleading: something that is technically true, but is presented in a way that vastly understates the importance of context, rendering its truthfulness moot. "They're fine if they're understood as what they are" is entirely the point here: FP32 is in no way whatsoever a meaningful measure of consumer GPU performance across architectures. Is it a reasonable point of comparison within the same architecture? Kind of! For non-consumer uses, where pure FP32 compute is actually relevant? Sure (though it is still highly dependent on the workload). But for the vast majority of end users, let alone the people on these forums, FP32 as a measure of the performance of a GPU is very, very misleading.

Just as an example, here's a selection of GPUs and their game performance/Tflop in TPU's test suite at 1440p from the 3090 Strix OC review:

Ampere:
3090 (Strix OC) 100% 39TF = 2.56 perf/TF
3080 90% 29.8TF = 3 perf/TF

Turing:
2080 Ti 72% 13.45TF = 5.35 perf/TF
2070S 55% 9TF = 6.1 perf/TF
2060 41% 6.5TF = 6.3 perf/TF

RDNA:
RX 5700 XT 51% 9.8TF = 5.2 perf/TF
RX 5600 XT 40% 7.2TF = 5.6 perf/TF 
RX 5500 XT 27% 5.2TF = 5.2 perf/TF

GCN
Radeon VII 53% 13.4 TF = 4 perf/TF
Vega 64 41% 12.7TF = 3.2 perf/TF
RX 590 29% 7.1TF = 4.1 perf/TF

Pascal:
1080 Ti 53% 11.3TF = 4.7 perf/TF
1070 34% 6.5TF = 5.2 perf/TF 

This is of course at just one resolution, and the numbers would change at other resolutions. The point still shines through: even within the same architectures, using the same memory technology, gaming performance per teraflop of FP32 compute can vary by 25% or more. Across architectures we see more than 100% variance. Which demonstrates that for the average user, FP32 is _an utterly meaningless metric_. Going by these numbers, a 20TF GPU might beat the 3090 (if it matched the 2060 in performance/TF) or it might lag dramatically (like the VII or Ampere).
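The perf/TF column is just relative gaming performance divided by spec TFLOPs. A sketch reproducing a few of the rows listed above (numbers copied from the post; the TPU 3090 Strix OC review is the stated source):

```python
# Relative 1440p gaming performance (3090 Strix OC = 100%) per spec TFLOP.
gpus = {
    # name:          (relative perf %, spec TFLOPs)
    "3090 Strix OC": (100, 39.0),
    "3080":          (90, 29.8),
    "2080 Ti":       (72, 13.45),
    "RX 5700 XT":    (51, 9.8),
    "Vega 64":       (41, 12.7),
    "1080 Ti":       (53, 11.3),
}

for name, (rel_perf, tflops) in gpus.items():
    print(f"{name:14s} {rel_perf / tflops:.2f} perf/TF")
```

The spread in the output (roughly 2.6 to 5.4 here) is the >100% cross-architecture variance the post is pointing at.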

Unless you are a server admin or researcher or whatever else running workloads that are mostly FP32, using FP32 as a meaningful measure of performance is _very_ misleading. Its use is _very_ similar to how camera manufacturers have used (and partially still do) megapixels as a stand-in tech spec to represent image quality. There is _some_ relation between the two, but it is wildly complex and inherently non-linear, making the one meaningless as a metric for the other.


----------



## gruffi (Oct 12, 2020)

Valantar said:


> Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the _only_ reasonable interpretation of that word in this context.


I think that's the whole point of your misunderstanding. You interpreted, and you interpreted in a wrong way. So let me be clear once and for all. As I said, with "strong" I was referring to raw performance. And raw performance is usually measured in FLOPS. I didn't draw any conclusions about whether that makes an architecture good or bad, which is usually defined by metrics like performance/watt and performance/mm².



Valantar said:


> I agree that GCN was good when it launched. In 2012. It was entirely surpassed by Maxwell in 2014. As for "the first major update [being] Polaris", that is just plain wrong. Polaris was the fourth revision of GCN.


You said it yourself: just revisions. Hawaii, Tonga, Fiji. They all mostly got only ISA updates and more execution units. One exception was HBM for Fiji, but even that didn't change the architecture at all. Polaris was the first generation after Tahiti that had some real architecture improvements to increase IPC and efficiency.



Valantar said:


> It's obvious that the development of GCN was hurt by AMD's financial situation and lack of R&D money, but the fact that their only solution to this was to move to an entirely new architecture once they got their act together tells us that it was ultimately a relatively poor architecture overall.


I wouldn't say that. The question is what's your goal. Obviously AMD's primary goal was a strong computing architecture to counter Fermi's successors. Maybe AMD didn't expect Nvidia to go the exact opposite way. Kepler and Maxwell were gaming architectures. They were quite poor at computing, especially Kepler. Back then, with enough resources, I think AMD could have done with GCN what they are doing now with RDNA. RDNA is no entirely new architecture from scratch like Zen. It's still based on GCN. So, it seems GCN was a good architecture after all. At least better than what some people try to claim. The lack of progress and the general purpose nature just made GCN look worse for gamers over time. Two separate developments for computing and gaming was the logical consequence. Nvidia might face the same problem. Ampere is somehow their GCN moment. Many shaders, apparently good computing performance, but way worse shader efficiency than Turing for gaming.



Vayra86 said:


> That was the answer to your question


Okay. Then we can agree that it _could_ be possible to be competitive, or at least very close in performance, even with less memory bandwidth?


----------



## Vayra86 (Oct 12, 2020)

gruffi said:


> Okay. Than we can agree that it _could_ be possible to be competitive or at least very close in performance even with less memory bandwidth?



Could as in highly unlikely, yes.


----------



## londiste (Oct 12, 2020)

Valantar said:


> Ampere:
> 3090 (Strix OC) 100% 39TF = 2.56 perf/TF
> 3080 90% 29.8TF = 3 perf/TF
> 
> ...


At 1440p Ampere probably takes more of a hit than it should.
But more importantly, especially for Nvidia cards, spec TFLOPs are misleading. Just check the average clock speeds in the respective reviews.
At the same time, RDNA has a Boost clock that is not quite what the card actually achieves.

Ampere:
3090 (Strix OC, 100%): 1860 > 1921MHz - 39 > 40.3TF (2.48 %/TF)
3080 (90%): 1710 > 1931MHz - 29.8 > 33.6TF (2.68 %/TF)

Turing:
2080Ti (72%): 1545 > 1824MHz - 13.45 > 15.9TF (4.53 %/TF)
2070S (55%): 1770 > 1879MHz - 9 > 9.2TF (5.98 %/TF)
2060 (41%): 1680 > 1865MHz - 6.5 > 7.1TF (5.77 %/TF)

RDNA:
RX 5700 XT (51%): 1755 (1905) > 1887MHz - 9.0 (9.8) > 9.66TF (5.28 %/TF)
RX 5600 XT (40%): 1750 > 1730MHz - 8.1 > 8.0TF (5.00 %/TF) - this one is a mess with specs and clocks, but the ASUS TUF seems closest to the newer reference spec, and it is not quite the right comparison really
RX 5500 XT (27%): 1845 > 1822MHz - 5.2 > 5.1TF (5.29 %/TF) - all reviews are of AIB cards but the two closest to reference specs got 1822MHz

GCN:
Radeon VII (53%): 1750 > 1775MHz - 13.4 > 13.6TF (3.90 %/TF)
Vega 64 (41%): 1546MHz - 12.7TF (3.23 %/TF) - let's assume it ran at 1546MHz in the review; I doubt it, because my card struggled heavily to reach spec clocks
RX 590 (29%): 1545MHz - 7.1TF (4.08 %/TF)

Pascal
1080Ti (53%): 1582 > 1777MHz - 11.3 > 12.7TF (4.17 %/TF)
1070 (34%): 1683 > 1797MHz - 6.5 > 6.9TF (4.93 %/TF)

I actually think 4K might be better comparison for faster cards, perhaps down to Radeon VII. So instead of the unreadable mess above here is a table with GPUs, their actual TFLOPs numbers and relative performance (from the same referenced 3090 Strix review) as well as performance per TFLOPs in a table, both at 1440p and 2160p.
* means average clock speed is probably overrated, so less TFLOPs in reality and better %/TF.

```
GPU        TFLOP 1440p %/TF  2160p %/TF
3090       40.3  100%  2.48  100%  2.48
3080       33.6   90%  2.68   84%  2.5

2080Ti     15.9   72%  4.53   64%  4.02
2070S       9.2   55%  5.98   46%  5.00
2060        7.1   41%  5.77   34%  4.79

RX5700XT    9.66  51%  5.28   42%  4.35
RX5600XT*   9.0   40%  5.00   33%  4.12
RX5500XT    5.1   27%  5.29   19%  3.72

Radeon VII 13.6   53%  3.90   46%  3.38
Vega64*    12.7   41%  3.23   34%  2.68
RX590       7.1   29%  4.08   24%  3.38

1080Ti     12.7   53%  4.17   45%  3.54
1070        6.9   34%  4.93   28%  4.06
```
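As a rough sanity check, the adjusted figures above can be reproduced by scaling the spec TFLOPs by the average clock measured in each review; a minimal sketch (using the 3080's numbers from the list above):

```python
# Sketch of the adjustment used above: scale spec TFLOPs by the average
# clock actually measured in the review, then divide relative performance
# by that figure to get %/TF.
def actual_tflops(spec_tf, spec_mhz, avg_mhz):
    """Spec TFLOPs rescaled to the clock the card actually sustained."""
    return spec_tf * avg_mhz / spec_mhz

def perf_per_tflop(rel_perf_pct, tflops):
    """Relative performance (%) per TFLOP."""
    return rel_perf_pct / tflops

# RTX 3080: 29.8 spec TF at 1710 MHz boost, ~1931 MHz average in the review
tf = actual_tflops(29.8, 1710, 1931)
print(f"{tf:.1f} TF, {perf_per_tflop(90, tf):.2f} %/TF at 1440p")
```

The same two functions reproduce the rest of the table, modulo rounding.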

- Pascal, Turing and Navi/RDNA are fairly even on perf/TF.
- Polaris is a little worse than Pascal, but not too bad.
- Vega struggles a little.
- The 1080 Ti's low result is somewhat surprising.
- The 2080 Ti and the Amperes are inefficient at 1440p and do better at 2160p.

As for what Ampere does, there is something we are missing about the double FP32 claim. Scheduling limitations are the obvious candidate, but a ~35% actual performance boost from doubled units sounds like something is very heavily restricting performance. And that is the optimistic case - in the table/review it was 25% at 1440p and 31% at 2160p from the 2080 Ti to the 3080, which are largely identical except for the doubled FP32 units. Since productivity workloads do get twice the performance, is it really the complexity and variability of gaming workloads causing the scheduler to cough blood?
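For reference, the spec TFLOPs in question come straight from shader count × 2 FP32 ops per clock (FMA) × clock speed. A minimal sketch (shader counts and boost clocks are the public spec numbers) shows how the doubled FP32 units more than double the paper figure between the 2080 Ti and 3080, while measured gaming performance moved only ~25-31%:

```python
# Theoretical FP32 throughput in TFLOPs: shaders * 2 ops/clock (FMA) * clock.
def spec_tflops(shaders, boost_ghz):
    return shaders * 2 * boost_ghz / 1000

tf_2080ti = spec_tflops(4352, 1.545)  # ~13.4 TF (Turing spec numbers)
tf_3080 = spec_tflops(8704, 1.710)    # ~29.8 TF (Ampere, doubled FP32 units)
print(f"paper gain: {tf_3080 / tf_2080ti:.2f}x")  # >2x on paper, far less in games
```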


----------



## Valantar (Oct 12, 2020)

gruffi said:


> I think that's the whole point of your misunderstanding. You interpreted. And you interpreted in a wrong way. So let me be clear once and for all. As I said, with "strong" I was referring to raw performance. I didn't draw any conclusions about whether that makes an architecture good or bad, which is usually defined by metrics like performance/watt and performance/mm².


Skipping the hilarity of (unintentionally, I presume) suggesting that reading without interpretation is possible: I have explained the reasons for my interpretation at length, and why, to me, it is a much more reasonable reading of what a "strong" GPU architecture means in the consumer space. You clearly worded your statement vaguely and ended up saying something different from what you meant. (For the record, even calling FP32 "raw performance" is a stretch - it's the _main_ performance metric of modern GPUs, but still one among at least a couple dozen relevant ones, all of which affect various workloads in different ways. Hence my argument that it alone is a poor indication of anything except performance in pure FP32 workloads. It's like judging which minivan is best solely on engine horsepower, while ignoring the number and quality of seats, doors, build quality, reliability, ride comfort, etc.) You're welcome to disagree with this, but so far your arguments for your side of this discussion have been unconvincing at best.


gruffi said:


> You say it, just revisions. Hawaii, Tonga, Fiji. They all got mostly only ISA updates and more execution units. One exception was HBM for Fiji. But even that didn't change the architecture at all. Polaris was the first generation after Tahiti that had some real architecture improvements to increase IPC and efficiency.


Uhm ... updating the ISA is a change to the architecture. Beyond that, AMD kept talking about various low-level architectural changes to GCN for each revision - beyond what is published; after all, published information doesn't really go beyond the block-diagram level - but these never really materialized as performance or efficiency improvements. You're right that the move to HBM didn't change the architecture, as the memory controllers generally aren't seen as part of the GPU architecture. Of course the _main_ bottleneck for GCN was its 64 CU limit, which forced AMD to release the V64 at idiotic clocks to even remotely compete in absolute performance, but made the architecture look _terrible_ for efficiency at the same time. A low-clocked Vega 64 is actually quite efficient, after all, and shows that if AMD could have made a medium-clocked 80 CU Vega card, they could have been in a much better competitive position (though at some cost due to the large die). That limitation alone is likely the main reason both for AMD's GPU woes and for their choice to replace GCN entirely - they had no other option. But even with limited resources, they had more than half a decade to improve GCN architecturally, and managed pretty much nothing. Luckily, with RDNA they've both removed the 64 CU limit _and_ improved perf/TF dramatically, with promises of more to come.



gruffi said:


> I wouldn't say that. The question is what your goal is. Obviously AMD's primary goal was a strong computing architecture to counter Fermi's successors. Maybe AMD didn't expect Nvidia to go the exact opposite way. Kepler and Maxwell were gaming architectures. They were quite poor at computing, especially Kepler. Back then, with enough resources, I think AMD could have done with GCN what they are doing now with RDNA. RDNA is not an entirely new architecture from scratch like Zen. It's still based on GCN. So, it seems GCN was a good architecture after all. At least better than what some people try to claim. The lack of progress and the general-purpose nature just made GCN look worse for gamers over time. Two separate developments for computing and gaming were the logical consequence. Nvidia might face the same problem. Ampere is somehow their GCN moment. Many shaders, apparently good computing performance, but much worse shader efficiency than Turing for gaming.


That's possible, but unlikely. The enterprise compute market is of course massively lucrative, but AMD didn't design GCN as a datacenter compute-first core. It was a _graphics_ core design meant to replace VLIW, but it also happened to be very good at pure FP32. Call it a lucky side effect. At the time it was designed datacenter GPU compute barely existed at all (datacenters and supercomputers at that time were mostly CPU-based), and the market when it emerged was nearly 100% CUDA, leaving AMD on the outside looking in. AMD tried to get into this with OpenCL and similar compute-oriented initiatives, but those came long after GCN hit the market. RDNA is clearly a gaming-oriented architecture, with CDNA being split off (and reportedly being much closer to GCN in design) for compute work, but that doesn't mean that GCN wasn't initially designed for gaming. 



londiste said:


> At 1440p Ampere probably takes more of a hit than it should.
> But more importantly, especially for Nvidia cards, spec TFLOPs are misleading. Just check the average clock speeds in the respective reviews.
> At the same time, RDNA has a Boost clock spec that is not quite what the card actually achieves.
> 
> ...


I entirely agree that Ampere makes calculations like this even more of a mess than what they already were, but my point still stands after all  - there are still massive variations even within the same architectures, let alone between different ones. I'm also well aware that boost clocks severely mess this up and that choosing one resolution limits its usefulness - I just didn't want the ten minutes I spent on that to become 45, looking up every boost speed and calculating my own FP32 numbers.


----------



## londiste (Oct 12, 2020)

Valantar said:


> I entirely agree that Ampere makes calculations like this even more of a mess than what they already were, but my point still stands after all  - there are still massive variations even within the same architectures, let alone between different ones. I'm also well aware that boost clocks severely mess this up and that choosing one resolution limits its usefulness - I just didn't want the ten minutes I spent on that to become 45, looking up every boost speed and calculating my own FP32 numbers.


Variations are probably down to the relative amounts of the cards' other resources - memory bandwidth, TMUs, ROPs. Trying not to go down that rabbit hole right now. It didn't take me quite 45 minutes to put that one together, but it wasn't too far off.


----------



## gruffi (Oct 13, 2020)

Valantar said:


> You clearly worded your statement vaguely


I was very clear that I was just talking about raw performance as a simple fact, and not about whether something is considered good or bad based on that. Maybe next time, if _you_ are unsure about the meaning of someone's words, ask first to clear things up.  Your aggressive and bossy answers, putting words in my mouth that I never said, are a very impolite and immature way of having a conversation.



Valantar said:


> updating the ISA is a change to the architecture.


But it doesn't make the architecture faster or more efficient in terms of general performance. That's the important point. Or do you think adding AVX-512 to Comet Lake would make it better for your daily tasks? Not at all.



Valantar said:


> AMD didn't design GCN as a datacenter compute-first core. It was a _graphics_ core design


In fact it was _not_ a pure graphics core design. Look up the press material AMD published back then. You can read statements like "efficient and scalable architecture optimized for graphics and parallel compute" or "cutting-edge gaming and compute performance". GCN clearly was designed as a hybrid, an architecture meant to be equally good at gaming _and_ compute. But I think the focus was more on improving compute performance, because that was the philosophy of the AMD staff at the time. They wanted to be more competitive in professional markets. Bulldozer was designed with the same focus in mind. Lisa Su changed that. Nowadays AMD focuses more on client markets again.



Valantar said:


> but it also happened to be very good at pure FP32. Call it a lucky side effect.


That naivety is almost funny. Nothing happens as a side effect during years of development. It was on purpose.



Valantar said:


> At the time it was designed datacenter GPU compute barely existed at all (datacenters and supercomputers at that time were mostly CPU-based), and the market when it emerged was nearly 100% CUDA, leaving AMD on the outside looking in. AMD tried to get into this with OpenCL and similar compute-oriented initiatives, but those came long after GCN hit the market.


AMD had GPGPU software solutions before OpenCL and even before CUDA. The first was the CTM (Close To Metal) interface, later replaced by the Stream SDK. All of those developments happened in the late 2000s, likely when GCN was in its design phase. It's obvious that AMD wanted a performant compute architecture to be prepared for future GPGPU environments. That doesn't mean GCN was a compute-only architecture. Again, I didn't say that. But compute performance seems to have been at least as important as graphics performance.


----------



## InVasMani (Oct 14, 2020)

Perhaps this is part of why AMD wants to buy Xilinx for its FPGAs. Even if they lose 5-10% in either workload, be it compute or graphics performance, it's still a better overall approach if they gain the possibility of better parity between the two with an FPGA approach, rather than fixed hardware with a one-size-fits-all design that can't serve both efficiently and well at the same time. In fact, over time I would have to say the gap between the two with fixed hardware must be widening, if anything, complicating things.


----------



## londiste (Oct 14, 2020)

InVasMani said:


> Perhaps this is part of why AMD wants to buy Xilinx for its FPGAs. Even if they lose 5-10% in either workload, be it compute or graphics performance, it's still a better overall approach if they gain the possibility of better parity between the two with an FPGA approach, rather than fixed hardware with a one-size-fits-all design that can't serve both efficiently and well at the same time. In fact, over time I would have to say the gap between the two with fixed hardware must be widening, if anything, complicating things.


FPGAs are incredibly inefficient compared to fixed-function hardware.


----------



## Bronan (Oct 16, 2020)

I actually do not care at all about all the hyped news.
For me the most important thing is that the card seems to be power efficient, and that's more important than the power-hungry, super-heater-for-your-home Nvidia solution. Imagine living in Spain or Italy with temperatures above 40°C and then not being able to play a game because your silly machine gets overheated by your oh-so-precious Nvidia card.



bug said:


> Ok, who the hell calls Navi2 "Big Navi"?
> Big Navi was a pipe dream of AMD loyalists left wanting for a first gen Navi high-end card.


This quote " Something Big is coming is not a lie" because its going to be a big card, they have not said anything about performance the only thing they talk about is a more efficient product. That most people translate that to faster than nvidia is their own vision.
But if it does beat the 3070 then i will consider buying it even though its not such a big step upwards from my current 5700XT which runs darn well.

I really wish they would introduce the AMD Quantum mini PC, which was shown at E3 2015, with current hardware or something similar.
I want my systems to be smaller without limiting performance too much; I'm pretty sure current hardware is more than capable of building such a mini PC by now with enough performance.


----------

