# AMD Radeon VII Detailed Some More: Die-size, Secret-sauce, Ray-tracing, and More



## btarunr (Jan 11, 2019)

AMD pulled off a surprise at its CES 2019 keynote address with the announcement of the Radeon VII, a client-segment graphics card targeted at gamers. We went hands-on with the card earlier this week, and the company has since revealed a few more technical details in its press deck. To begin with, AMD talks about the immediate dividends of switching from 14 nm to 7 nm: a reduction in die size from 495 mm² on the "Vega 10" silicon to 331 mm² on the new "Vega 20" silicon. The die has been reworked to feature a 4096-bit wide HBM2 memory interface; the "Vega 20" MCM now carries four 32 Gbit (4 GB) HBM2 memory stacks, which make up the card's 16 GB of memory. The memory clock has been dialed up to 1000 MHz from 945 MHz on the RX Vega 64, which, coupled with the doubled bus width, works out to a phenomenal 1 TB/s of memory bandwidth.
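The bandwidth figure is easy to sanity-check. A quick sketch of the arithmetic, using the bus widths above and the per-pin data rates implied by the memory clocks (DDR, so 1000 MHz works out to 2.0 Gbps per pin, 945 MHz to 1.89 Gbps):

```python
# Peak HBM2 bandwidth = bus width x per-pin data rate / 8 bits-per-byte.
def hbm2_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits * data_rate_gbps / 8

radeon_vii = hbm2_bandwidth_gbs(4096, 2.0)    # four 1024-bit stacks
rx_vega_64 = hbm2_bandwidth_gbs(2048, 1.89)   # two stacks on "Vega 10"

print(radeon_vii)  # 1024.0 GB/s, the "1 TB/s" headline figure
print(rx_vega_64)  # ~483.84 GB/s
```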

We know from AMD's late-2018 announcement of the Radeon Instinct MI60 machine-learning accelerator, based on the same silicon, that "Vega 20" features a total of 64 NGCUs (next-generation compute units). To carve out the Radeon VII, AMD disabled four of these, resulting in an NGCU count of 60, halfway between the RX Vega 56 and RX Vega 64, and a stream-processor count of 3,840. The reduced NGCU count could help AMD better harvest the TSMC-built 7 nm GPU dies. AMD is attempting to make up the vast 44 percent performance gap between the RX Vega 64 and the GeForce RTX 2080 with a combination of factors.

First, AMD appears to be maximizing the clock-speed headroom achieved from the switch to 7 nm. The Radeon VII can boost its engine clock all the way up to 1800 MHz, which may not seem significantly higher than the on-paper 1545 MHz boost frequency of the RX Vega 64, but the Radeon VII probably sustains its boost frequencies better. Second, the slide showing the competitive performance of the Radeon VII against the RTX 2080 pins its highest gains over the NVIDIA rival in the Vulkan title "Strange Brigade," which is known to heavily leverage asynchronous compute, an area where AMD continues to hold a technological upper hand over NVIDIA. AMD mentions "enhanced" asynchronous compute for the Radeon VII, which means the company may have improved the ACEs (async-compute engines) on the "Vega 20" silicon, the specialized hardware that schedules async-compute workloads among the NGCUs. With its given specs, the Radeon VII has a maximum FP32 throughput of 13.8 TFLOP/s.
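That 13.8 TFLOP/s figure is simply the stream-processor count multiplied by the boost clock and by two (an FMA counts as two floating-point operations). As a quick check:

```python
# FP32 throughput = shaders x clock (MHz) x 2 FLOPs per FMA, in TFLOP/s.
def fp32_tflops(stream_processors: int, clock_mhz: int) -> float:
    return stream_processors * clock_mhz * 2 / 1e6

print(fp32_tflops(3840, 1800))  # 13.824, quoted as 13.8 TFLOP/s
print(fp32_tflops(4096, 1545))  # ~12.66 for the RX Vega 64, for comparison
```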

The third and most obvious area of improvement is memory. The "Vega 20" silicon is lavishly endowed with 16 GB of "high-bandwidth cache" memory, which, thanks to the doubled bus width and increased memory clock, results in 1 TB/s of memory bandwidth. Such high physical bandwidth could, in theory, allow AMD's designers to dial back memory compression, freeing up some of the GPU's number-crunching resources. The memory size helps, too. AMD is once again throwing brute bandwidth at whatever memory-management issues its architecture may have.

The Radeon VII is being extensively marketed as a competitor to the GeForce RTX 2080. NVIDIA holds a competitive edge with its hardware being DirectX Raytracing (DXR) ready, having even integrated specialized components called RT cores into its "Turing" GPUs. "Vega 20" continues to lack such components; however, AMD CEO Dr. Lisa Su confirmed at her post-keynote press round-table that the company is working on ray-tracing. "I think ray tracing is important technology; it's something that we're working on as well, from both a hardware/software standpoint."

Responding to a specific question by a reporter on whether AMD has ray-tracing technology, Dr. Su said: "I'm not going to get into a tit for tat, that's just not my style. So I'll tell you that. What I will say is ray tracing is an important technology. It's one of the important technologies; there are lots of other important technologies and you will hear more about what we're doing with ray tracing. You know, we certainly have a lot going on, both hardware and software, as we bring up that entire ecosystem."

One way of reading between the lines would be (and this is speculation on our part) that AMD could be working on retrofitting some of its GPUs powerful enough to handle ray-tracing with DXR support through a future driver update, as well as working on future generations of GPUs with hardware acceleration for many of the tasks required to make hybrid rasterization work (adding real-time ray-traced objects to rasterized 3D scenes). Just as real-time ray-tracing is technically possible on "Pascal," even if daunting for the hardware, with enough work directed at getting a ray-tracing model to run on NGCUs leveraging async-compute, some semblance of GPU-accelerated real-time ray-tracing compatible with DXR could probably be achieved. This is not part of the feature-set of the Radeon VII at launch.

The Radeon VII will be available from February 7th, priced at $699, on par with the SEP of the RTX 2080 despite the lack of real-time ray-tracing (at least at launch). AMD could steer its developer relations toward future titles that rely increasingly on asynchronous compute, the Vulkan API, and other technologies its hardware is good at.



----------



## Pixrazor (Jan 11, 2019)

still 64 ROPs damn...
so just Vega but at 7nm and more hbm2 to boost up the price.
where is Navi??


----------



## Zubasa (Jan 11, 2019)

Pixrazor said:


> still 64 ROPs damn...
> so just Vega but at 7nm and more hbm2 to boost up the price.
> where is Navi??


Well the Pixie Dust was good while it lasted.
My Vega 56 with the LC BIOS on a custom loop does 1780 MHz, so yeah, I guess you can have that 16 GB of HBM2.

The EVGA 2080 Ti Black went up $100 last night; I guess that's something too. Should have gotten one at $999, what a "steal".
What AMD showed was better marketing for NVIDIA than even NVIDIA can come up with.
"AMD used CEO in leather jacket! It's super effective!"


----------



## Xzibit (Jan 11, 2019)

Someone's reporting it wrong. Maybe people are looking at TPU's database, which says 60 ROPs.



ArsTechnica said:
> *The new chip has 128 ROPs to the old chip's 64*, doubling the number of rendered rasterized pixels it can produce





AnandTech said:

> Instead, the biggest difference between the two cards is on the memory/ROP backend. Radeon Vega 64 (Vega 10) featured 64 ROPs and 2 HBM2 memory channels running at 1.89Gbps each, for a total of 484GB/sec of memory bandwidth. *Radeon VII (Vega 20) doubles this and then some to 128 ROPs *and 4 HBM2 memory channels, which also means memory capacity has doubled to 16GB. And then there’s the clockspeed boost on top of this: 1800MHz for the ROPs, and 2.0Gbps for the HBM2 memory. As a result Radeon VII has a lot more pixel pushing power, and a lot more in the way of resources to feed it to get there. Given these changes and AMD’s performance estimates, I think this lends a lot of evidence to the idea that Vega 10 was unbalanced – it needed more ROPs and/or more memory bandwidth to feed it – but that’s something we’ll save for the eventual review.





Toms Hardware said:

> *ROPs 128*


----------



## cucker tarlson (Jan 11, 2019)

AMD wants to close the 45% gap with async? Well, that sounds optimistic for those who will buy the RVII instead of a 2080.
In those 5 titles that support it? I thought even AMD themselves let the async-shader fad die on its own; almost no one has implemented it in the three years it's been out.
Turning on RT on more powerful cards would be pointless if they're not hardware-accelerated for RT. The 2070 has 60 TFLOPS of RT performance and it barely copes.


----------



## Xzibit (Jan 11, 2019)

cucker tarlson said:


> AMD wants to close the 45% gap with async? Well, that sounds optimistic for those who will buy the RVII instead of a 2080.
> In those 5 titles that support it? I thought even AMD themselves let the async-shader fad die on its own; almost no one has implemented it in the three years it's been out.
> Turning on RT on more powerful cards would be pointless if they're not hardware-accelerated for RT. *The 2070 has 60 TFLOPS of RT performance and it barely copes.*



The Turing architecture would scale badly. It's a 1:1 SM-to-RT-core ratio, and the die is already big as it is. They have to improve the ratio next time around.


----------



## Zubasa (Jan 11, 2019)

cucker tarlson said:


> AMD wants to close the 45% gap with async? Well, that sounds optimistic for those who will buy the RVII instead of a 2080.
> In those 5 titles that support it? I thought even AMD themselves let the async-shader fad die on its own; almost no one has implemented it in the three years it's been out.
> Turning on RT on more powerful cards would be pointless if they're not hardware-accelerated for RT. The 2070 has 60 TFLOPS of RT performance and it barely copes.


You know what's the best thing about async compute? NVIDIA has supported it properly since Volta.
Turing does everything this does and then some.

As much as Jensen Huang is an ass-hat, what he said has some truth in it.


----------



## Camm (Jan 11, 2019)

cucker tarlson said:


> Turning on RT on more powerful cards would be poinless if they're not hardware accelerated for RT. 2070 has 60TFlops of RT performance and it barely copes.



It should be noted that NVIDIA has a huge-ass Achilles heel with the RTX series: RT operations are INT-based, and the card needs to flush to switch between FP and INT operations.

Dedicated hardware acceleration for RT is a smokescreen IMO; the key is whether you can cut your FP or INT instructions down as small as possible and run as many in parallel as possible. AMD does have some FP division capability, so it's possible that some cards can be retrofitted for RT.


----------



## Nkd (Jan 11, 2019)

cucker tarlson said:


> AMD wants to close the 45% gap with async? Well, that sounds optimistic for those who will buy the RVII instead of a 2080.
> In those 5 titles that support it? I thought even AMD themselves let the async-shader fad die on its own; almost no one has implemented it in the three years it's been out.
> Turning on RT on more powerful cards would be pointless if they're not hardware-accelerated for RT. The 2070 has 60 TFLOPS of RT performance and it barely copes.



Who said async compute was dead? It was primitive shaders and DSBR that never worked on Vega, not async compute. I think you are confused here. AMD never said async compute was unsupported or dead.


----------



## Brusfantomet (Jan 11, 2019)

Are the 64 ROPs confirmed? Because AnandTech has that number at 128.

At 64 ROPs, the raster power of the card is comparable to a 295X when CF is working (a 4-year-and-9-month-old card).


----------



## Nkd (Jan 11, 2019)

Pixrazor said:


> still 64 ROPs damn...
> so just Vega but at 7nm and more hbm2 to boost up the price.
> where is Navi??



If you want something truly high-end, wait until next year, or you are just setting yourself up for disappointment. Everyone knew before the announcement (reddit, forums); they expected 2080 performance and no better, and were okay with that, lol.

Navi may not be high-end, but it could be really good midrange. So just letting you know, don't be disappointed. AMD has more to gain from a new architecture, so wait until then to see good things.



cucker tarlson said:


> Read that again. I said it's supported in a handful of games only, not that it's not functional.
> 
> That 12% lead over the 2080 in Strange Brigade Vulkan is AMD cherry-picking testing methodology too.
> 
> ...



Got it. AMD can't have features that boost performance in games; only NVIDIA can. So AMD users should turn off async compute in games because NVIDIA can't do it as well, and turn off Vulkan too. Every company is going to show off the best their hardware can do, whether you like it or not. What you are saying is one-sided. Plus, there are more benchmark numbers out there for Vega 2, you just need to look.



Brusfantomet said:


> Are the 64 ROPs confirmed? Because AnandTech has that number at 128.
> 
> At 64 ROPs, the raster power of the card is comparable to a 295X when CF is working (a 4-year-and-9-month-old card).



Honestly, if they got this performance with 64 ROPs, that is even more impressive. They still managed to squeeze 25-30% more performance out of the same old GCN with a shrink.


----------



## btarunr (Jan 11, 2019)

Xzibit said:


> Some ones reporting it wrong



There is no technical document from AMD or statement from any AMD spokesperson that says 128, despite Vega 20 being out since Q4 2018. AnandTech assumed 128 because the memory bus width has doubled (while conveniently ignoring that the 4096-bit Fiji/Fury too had 64 ROPs). Other sites picked it up from them. We (and some other site editors) reached out to AMD to confirm ROP count in the meantime.

This is the only Vega20 block-diagram available for now (picked up from MI60 slides):

I only see four pixel engines per pipeline, 16 in all, which do 4 pixels/clock, working out to 64 ROPs.
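In numbers, with the 4 pixels/clock per render backend taken as the working assumption:

```python
# ROP count implied by the block diagram: 4 shader engines x 4 render
# backends (RBs) each, assuming 4 pixels per clock per RB as on earlier GCN.
shader_engines = 4
rbs_per_engine = 4
pixels_per_rb = 4

rops = shader_engines * rbs_per_engine * pixels_per_rb
print(rops)  # 64

# Peak pixel fillrate at the 1800 MHz boost clock, in Gpixels/s:
print(rops * 1800 / 1000)  # 115.2
```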


----------



## cucker tarlson (Jan 11, 2019)

Nkd said:


> Got it. AMD cant have features that boost performance in games. Only Nvidia can. So AMD users should turn off async compute in games because Nvidia cant do it as good and turn off vulkan. Every company is going to show off the best their hardware can do. Whether you like it or not. WHat you are saying is one sided talk. Plus there are more benchmark numbers out there for vega 2 you just need to look.


No, you did not get it. They're comparing a best-case scenario for Radeons vs. a worst-case scenario for NVIDIA. Same as some reviews comparing games like Deus Ex: MD in DX12 mode, where Vega wins, without even mentioning that the 1080 on DX11 is faster than Vega on DX12.

Anyway, my point was wrong in the first place since I looked at V64 vs. 1080 only; the same chart shows Turing cards seem to do better with Vulkan+async on, so let's not argue about something I was wrong about in the first place.


----------



## Xzibit (Jan 11, 2019)

btarunr said:


> There is no technical document from AMD or statement from any AMD spokesperson that says 128, despite Vega 20 being out since Q4 2018. AnandTech assumed 128 because the memory bus width has doubled (while conveniently ignoring that the 4096-bit Fiji/Fury too had 64 ROPs). Other sites picked it up from them. We (and some other site editors) reached out to AMD to confirm ROP count in the meantime.



Did you get any confirmation on FP64, 1/2 or 1/32? The database says 1/2; that's equal to Titan V. Hard to imagine them cannibalizing their MI60/50 with a $699 part.


----------



## cucker tarlson (Jan 11, 2019)

Isn't ROP performance tied to memory bandwidth in some way? Even if they cut the SP count and did not improve clocks by much, they might've gained a lot with 1 TB/s.


----------



## Brusfantomet (Jan 11, 2019)

cucker tarlson said:


> Isn't ROP performance tied to memory bandwidth in some way? Even if they cut the SP count and did not improve clocks by much, they might've gained a lot with 1 TB/s.



According to *btarunr*'s post the ROPs are tied to the graphics pipeline; on NV chips they're tied to the memory controllers. So it is probably 64 ROPs.


----------



## W1zzard (Jan 11, 2019)

btarunr said:


> We (and some other site editors) reached out to AMD to confirm ROP count in the meantime.


Just to clarify on that, we're still waiting for response from AMD.


----------



## Xzibit (Jan 11, 2019)

Here is a better look


----------



## Deleted member 172152 (Jan 11, 2019)

Why is the Radeon VII only 7.5% faster in Hitman 2?

Makes me feel better about AMD's performance numbers at least! Suppose they left it in to counteract the insane spikes up. Dunno, weird they left that one in.


----------



## Apocalypsee (Jan 11, 2019)

I highly doubt it has 128 ROPs. If it does have 64 ROPs, then the old Vega 56/64 were memory-bandwidth starved. Then again, all recent AMD GPUs are bandwidth starved; for example, the RX 470, which uses the same memory as the RX 480, performs very close to it. GCN is reaching its limits; it's good for compute but not as a gaming card. They need to put in more than 4 shader engines, which in return would increase the geometry units and the number of ROPs.


----------



## Zubasa (Jan 11, 2019)

Xzibit said:


> Did you get any confirmation on FP64, 1/2 or 1/32? The database says 1/2; that's equal to Titan V. Hard to imagine them cannibalizing their MI60/50 with a $699 part.


The MI50 and MI60 are not display cards, in the sense that they have no display output at all.
If this has the full FP64 performance, this card might indeed have some merit as a Vega FE replacement:
a much cheaper Radeon Pro without all the proper certifications, that is, if this card has access to the Pro drivers, ROCm, etc.


----------



## btarunr (Jan 11, 2019)

Zubasa said:


> The MI50 and MI60 are not display cards, in the sense that they have no display output at all.



They have one mini-DP for diagnostic purposes.


----------



## IceShroom (Jan 11, 2019)

Xzibit said:


> Did you get any confirmation on FP64, 1/2 or 1/32? The database says 1/2; that's equal to Titan V. Hard to imagine them cannibalizing their MI60/50 with a $699 part.


AMD has usually had FP64 at 1/16 of FP32 on consumer cards for the last three generations. AFAIK Hawaii had FP64 at 1/8 of FP32.


----------



## Jism (Jan 11, 2019)

It wasn't just a simple die-shrink if the thing suddenly has 128 ROPs vs. the original 64. Could someone investigate this? With the wider memory controller there's a lot more bandwidth available now, which should remove the negative aspects the previous chip had; it badly needed bandwidth for performance (hence HBM OC'ing helping).

If they did happen to add 64 more ROPs, then the performance benefit should be a lot better than what it is now, right? Because the original Vega seemed to be bottlenecked by both the number of ROPs and memory bandwidth. It's a different chip then, which I find a weird move since Navi is coming out as well. That means they are running at least three different GPUs on the assembly line, from the RX 590 to Vega and Navi. The 60 CUs seem to be a choice to get the best out of a single wafer. Perhaps there will be BIOS mods available for a full 64 CU unlock.

Furthermore, the performance seems good; I'm about to slam 2000 euro into a complete new TR2 system, so this card is more than welcome.


----------



## fynxer (Jan 11, 2019)

Good luck with ray-tracing in software; if that was viable we would have had it already. If they do it, it is just a desperate move not to look obsolete.

Do not expect ray-tracing in hardware until the end of 2020, and even then they will be years behind NVIDIA, who will by that time be in the process of readying their third-gen RTX cards for release.

We need Intel to enter the market with ray-tracing from the get-go in 2020.

I also have a feeling that AMD may be working secretly with Intel on ray-tracing tech to set up a unified standard against NVIDIA's RTX.


----------



## Jism (Jan 11, 2019)

Those CUs could be programmed to fill in for ray-tracing. These Vega cards are programmable till Tokyo.


----------



## londiste (Jan 11, 2019)

Camm said:


> Dedicated Hardware acceleration for RT is a smokescreen IMO, the key is if you can cut down your FP or INT instructions as small as possible and run as many as parallel as possible. AMD does have some FP division capability so its possible that some cards can be retrofitted for RT.


You mean essentially RPM or 4*INT8? Vega brought them into the consumer space and got some shiny moments in game performance thanks to it. In the other camp, Turing followed suit by including RPM, and at least at the GPU level 4*INT8 was in Pascal if not earlier.



Nkd said:


> Who said async compute was dead? It was primitive shaders and DSBR that never worked on Vega, not async compute. I think you are confused here. AMD never said async compute was unsupported or dead.


Async is alive and kicking. However, its impact is fairly small. Going by what is in the news post, Strange Brigade actually gains a few % from async compute being enabled, at best. It is definitely a good thing to have, but not a game changer. It also works well enough in both camps by now.


----------



## INSTG8R (Jan 11, 2019)

londiste said:


> You mean essentially RPM or 4*INT8? Vega brought them into the consumer space and got some shiny moments in game performance thanks to it. In the other camp, Turing followed suit by including RPM, and at least at the GPU level 4*INT8 was in Pascal if not earlier.


Good point. RPM often gets overlooked. I know FC5 is using it, and I'm gonna assume AC Odyssey does too.


----------



## londiste (Jan 11, 2019)

Hugh Mungus said:


> Why is the Radeon VII only 7.5% faster in Hitman 2?


Isn't Hitman 2 fairly CPU-hungry? AMD's game test results are on an i7-7700K.



fynxer said:


> I also have a feeling that AMD may be working secretly with Intel on ray-tracing tech to set up a unified standard against NVIDIA's RTX.


Bullshit. DXR in DX12 and Vulkan RT extensions are as standard as it gets. AMD will do their implementation of these if they know what is good for them (and extend them if necessary).
Unless you are implying AMD would either try shoving RT into DX11 or into something proprietary?


----------



## Midland Dog (Jan 11, 2019)

Pixrazor said:


> still 64 ROPs damn...
> so just Vega but at 7nm and more hbm2 to boost up the price.
> where is Navi??


Apparently it's 128.


----------



## Kaotik (Jan 11, 2019)

londiste said:


> Isn't Hitman 2 fairly CPU-hungry? AMD's game test results are on i7-7700K.
> 
> Bullshit. DXR in DX12 and Vulkan-RT extensions is as standard as it gets. AMD will do their implementation of these if they know what is good for them (and extend on these if necessary).
> Unless you are implying AMD would either try shoving RT into DX11 or into something proprietary?


Last time I checked, Vulkan doesn't have official RT extensions at this time. NVIDIA was at least trying to push their solution [essentially RTX] as a standard, but so far that hasn't happened, to my knowledge.



Midland Dog said:


> Apparently it's 128.


Unless proven otherwise it should be 64, as the Vega 20 diagrams from the Instinct release clearly show 4 pixel engines per shader engine.
I think the 64/128 confusion comes from the fact that NVIDIA cards have their ROPs tied to the memory controllers, so doubling the memory controllers doubles the ROPs, and the same logic is applied to Vega on some sites, even though in AMD's case the two aren't tied together.


----------



## Midland Dog (Jan 11, 2019)

Kaotik said:


> Last time I checked, Vulkan doesn't have official RT extensions at this time. NVIDIA was at least trying to push their solution [essentially RTX] as a standard, but so far that hasn't happened, to my knowledge.
> 
> 
> Unless proven otherwise it should be 64, as the Vega 20 diagrams from the Instinct release clearly show 4 pixel engines per shader engine.
> I think the 64/128 confusion comes from the fact that NVIDIA cards have their ROPs tied to the memory controllers, so doubling the memory controllers doubles the ROPs, and the same logic is applied to Vega on some sites, even though in AMD's case the two aren't tied together.


pretty sure GN said 128


----------



## INSTG8R (Jan 11, 2019)

Kaotik said:


> Last time I checked, Vulkan doesn't have official RT extensions at this time. NVIDIA was at least trying to push their solution [essentially RTX] as a standard, but so far that hasn't happened, to my knowledge.
> 
> 
> Unless proven otherwise it should be 64, as the Vega 20 diagrams from the Instinct release clearly show 4 pixel engines per shader engine.
> I think the 64/128 confusion comes from the fact that NVIDIA cards have their ROPs tied to the memory controllers, so doubling the memory controllers doubles the ROPs, and the same logic is applied to Vega on some sites, even though in AMD's case the two aren't tied together.


Actually, the ROPs are tied to the memory, which is why it has 16 GB. I can't provide a direct source for this quote, but it rings true:
`Unfortunately, you can't scale down the HBM2 any further and still retain the 128 ROPs, so 16 GB is the smallest capacity AMD can offer, which is why the pricepoint on this is so close relative to the 2080.
`


----------



## Kaotik (Jan 11, 2019)

Midland Dog said:


> pretty sure GN said 128


I'm aware many have said 128, but none of the press material provided by AMD suggests as much, and the Radeon Instinct block diagrams say 64, so until AMD itself says 128 or we get benchmarks showing ROP capability past 64 units, 64 is the more probable option.



INSTG8R said:


> Actually, the ROPs are tied to the memory, which is why it has 16 GB. I can't provide a direct source for this quote, but it rings true:
> `Unfortunately, you can't scale down the HBM2 any further and still retain the 128 ROPs, so 16 GB is the smallest capacity AMD can offer, which is why the pricepoint on this is so close relative to the 2080.
> `


I'm pretty sure they're not in AMD's case; they're just assuming it because they're tied on NVIDIA and _most_ AMD chips use the same ROP:memory-controller ratio. Fiji, for example, has a 4096-bit HBM memory controller and 64 ROPs, while it should have 128 if the memory controllers and ROPs were tied together. Also, with Vega's HBCC they're even less connected than before; they're actually behind the Infinity Fabric bus now.

For the 16 GB, it's the smallest capacity you can have with a 4096-bit HBM2 interface, because no one makes HBM2 stacks smaller than 4 GB.
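The capacity floor follows directly from the stack math (the 1024-bit-per-stack and 4 GB minimum-stack figures are the assumptions here):

```python
# A 4096-bit interface needs four 1024-bit HBM2 stacks; with 4 GB as the
# smallest stack in production, 16 GB is the minimum for this bus width.
bus_width_bits = 4096
bits_per_stack = 1024
smallest_stack_gb = 4

stacks = bus_width_bits // bits_per_stack
print(stacks)                      # 4 stacks
print(stacks * smallest_stack_gb)  # 16 GB minimum capacity
```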


----------



## londiste (Jan 11, 2019)

Kaotik said:


> Last time I cheked Vulkan doesn't have official RT-extensions at this time, NVIDIA was at least trying to push their solution [essentially RTX] as standard but at least so far that hasn't happened to my knowledge


You are right, my bad. There are only the NV_raytracing extensions for Vulkan that went out of beta. The official answer was that Vulkan already allows doing RT.
I thought Vulkan was supposed to improve on how (badly) OpenGL dealt with extensions.


----------



## INSTG8R (Jan 11, 2019)

Kaotik said:


> I'm aware many have said 128, but none of the press material provided by AMD suggests as much, and the Radeon Instinct block diagrams say 64, so until AMD itself says 128 or we get benchmarks showing ROP capability past 64 units, 64 is the more probable option.
> 
> 
> I'm pretty sure they're not in AMD's case; they're just assuming it because they're tied on NVIDIA and _most_ AMD chips use the same ROP:memory-controller ratio. Fiji, for example, has a 4096-bit HBM memory controller and 64 ROPs, while it should have 128 if the memory controllers and ROPs were tied together. Also, with Vega's HBCC they're even less connected than before; they're actually behind the Infinity Fabric bus now.
> ...


We really just need to wait for proper product spec sheets at this point because right now both are being tossed around and nobody seems to have a concrete answer.


----------



## Rahmat Sofyan (Jan 11, 2019)

Hats off to Dr. Lisa Su,

still calm, and she responded with great answers.

DXR is not 100% ready yet; still plenty of time for RTG and AMD to get ready.

My RX 570 and RX 480 are still okay.


----------



## renz496 (Jan 11, 2019)

fynxer said:


> Good luck with ray-tracing in software; if that was viable we would have had it already. If they do it, it is just a desperate move not to look obsolete.
> 
> Do not expect ray-tracing in hardware until the end of 2020, and even then they will be years behind NVIDIA, who will by that time be in the process of readying their third-gen RTX cards for release.
> 
> ...



RTX is just NVIDIA's fancy name for their hardware implementation, just like when they called their tessellation engine the "PolyMorph Engine". The unified standard for ray tracing already exists in DirectX, called DXR. RTX is not some exclusive API like Mantle that can only run on certain hardware. Since DXR is the standard in DirectX, AMD and Intel will have to follow it instead of coming out with a new standard.


----------



## Assimilator (Jan 11, 2019)

Probably the most useful thing about this card is that, if it is able to perform anywhere near the RTX 2080, it might well induce NVIDIA to drop the latter's price. Considering how expensive Vega 56/64 were, and remain, I'm pretty sure NVIDIA has a *lot* more wiggle room in terms of pricing, and they surely would love to shut AMD out of the high-end GPU market completely, because that would guarantee them an effective monopoly on that market segment going forward.

tl;dr NVIDIA might well be willing to drop the price on RTX 2080 to allow them to hike the price on RTX 3000 and all its descendants.


----------



## Camm (Jan 11, 2019)

Assimilator said:


> tl;dr NVIDIA might well be willing to drop the price on RTX 2080 to allow them to hike the price on RTX 3000 and all its descendants.



RTX die sizes are huge, I don't think there is much wiggle room at all. Conversely, the Vega VII die is much smaller, but HBM is still expensive, so AMD probably doesn't have much room either.

Interesting times.


----------



## Aquinus (Jan 11, 2019)

Assimilator said:


> Probably the most useful thing about this card is that, if it is able to perform anywhere near the RTX 2080, it might well induce NVIDIA to drop the latter's price. Considering how expensive Vega 56/64 were, and remain, I'm pretty sure NVIDIA has a *lot* more wiggle room in terms of pricing, and they surely would love to shut AMD out of the high-end GPU market completely, because that would guarantee them an effective monopoly on that market segment going forward.
> 
> tl;dr NVIDIA might well be willing to drop the price on RTX 2080 to allow them to hike the price on RTX 3000 and all its descendants.


That's predicated on the idea that NVIDIA's RTX offerings aren't having yield issues, which I find hard to believe for the 2080 Ti. As for the 2080 I'm not sure, but it's still a pretty good-sized die (bigger than a Vega 64). Honestly, I think NVIDIA's problem is old inventory. Between that and the less-than-stellar reception of the RTX chips, investors were not amused.


----------



## Mysteoa (Jan 11, 2019)

Assimilator said:


> Probably the most useful thing about this card is that, if it is able to perform anywhere near the RTX 2080, it might well induce NVIDIA to drop the latter's price.



The price for the RTX 2080 is around 750€ where I live, and the Radeon 7 will probably cost more than 700€ initially.


----------



## btarunr (Jan 11, 2019)

Xzibit said:


> Here is a better look



I still only count 64 ROPs in that graphic, since each "RB" (render backend) crunches 4 pixels per clock.


----------



## Kissamies (Jan 11, 2019)

Weird that there are just 3840 shaders.


----------



## Vya Domus (Jan 11, 2019)

That die size is pretty small, and not all of it is enabled either; for the first time in many years AMD has a card that likely has better margins than NVIDIA's equivalent. That's a pretty big deal.


----------



## rvalencia (Jan 11, 2019)

Apocalypsee said:


> I highly doubt it has 128 ROPs. If it does have 64 ROPs, then the old Vega 56/64 were memory-bandwidth starved. Then again, all recent AMD GPUs are bandwidth starved; for example, the RX 470, which uses the same memory as the RX 480, performs very close to it. GCN is reaching its limits; it's good for compute but not as a gaming card. They need to put in more than 4 shader engines, which in return would increase the geometry units and the number of ROPs.


From https://www.techpowerup.com/gpu-specs/radeon-rx-vega-m-gh.c3056
Recent "RX Vega M GH" has 64 ROPS


----------



## Fluffmeister (Jan 11, 2019)

According to AMD the cost of 7 nm is significant, and with 16 GB of HBM2 I can't imagine this card is cheap for them, but I assume they are at least making some money.


----------



## Deleted member 158293 (Jan 11, 2019)

AMD does have pretty much all of the gaming space to consider when implementing new features like RT. Having a half-@$$ed, noisy hybrid ray-tracing implementation with little traction, like we see presently, wouldn't do them, Sony, or Microsoft any favours, nor impress them IMO.

Hopefully AMD learned this from their async push, which is still great technology, but the software ecosystem wasn't ready a few years ago.


----------



## FordGT90Concept (Jan 11, 2019)

Fluffmeister said:


> According to AMD the cost of 7nm is significant, with 16 hbm2 I can't imagine it's cheap for them, but i assume they are least making some money.


Which is why Vega 20 isn't bigger than Vega 10.  I think Huang's explosion is because he realizes he made a "big" mistake with Turing.  AMD is focusing on where the money is at, not winning performance crowns that mean little in the larger context of things.  Turing is substantially larger (and more costly to produce) than even Vega 10 is.


On topic, Vega 20 doesn't really impress but it really wasn't intended to impress either. Vega 7nm w/ Fiji memory bandwidth.


----------



## Aquinus (Jan 11, 2019)

FordGT90Concept said:


> Vega 7nm w/ Fiji memory bandwidth.


I don't think that the added bandwidth is going to make a difference on this chip, that was just a side-effect of putting 16GB on it. I've played around with my own Vega 64 a bit to realize that not only does HBM overclock really well, it also makes practically zero difference in terms of performance, even with a 20% overclock on it. Vega was never starved for memory bandwidth and to me, that says this is totally about capacity. If power consumption could be improved, that would go a long way for Vega.
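As a rough sketch of what that 20% memory overclock buys on Vega 64's 2048-bit bus (my own arithmetic, not from the post; standard double-data-rate bandwidth math):

```python
# Effective HBM2 bandwidth: bus width x 2 transfers/clock x clock / 8 bits per byte.
def hbm2_bandwidth_gbs(mem_clock_mhz, bus_width_bits=2048):
    return bus_width_bits * mem_clock_mhz * 2 / 8 / 1000

stock_bw = hbm2_bandwidth_gbs(945)     # Vega 64 stock: ~484 GB/s
oc_bw = hbm2_bandwidth_gbs(945 * 1.2)  # with a 20% overclock: ~581 GB/s
```

So even a healthy overclock adds roughly 97 GB/s, which supports the point that Vega 64's performance barely moves with it.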


----------



## ssdpro (Jan 11, 2019)

The news of this article is Radeon VII = $699, 2080 = $699, for on-par performance if you ignore ray tracing/tensor. That exact price match for on-par performance minus a few features is not AMD's modus operandi. This needs to be $499-549.


----------



## FordGT90Concept (Jan 11, 2019)

It has double the VRAM of the RTX 2080; hence, equal price.  I suspect AMD is making more profit per Radeon VII sold than NVIDIA is per RTX 2080 sold though.  AMD could cut its price if NVIDIA does but NVIDIA won't.


----------



## Kaotik (Jan 11, 2019)

FordGT90Concept said:


> On topic, Vega 20 doesn't really impress but it really wasn't intended to impress either. Vega 7nm w/ Fiji memory bandwidth.


Actually double the Fiji memory bandwidth: Fiji had 512 GB/s, this has 1 TB/s. Also, the Vega architecture itself has been updated over the original Vega to support new functions and formats which accelerate many AI tasks etc.; the ACEs are supposedly improved too.
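Both figures follow directly from the bus widths and memory clocks; a quick sketch (standard DDR bandwidth arithmetic, my own, not from the post):

```python
# Effective bandwidth = bus width x 2 transfers/clock x clock / 8 bits per byte.
def bandwidth_gbs(bus_width_bits, mem_clock_mhz):
    return bus_width_bits * mem_clock_mhz * 2 / 8 / 1000

fiji = bandwidth_gbs(4096, 500)         # HBM1 at 500 MHz: 512 GB/s
radeon_vii = bandwidth_gbs(4096, 1000)  # HBM2 at 1000 MHz: 1024 GB/s (~1 TB/s)
```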


----------



## cucker tarlson (Jan 11, 2019)

FordGT90Concept said:


> It has double the VRAM of the RTX 2080; hence, equal price.  I suspect AMD is making more profit per Radeon VII sold than NVIDIA is per RTX 2080 sold though.  AMD could cut its price if NVIDIA does but NVIDIA won't.


Key words: "per sold."


----------



## Gasaraki (Jan 11, 2019)

Assimilator said:


> Probably the most useful thing about this card is that, if it is able to perform anywhere near the RTX 2080, it might well induce NVIDIA to drop the latter's price. Considering how expensive Vega 56/64 were, and remain, I'm pretty sure NVIDIA has a *lot* more wiggle-room in terms of pricing - and they surely would love to shut AMD out from the high-end GPU market completely, because that would guarantee them an effective monopoly on that market segment going forward.
> 
> tl;dr NVIDIA might well be willing to drop the price on RTX 2080 to allow them to hike the price on RTX 3000 and all its descendants.



The only way they could have done this is if they had priced the Radeon 7 at $649 or $599, not $699. $699 is the same price as the RTX 2080, but the 2080 doesn't have the heat or the power use, and it has RT cores, Tensor cores, etc. Overall the RTX 2080 is expensive because it has new tech in it. If I have to pay the same price, I will buy the one with the lower power draw, the lower heat, and the more advanced tech in it.



Fluffmeister said:


> According to AMD the cost of 7nm is significant, with 16 hbm2 I can't imagine it's cheap for them, but i assume they are least making some money.




The rumor is that it costs close to $750 to make the Radeon 7 cards. So no, they are not making money. This is just to stop the bleeding.


----------



## INSTG8R (Jan 11, 2019)

You really have got to drop the "it has Tensor Cores" line; they're just compute units with a fancy name, which Vega has too, and in both camps they bring little to nothing to gaming save a few unique cases.


----------



## Zubasa (Jan 11, 2019)

Assimilator said:


> Probably the most useful thing about this card is that, if it is able to perform anywhere near the RTX 2080, it might well induce NVIDIA to drop the latter's price. Considering how expensive Vega 56/64 were, and remain, I'm pretty sure NVIDIA has a *lot* more wiggle-room in terms of pricing - and they surely would love to shut AMD out from the high-end GPU market completely, because that would guarantee them an effective monopoly on that market segment going forward.
> 
> tl;dr NVIDIA might well be willing to drop the price on RTX 2080 to allow them to hike the price on RTX 3000 and all its descendants.


A more likely scenario is that we might see the full TU104 in a gaming card.
The RTX 2080 only has 2944 of the 3072 CUDA cores in the full TU104 chip, which is currently in the Quadro RTX 5000.
The Quadro RTX 5000 also has the full 384 Tensor cores vs 368, and 48 RT cores instead of 46.


----------



## ShurikN (Jan 11, 2019)

INSTG8R said:


> Good point. RPM often gets overlooked I know FC5 is using it and I’m gonna assume AC Odyssey would too.


Considering they had a special segment showing Division 2 in the keynote, it's highly likely RPM will find its way into that game.


----------



## INSTG8R (Jan 11, 2019)

ShurikN said:


> Considering they had a special segment showing Division 2 in the Key Note, it's highly likely RPM will find it's way into that game.


Most likely you're correct; it makes perfect sense to leverage the tech whenever they can.


----------



## Imsochobo (Jan 11, 2019)

Gasaraki said:


> The only way they could have done this is if they priced the Radeon 7 at $649 or $599, not $699. $699 is the same price as the RTX2080 but the 2080 doesn't have the heat, power use, has RT cores, has Tensor cores, etc. Overall the RTX2080 is expensive because it has new tech in it. If I have to pay the same price, I will buy the one with the lower power draw, the lower heat, the advance tech in it.
> 
> 
> 
> ...



Cut $50, as it's slightly-defective chips.
It's a non-profit mindshare stunt in my eyes, even if they have $100 of margin for the entire channel (AMD, AIB, e-tailer).

It's the right decision nevertheless.


----------



## ppn (Jan 11, 2019)

AMD missed a great opportunity here. The old Vega 10 has 15% of its die area outside the main core elements, where only the PCIe and memory channels reside. Vega 20 now has 45% of precious 7nm die lost to empty space beside the PCIe/memory interfaces. This chip could easily have been reduced to 232 mm², the same as the RX 590, while keeping the 4096-processor config. Slap 8 GB of very fast 616 GB/s HBM2 on there and nobody would have cared that it's only 2048-bit. Seriously AMD, how could you make this mess?


----------



## Camm (Jan 11, 2019)

ppn said:


> this chip could have easily been reduced to 232 mm.sq same as RX 590, and keep the 4096 processors config. Slap 8GB of very fast 616 GB/s HBM2 on there and nobody would have cared that it is only 2048 bit. seriously AMD how could you do this mess.



Because this is obviously a stop gap card that AMD could engineer for cheap?


----------



## XXL_AI (Jan 11, 2019)

AMD=Engineering
NVIDIA=Science & Arts & Fun & Technology & Engineering
intel=how to get away with murder.


----------



## moproblems99 (Jan 11, 2019)

ssdpro said:


> The news of this article is Radeon VII=$699 2080=$699 for on par performance if you ignore ray tracing/tensor. That exact match pricing for on par minus a few features is not AMD's modus operandi. This needs to be $499-549.



I disagree.  I think $699 is a great price for it since it has 2080 performance which is $699.  Everybody says you don't buy a GPU for what it will do (at least that is what people say about AMDs fine wine approach), you buy it for what it does today.  Today, there is one game that people actually play that has RTX.  That means people are buying it for the performance.  Therefore, $699 is appropriate as that is the going rate.  Don't like pricing?  Ask NV why they started it.


----------



## Kaotik (Jan 11, 2019)

Gasaraki said:


> The rumor is that it costs close to $750 to make the Radeon 7 cards. So no, they are not making money. This is just to stop the bleeding.


There are always rumors; until some analysts with experience on graphics card BOMs chime in, those rumors aren't worth the time it took to write that post.
It's ridiculous how many different rumors float around AMD, especially on the graphics front, and that people actually treat them like some sort of gospel because it's on the internet, like all the "Navi is Sony exclusive" crap and so on.


----------



## kings (Jan 11, 2019)

moproblems99 said:


> I disagree.  I think $699 is a great price for it since it has 2080 performance which is $699.  Everybody says you don't buy a GPU for what it will do (at least that is what people say about AMDs fine wine approach), you buy it for what it does today.  Today, there is one game that people actually play that has RTX.  That means people are buying it for the performance.  Therefore, $699 is appropriate as that is the going rate.  Don't like pricing?  Ask NV why they started it.



So, $699 for an RTX 2080 is a great price? You can't consider a Radeon VII at $699 a great price and not consider the same for the RTX 2080... it's pure logic.

The Radeon VII sucks in price/performance, like the RTX 2080 sucks. We had this same performance at this price 2 years ago, with the 1080 Ti.

This only proves that the "people's friend AMD", as many people think they are, does not exist; it's a profit-driven company like all the others, and it has no problem raising prices if it can.


----------



## moproblems99 (Jan 11, 2019)

kings said:


> This only proves that "people's friend AMD", as many people think thay are, does not exist, it's a profit-only company like all the others and has no problem raising prices if they can.



That is why it is a great price.  I have said for a while that people have the gaming industry they deserve.  People have been buying these cards, so therefore the price is right.  If NV can do it and people love it, why can't AMD?

I am not delusional enough to think that AMD cares about me for anything other than my money.  I treat them as such.


----------



## ZoneDymo (Jan 11, 2019)

Drop the price 100 bucks and this card should be a no brainer.


----------



## Mr.Mopar392 (Jan 11, 2019)

ZoneDymo said:


> Drop the price 100 bucks and this card should be a no brainer.


Even if they drop it 100 bucks, do you really think that will change the minds of many who shill every second for NVIDIA? lol, they're gonna find something else to complain about. Thank god there's a core fan base for AMD products and neutral buyers who don't show bias to either brand, because if AMD depended on nvidiots converting, they would certainly have gone bankrupt years ago.


----------



## ZoneDymo (Jan 11, 2019)

Mr.Mopar392 said:


> Even if they drop it a 100 bucks less, you really think will change the minds of many who shill every second for nvidia lol, their gonna find something else to complain about. Thank god theirs a core fan base for amd products and neutral fans don't show bias to either brand, because if amd depended on nvidiots to convert they would have certainly when bankrupt years ago.



Well no, I don't think it would win over fanboys; nothing does, that's the concept of a fanboy.
I mean it more for the people who are just looking for a new GPU and don't really care who made it as long as it's good.
They should make it a no-brainer whether to buy a Radeon 7 or an RTX 2080, and a very competitive price (100 dollars less) would do that.
Then the only question would be "should I shell out even more money for an RTX 2080 Ti?" and the answer would probably be no.


----------



## Xzibit (Jan 11, 2019)

Fluffmeister said:


> According to AMD the cost of 7nm is significant, with 16 hbm2 I can't imagine it's cheap for them, but i assume they are least making some money.



Now imagine how much a die shrink of TU102 will cost and how much they will charge for it. 550 mm² to 600 mm² minimum. What if they add to it?


----------



## Casecutter (Jan 11, 2019)

A lot of rumors floated when Vega first came out regarding part/labor cost being higher than the MSRP. Some of them, concerning memory prices or defects in getting GPU/HBM2 interposer assembly into full production, might have had justification. Although the other day the Egg had a Sapphire Vega 64 (blower), *no rebate*, just a $5-off code (whoop), at $395; that's 20% off MSRP. Even in early November '18, a PowerColor RED DRAGON Vega 56 was $330 with only a code (-18%). I'm sure everyone (AMD, channel, retail) is making it worth their while to move them along... they're not doing it out of the goodness of their hearts.

So this tells me this Vega 7 is fine, and AMD has the MSRP set so that each of these geldings probably makes more than the price out of TSMC. Do we think "Instinct" volume is more mainstream than "Frontier" SKUs into professional? I think that is probably the case. Could we say 25% of total "Instinct" dies are binned? Seeing the price above, AMD seems to show strong confidence in interposer production, while HBM2 prices have probably dropped and inventory blanket orders are assured. I'd bet that (like with the RX 590) AIBs will just repurpose existing Vega coolers/fans and mainly update the shrouds and appearance.

So, if $699 is MSRP, just a 10% reduction is $630. If all such assumptions hold (say 8 out of 10 correct), AMD could fill and maintain the channel better than with the original Vega, which by all appearances was and is lax.


----------



## Fluffmeister (Jan 11, 2019)

Xzibit said:


> Now imagine how much a die shrink of TU102 will cost and how much they will charge for it. 550mm2 to 600mm2 minimun, What if they add to it?



Indeed, despite the large die size it may actually still be cheaper for them to manufacture on the refined 12nm node, while still adding features and performance and continuing to improve performance per watt over Pascal.

It's clear they don't need to chase the gains 7nm offers yet, unlike AMD.

Edit: -1! The truth hurts eh Casecutter?


----------



## efikkan (Jan 11, 2019)

moproblems99 said:


> I disagree.  I think $699 is a great price for it since it has 2080 performance which is $699.  Everybody says you don't buy a GPU for what it will do (at least that is what people say about AMDs fine wine approach), you buy it for what it does today.  Today, there is one game that people actually play that has RTX.  That means people are buying it for the performance.  Therefore, $699 is appropriate as that is the going rate.  Don't like pricing?  Ask NV why they started it.


Assuming the performance will be comparable, why would you still buy it when it has major drawbacks? What real advantages does it offer over the RTX 2080, justifying its existence?


----------



## moproblems99 (Jan 11, 2019)

efikkan said:


> Assuming the performance will be comparable, why would you still buy it when it has major drawbacks? What real advantages does it offer over the RTX 2080, justifying its existence?



What are the drawbacks?  What advantages does the 2080 have?  You can't be talking about RTX and DLSS, can you?


----------



## efikkan (Jan 11, 2019)

moproblems99 said:


> What are the drawbacks?  What advantages does the 2080 have?  You can't be talking about RTX and DLSS, can you?


Primarily a major difference in TDP: 215W vs. ~300W.

When you have competing products A and B which perform and cost the same, but one of them has a major disadvantage, why would anyone ever buy it?


----------



## moproblems99 (Jan 11, 2019)

I don't consider that a major disadvantage.  It's probably less than $20 a year.  If that is the only disadvantage then I don't see a problem.  Also, throw that 215W out after you start overclocking and lower that 300W when you undervolt.
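For what it's worth, the sub-$20/year ballpark checks out under some rough assumptions (mine, not the poster's: 3 hours of gaming a day at $0.12/kWh):

```python
# Back-of-envelope yearly electricity cost of the ~85 W TDP gap.
# The 3 h/day gaming time and $0.12/kWh rate are assumptions for illustration.
delta_watts = 300 - 215                           # TDP gap from the thread
hours_per_year = 3 * 365                          # assumed gaming time
extra_kwh = delta_watts * hours_per_year / 1000   # ~93 kWh per year
extra_cost = extra_kwh * 0.12                     # ~$11 per year
```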


----------



## Totally (Jan 11, 2019)

Fluffmeister said:


> According to AMD the cost of 7nm is significant, with 16 hbm2 I can't imagine it's cheap for them, but i assume they are least making some money.



All that slide says is that if the die size remains the same, costs go up. So when does a die not get smaller when going from a larger process to a smaller one? With this in mind, it shows they were profiting with decreasing margins until the jump to 7nm.


----------



## lexluthermiester (Jan 12, 2019)

Late to the party again, but I'd say this is a decent answer to RTX. Maybe not the show-stopper that Ryzen was, but damn decent nonetheless. It seems AMD has kicked it up.



fynxer said:


> Good luck with RayTracing in software, if that was viable we would have had that already. If they do it it is just a desperate move not to look obsolete.


Raytracing has been done in software for decades, just not real-time.


fynxer said:


> Do not expect RayTracing in hardware until end of 2020 and even then they will be years behind nVidia who will, by that time, be in the process of readying their third gen RTX cards for release.


You don't and can't know any of that.


----------



## steen (Jan 12, 2019)

Kaotik said:


> Unless proven otherwise it should be 64, as the Vega 20 diagrams from Instinct release clearly show 4 Pixel Engines per Shader Engine.

I must admit I took the 128 ROPs report as a given. If the Instinct diagrams aren't just high-level copies of the Vega 10 slides, then it's definitely 64 ROPs for Vega 20.


----------



## btarunr (Jan 12, 2019)

AMD has confirmed that the card's ROP count is 64.


----------



## Nkd (Jan 12, 2019)

fynxer said:


> Good luck with RayTracing in software, if that was viable we would have had that already. If they do it it is just a desperate move not to look obsolete.
> 
> Do not expect RayTracing in hardware until end of 2020 and even then they will be years behind nVidia who will, by that time, be in the process of readying their third gen RTX cards for release.
> 
> ...



3rd gen RTX card? Not happening lol. NVIDIA is not going to refresh until 2020; that's when they will have 7nm. You really think NVIDIA is going to replace the RTX 20 series after less than 12 months? They don't have a process to shrink to and they are not in a hurry to do it. Heck, they stretched Pascal for 2 years. So NVIDIA is going to have 3 RTX generations in 3 years, lol. Do you realize what you are saying?



lexluthermiester said:


> Late to the party again, but I'd say this is a decent answer to RTX. Maybe not the show stopper that Ryzen was but damn decent none-the-less. It seems AMD has kicked it up.
> 
> 
> Raytracing has been done in software for decades, just not real-time.
> ...




Yeah, he thinks NVIDIA is going to release 3 RTX generations in 3 years: 2018, 2019 and then 2020, when Pascal went for 2 years alone. Not sure about that, rofl.


----------



## zo0lykas (Jan 12, 2019)

Gasaraki said:


> The only way they could have done this is if they priced the Radeon 7 at $649 or $599, not $699. $699 is the same price as the RTX2080 but the 2080 doesn't have the heat, power use, has RT cores, has Tensor cores, etc. Overall the RTX2080 is expensive because it has new tech in it. If I have to pay the same price, I will buy the one with the lower power draw, the lower heat, the advance tech in it.
> 
> 
> 
> ...


----------



## ssdpro (Jan 12, 2019)

moproblems99 said:


> What are the drawbacks?  What advantages does the 2080 have?  You can't be talking about RTX and DLSS, can you?


The drawback is the rumored price of $699 and missing technology. If you can get the technology with the other product at the same price, why settle? It's like choosing between two identical cars: one has headlights and one doesn't. The salesman can say "hey, it's light out right now, maybe you won't need those headlights." AMD's engineering has always been adequate, but it sold by undercutting the competition's pricing. If AMD's GPU prices are going to match the competition's, I can't see how they continue to improve their already dismal market share. NVIDIA's release and pricing led to a major crash in their sales and stock value; I am not sure why a strengthening AMD would want to embrace that model. AMD has a long way to go before they can price with the big boys.


----------



## razaron (Jan 12, 2019)

Camm said:


> It should be noted that Nvidia has a huge ass achilles heel with the RTX series - that RT operations are INT based, and that *the card needs to flush to switch between FP and INT operations.*
> 
> Dedicated Hardware acceleration for RT is a smokescreen IMO, the key is if you can cut down your FP or INT instructions as small as possible and run as many as parallel as possible. AMD does have some FP division capability so its possible that some cards can be retrofitted for RT.


Source? I tried googling it and couldn't find anything.


----------



## Assimilator (Jan 12, 2019)

I just remembered how the AMD fanboys were pissing over the RTX 2080's $699 price tag at launch, but Radeon VII comes along with the same price and suddenly people are claiming it's great value.

No, great value would be if it weren't just a Vega respin with double the memory bandwidth, double the VRAM, 250 MHz higher clocks, and an extra $200 tacked onto the price. The die shrink to 7nm is going to help with power and heat, but this is still Vega/GCN 5 with all its limitations, and I honestly don't expect this card to outperform the RTX 2080 in the way AMD is claiming.


----------



## efikkan (Jan 12, 2019)

For reference, Vega 20 would need about 40% more performance over Vega 10 to be on par with the RTX 2080. I do wonder which changes are going to make that possible.


----------



## Zubasa (Jan 12, 2019)

Assimilator said:


> I just remembered how the AMD fanboys were pissing over RTX 2080 's $699 pricetag at launch, but Radeon VII comes along with the same price and suddenly people are claiming it's great value.
> 
> No, great value would be if it wasn't just a Vega respin with double the memory bandwidth, double the VRAM, 250MHz higher clocks, and an extra $200 tacked on to the price. The die-shrink to 7nm is going to help with power and heat, but this is still Vega/GCN 5 with all its limitations, and I honestly don't expect this card to outperform GTX 2080 in the way AMD is claiming.


Both are bad value; one being worse than the other doesn't mean either card is good value.


----------



## Nkd (Jan 12, 2019)

efikkan said:


> Primarily a major difference in TDP: 215W vs. ~300W.
> 
> When you have competing products A and B, which performs and costs the same, but one of them have a major disadvantage, why would anyone ever buy it?



The RTX 2080 is around 225 W. It remains to be seen what the actual usage is on the Radeon 7 during gaming. For that we wait for reviews.



Gasaraki said:


> The only way they could have done this is if they priced the Radeon 7 at $649 or $599, not $699. $699 is the same price as the RTX2080 but the 2080 doesn't have the heat, power use, has RT cores, has Tensor cores, etc. Overall the RTX2080 is expensive because it has new tech in it. If I have to pay the same price, I will buy the one with the lower power draw, the lower heat, the advance tech in it.
> 
> The rumor is that it costs close to $750 to make the Radeon 7 cards. So no, they are not making money. This is just to stop the bleeding.



I don't think that was how much it costs them to make; it was what they originally wanted to sell it at. Yeah, I have no doubt they are not making much on it.

Plus, let's hold off on that heat portion. Wait for the reviews; you can't complain about heat when you haven't seen the temps yet. Will it use more power? Yeah, sure, but that doesn't mean it's going to run hot.


----------



## M2B (Jan 12, 2019)

efikkan said:


> For reference, Vega 20 would need about ~40% more performance over Vega 10 to be on par with RTX 2080. I do wonder which changes are going to make that possible.



This video shows the performance of a Vega 64 clocked at 1,750 MHz against an RTX 2080 running at stock clocks. (Also, don't forget Vega 64 has 4 more CUs than the Radeon VII, which makes up for that 50 MHz core-clock deficit.)
Even the memory on the AMD side is overclocked, and at those clocks the Vega has 580 GB/s of memory bandwidth, which is quite a lot.
This is pretty much what you would expect a Radeon VII to do, maybe a little bit better.


----------



## efikkan (Jan 12, 2019)

Nkd said:


> Gtx 2080 is around 225w. It remains to be seen what the actual usage is on Radeon 7 during gaming. For that we wait for reviews.





Spoiler


AMD promises "25% more performance at the same power", whatever that means.
25% is not enough to be on par with RTX 2080.

But as you say, reviews will tell the truth.


----------



## moproblems99 (Jan 12, 2019)

ssdpro said:


> If you can get the technology with the other product



I fail to see the missing technology.  RTX is usable in one game...and the series is trash.  DLSS looks like shit compared to the other available methods.  I fail to see what benefits the 2080 has.


----------



## FordGT90Concept (Jan 12, 2019)

efikkan said:


> Spoiler
> 
> 
> 
> ...


If you take it at face value, Radeon VII has 25% more performance for the same power consumption (295w).
13% of that performance comes from the higher boost clock of 1800 MHz (remember, 4 CU short).
12% likely comes from Radeon VII's ability to hold boost clock longer than Vega 64 does.

You know how it goes: they're likely talking about games where Vega 64 does really well against Turing.  I highly doubt they're talking about an average.
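As a rough paper-spec check of that split (my own arithmetic, assuming boost clocks were fully sustained on both cards, which Vega 64 notoriously does not manage):

```python
# Paper FP32 throughput: 2 FLOPs (fused multiply-add) per shader per clock.
def fp32_tflops(shaders, boost_ghz):
    return shaders * 2 * boost_ghz / 1e3

vega64 = fp32_tflops(4096, 1.545)       # ~12.66 TFLOP/s on paper
radeon_vii = fp32_tflops(3840, 1.800)   # ~13.82 TFLOP/s on paper
paper_uplift = radeon_vii / vega64 - 1  # ~9% from clocks and shader count alone
```

On paper the uplift is only about 9%, so most of AMD's claimed 25% would indeed have to come from sustained clocks (and the extra bandwidth) rather than the headline boost figure.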


----------



## Wavetrex (Jan 12, 2019)

I wonder why AMD is stuck at a maximum of 4096 SPs?
I mean... Fury, Vega (1), Vega II... they are almost identical.

Considering that the new chip is rather small at 331 mm², what stopped them from making a 450 mm² chip, for example, and fitting 72 CUs in it, or 96?!
It would wipe the floor with the 2080 Ti with 6144 SPs (let's say a few are cut for being defective; even with 5760 SPs it would still crush it with raw compute power and that massive 1 TB/s bandwidth, WHILE BEING A SMALLER CHIP due to 7nm).

Instead, they just shrunk Fury, then shrunk it again, without adding anything.


----------



## FordGT90Concept (Jan 12, 2019)

Because bigger = lower yields.  AMD is all about mass production these days.


----------



## Totally (Jan 13, 2019)

Assimilator said:


> I just remembered how the AMD fanboys were pissing over RTX 2080 's $699 pricetag at launch, but Radeon VII comes along with the same price and suddenly people are claiming it's great value.
> 
> No, great value would be if it wasn't just a Vega respin with double the memory bandwidth, double the VRAM, 250MHz higher clocks, and an extra $200 tacked on to the price. The die-shrink to 7nm is going to help with power and heat, but this is still Vega/GCN 5 with all its limitations, and I honestly don't expect this card to outperform RTX 2080 in the way AMD is claiming.



I see people justifying the power consumption, but I don't see that; only ONE comment stated _"...because the 2080 is $699."_ Was its 10-series counterpart also $699 at launch?



efikkan said:


> Spoiler
> 
> 
> 
> ...



Good spot, but it probably means what it says: it's 25% more efficient. Assuming it's being compared to the V64, when consuming the same amount of power it does 25% more work. We could probably figure out how much power this card really sucks down with that bit; assuming power/perf scales linearly, a little guesstimation (2080 power × [V64/2080] ratio × [V7/V64] ratio) puts the card around 400-450 W.


----------



## Apocalypsee (Jan 13, 2019)

Wavetrex said:


> I wonder why AMD is stuck with maximum 4096 SP's ?
> I mean.... Fury, Vega (1), Vega II ... they are almost identical.
> 
> Considering that the new chip is rather small at 331 mm2, what stopped them from making a 450 mm2 chip for example and fitting 72 CU's in it, or 96 !!
> ...


They say they have removed the 4 Shader Engine limitation on GCN5 (Vega), but I don't believe that. Instead they added things to mitigate the limitation, like DSBR, NGG fast path, and HBCC, some of which are broken. AMD should dump GCN for gaming cards and start anew.

Even if I didn't have my Vega 56, I wouldn't buy this card at all; for one, it still has the same limitations as Fiji. They only increased clock speed and added a tiny bit of improvement here and there. The only reason I bought my Vega 56 is that it didn't have the dreaded 4 GB limitation of the Fury, so new games won't choke, and I got it cheap after the mining crash.


----------



## Manoa (Jan 13, 2019)

Wavetrex said:


> I wonder why AMD is stuck with maximum 4096 SP's ?
> I mean.... Fury, Vega (1), Vega II ... they are almost identical.
> 
> Considering that the new chip is rather small at 331 mm2, what stopped them from making a 450 mm2 chip for example and fitting 72 CU's in it, or 96 !!
> ...



So... the rules of interactive entertainment are?
Think little? Stay little.
Think big? Get BIG.

Isn't that why NVIDIA has money for drivers and AMD doesn't? The money they get from thinking big pays for their drivers, while AMD fails again and again on drivers and still hasn't learned that game-developer relations are important. Let me guess: the "solution" to developer relations is more GB/s of memory bandwidth and another 1000 MHz. They never learn. Do wrong once, you're stupid; do wrong twice, you're a fool; do wrong three times, you're insane.


----------



## moproblems99 (Jan 13, 2019)

Manoa said:


> do wrong once, you stupid, do wrong twice you retard, do wrong 3: you insane.



What does complaining about driver issues that don't exist make you?


----------



## Manoa (Jan 13, 2019)

Where do you see complaining? And where do you see "don't exist"?

AMD shills? I can respect that; you look like a fighter too. "Fight for your right to game on AMD, kill anyone who looks like they're against AMD"?
But you could use a brain: a game-developer relations program would benefit AMD more than a few more MHz and a few more GB/s of memory bandwidth.
You don't see the advantage of that? For your own good? What does that make you?
How does async compute enabled in all games sound to you? Should I mention how much faster Doom 4 was with async enabled? And that was just one game where it was used... WITHOUT AMD's help... and that's just the beginning. Can you imagine what it could mean if AMD was involved? In all games?
Oh wait, you're the Radeon expert; I'm sorry, you must know more than I do.


----------



## efikkan (Jan 13, 2019)

Wavetrex said:


> I wonder why AMD is stuck with maximum 4096 SP's ?
> I mean.... Fury, Vega (1), Vega II ... they are almost identical.
> 
> Considering that the new chip is rather small at 331 mm2, what stopped them from making a 450 mm2 chip for example and fitting 72 CU's in it, or 96 !!
> ...


*FordGT90Concept* said it's yields, and that's part of it, but the biggest reason is probably resource management. If AMD were to make a GPU with 50% more cores, it would need at least 50% more scheduling resources. Resource management is already the main reason why GCN is inefficient compared to Nvidia, and the reason why the RTX 2060 (1920 cores) manages to match Vega 64 (4096 cores). As we all know, AMD has plenty of theoretical performance that they simply can't utilize properly. Adding 50% more cores would require rebalancing the entire design; otherwise they would risk even lower efficiency. Vega 20 is just a tweaked design with some professional features added.


----------



## Manoa (Jan 13, 2019)

is that also why async helps it significantly?


----------



## moproblems99 (Jan 13, 2019)

Manoa said:


> where do you see complaining? and where do you see "don't exist"?
> 
> AMD shills? I can respect that, you look like a fighter too. "fight for your right to game on AMD, kill anyone who looks anti-AMD"?
> but you could use a brain: a game-developer relations program would benefit AMD more than a few more MHz and a few more GB/s of memory bandwidth
> ...



U mad bro?

Actually, I would prefer no developer relationships with either company because they are only good for one 'color'.  Clearly, one of us needs a brain...


----------



## FordGT90Concept (Jan 13, 2019)

Manoa said:


> is that also why async helps it significantly?


Yes, GCN is an async monster because of under utilization of its hardware resources.


----------



## londiste (Jan 13, 2019)

FordGT90Concept said:


> Manoa said:
> 
> 
> > is that also why async helps it significantly?
> ...


Is that still true? The latest testing on async seems to show both camps benefiting from it, AMD cards more than Nvidia, but the difference is a couple of percent at most.


----------



## ssdpro (Jan 13, 2019)

moproblems99 said:


> I fail to see the missing technology.  RTX is usable in one game...and the series is trash.  DLSS looks like shit compared to the other available methods.  I fail to see what benefits the 2080 has.


Missing tech is missing tech. It isn't in one game; it is in many, with many more coming. Real-time ray tracing is in these games:

Assetto Corsa Competizione
Atomic Heart
Battlefield V
Control
Enlisted
Justice
JX3
MechWarrior 5: Mercenaries
Metro Exodus
ProjectDH
Shadow of the Tomb Raider

As for DLSS, the list is longer:

Ark: Survival Evolved
Anthem
Atomic Heart
Battlefield V
Dauntless
Final Fantasy 15
Fractured Lands
Hitman 2
Islands of Nyne
Justice
JX3
Mechwarrior 5: Mercenaries
PlayerUnknown’s Battlegrounds
Remnant: From the Ashes
Serious Sam 4: Planet Badass
Shadow of the Tomb Raider
The Forge Arena
We Happy Few
Darksiders III
Deliver Us The Moon: Fortuna
Fear the Wolves
Hellblade: Senua’s Sacrifice
KINETIK
Outpost Zero
Overkill’s The Walking Dead
SCUM
Stormdivers
Again, the Radeon VII price is just too high. You'll notice I never criticize the product itself: it uses great memory and plenty of it. I like that feature. What I don't like is the high price and the missing features. At a proper market price of $499-549 it is a winner. To end with a tip: users appear more credible when they post their comments without profanity and with support for facts.

Data sources: https://www.digitaltrends.com/computing/games-support-nvidia-ray-tracing/ , https://www.kitguru.net/components/...rt-nvidias-ray-tracing-and-dlss-rtx-features/


----------



## moproblems99 (Jan 13, 2019)

Phew.  Exactly 0 of those games are on my play list, so not a huge loss.  That said, thanks for throwing the lists up there.  Now, what do we do about RTRT making everything look like a mirror?  Or DLSS looking like a jaggy mess?  I will say that I did go back and look at the comparisons, and DLSS doesn't look as bad as I originally thought.  I still wouldn't call either of these pieces of technology earth-shattering.  Down the line?  Probably.  Right now?  Nothing special.


----------



## rvalencia (Jan 13, 2019)

FordGT90Concept said:


> Which is why Vega 20 isn't bigger than Vega 10.  I think Huang's explosion is because he realizes he made a "big" mistake with Turing.  AMD is focusing on where the money is at, not winning performance crowns that mean little in the larger context of things.  Turing is substantially larger (and more costly to produce) than even Vega 10 is.
> 
> 
> On topic, Vega 20 doesn't really impress but it really wasn't intended to impress either. Vega 7nm w/ Fiji memory bandwidth.


VII is still Vega with the same 64-ROPS bottleneck.  AMD should have scaled up from Vega M GH, with its ratio of 64 ROPS to 24 CUs.



ssdpro said:


> Missing tech is missing tech. It isn't in one game, it is in many more and many more coming. Real time ray tracing is in these games:
> 
> Assetto Corsa Competizione
> Atomic Heart
> ...


DLSS... promoting a pixel-reconstruction technique is like Sony's marketing of the PS4 Pro's pixel reconstruction. VII has higher memory bandwidth for MSAA.



FordGT90Concept said:


> Yes, GCN is an async monster because of under utilization of its hardware resources.


It's more of a ROPS read-write bottleneck, with workarounds like async compute and the TMU read-write path.


----------



## M2B (Jan 13, 2019)

rvalencia said:


> VII is still Vega with the same 64 ROPS bottleneck.  AMD should have scaled from Vega M GH with 64 ROPS and 24CU ratio.
> 
> 
> DLSS..promoting pixel reconstruction technique is like Sony's PS4 Pro's pixel reconstruction marketing. VII has higher memory bandwidth for MSAA.
> ...



You have no idea what you are talking about.
It has nothing to do with ROPs; having 64 ROPs is completely fine at up to 3840x2160 and won't cause any bottleneck.
Do you really think AMD is dumb enough to make GPUs that are bottlenecked by such a relatively simple thing?


----------



## 95Viper (Jan 13, 2019)

Stop the retaliatory comments, bickering, baiting, and insulting.  You know who you are.
Keep the discussions civil and on topic.

Thank You


----------



## FordGT90Concept (Jan 13, 2019)

rvalencia said:


> VII is still Vega with the same 64 ROPS bottleneck.  AMD should have scaled from Vega M GH with 64 ROPS and 24CU ratio.


AMD ROPs process 4 pixels per clock. In the case of Radeon VII: 64 ROPs * 4 pixels per clock * 1,800,000,000 clocks = 460,800,000,000 pixels  processed per second.  4K is 8,294,400 pixels. Radeon VII has enough ROPs and clocks to handle 4K at 55,555 fps.  ROPs aren't a problem.
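That arithmetic is easy to check; a quick sketch (note the 4 pixels per clock per ROP figure is the post's own assumption, not an official AMD spec, and is disputed later in the thread):

```python
# Theoretical ROP fill rate for Radeon VII, using the post's assumption
# of 4 pixels per clock per ROP.
ROPS = 64
PIXELS_PER_CLOCK_PER_ROP = 4      # assumption from the post above
CLOCK_HZ = 1_800_000_000          # Radeon VII peak boost clock
PIXELS_4K = 3840 * 2160           # 8,294,400 pixels per 4K frame

fill_rate = ROPS * PIXELS_PER_CLOCK_PER_ROP * CLOCK_HZ  # pixels per second
print(fill_rate)                  # 460800000000
print(fill_rate // PIXELS_4K)     # 55555 "frames" per second
```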


GCN's problem is that the drivers (read: CPU) do more work than Maxwell and newer require.  This is why Vega and Fiji do not do so great at low resolutions but do well at high resolutions (where the CPU matters less). Vulkan and Direct3D 12 perform much better on GCN than Direct3D 11 because they naturally remove a lot of the underlying CPU burden.

I suspect Navi will take lessons learned from Xbox One and PlayStation 4 to produce hardware that needs minimal driver interference.


----------



## efikkan (Jan 13, 2019)

M2B said:


> It has nothing to do with ROPs, having 64 ROPs is completely fine at up to 3840*2160p and won't cause any bottleneck.
> Do you really think AMD is dumb enough to make GPUs that are bottlenecked by such a relatively simple thing?


You're correct. If ROPs were a major bottleneck, AMD would have solved that by now and unleashed a massive performance gain, but they're not.
That doesn't mean you can't find an edge case where more ROPs would help, as with anything else, but that's beside the point.



FordGT90Concept said:


> GCNs problem is that the drivers (read: CPU) do more work than Maxwell+ do.  This is why Vega and Fiji do not so great at low resolutions but well at high resolutions (CPU matters less). Vulkan and Direct3D 12 perform much better on GCN than Direct3D 11 because it naturally removes a lot of the underlying CPU burden.


Well, it's true that Direct3D 12 and Vulkan can offload a lot of the management done by the drivers, but not the _truly low-level_ allocation and resource management, which happens inside the GPU, and this is where GCN struggles.


----------



## M2B (Jan 13, 2019)

FordGT90Concept said:


> GCNs problem is that the drivers (read: CPU) do more work than Maxwell+ do. This is why Vega and Fiji do not so great at low resolutions but well at high resolutions (CPU matters less). Vulkan and Direct3D 12 perform much better on GCN than Direct3D 11 because it naturally removes a lot of the underlying CPU burden.



Driver overhead is only a small part of why GCN is behind Maxwell/Pascal/Turing.
Just like Intel's advantage over AMD in gaming, my guess is that it has something to do with the much lower cache latencies on Nvidia GPUs.
Of course this isn't the full story.


----------



## Mescalamba (Jan 13, 2019)

I think it might be viable to have a ray-tracing add-on card. Maybe something in the SLI/CrossFire style: one regular card, one for ray tracing.


----------



## moproblems99 (Jan 13, 2019)

M2B said:


> Do you really think AMD is dumb enough to make GPUs that are bottlenecked by such a relatively simple thing?



I mean, in everyone else's defense, they aren't doing anything to change it, so they were dumb enough to leave a bottleneck in a place that isn't easily fixed.


----------



## londiste (Jan 13, 2019)

moproblems99 said:


> Now, what do we do about RTRT making everything look a mirror?


Of course the demos and showcases for new features will over-emphasize the feature but once it makes it into actual games, it is generally much more toned down. Have you actually played BF5 with DXR?
Shadow of the Tomb Raider patch (shadows) and Metro Exodus (AO) should be a lot more interesting RTRT use cases once they show up.


----------



## Wavetrex (Jan 13, 2019)

FordGT90Concept said:


> AMD ROPs process 4 pixels per clock. In the case of Radeon VII: 64 ROPs * 4 pixels per clock * 1,800,000,000 clocks = 460,800,000,000 pixels  processed per second.  4K is 8,294,400 pixels. Radeon VII has enough ROPs and clocks to handle 4K at 55,555 fps.  ROPs aren't a problem.


Those need to do both texture element reading and buffer output writing.

It's not just about the final frame, because they aren't drawing the same uniform color 55 thousand times per second.

Every texel that needs to be placed somewhere on a scene also needs to be accessed. The ratio of texels read to pixels written might vary widely, but considering that modern games have millions of triangles on the screen, that's a whole lot of textures that need to be read and their data calculated in order to obtain the final pixel.
Obviously there's CU caching involved as well but the ratio is still huge.

Let's assume some numbers: if the texel-to-pixel ratio is 20:1, and due to overdraw more pixels are written than are actually seen on the display, at a ratio of "only" 5:1 (it might be much higher in complex scenes with lots of translucency, edge anti-aliasing and more), that is already a 100:1 factor against your number.

55 thousand fps is now just 550.

Add multi-sampling (4X) and we're back to a more realistic 550 / 4 = 137 fps, which seems to be about what these cards can do...
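The back-of-envelope above can be reproduced in a few lines; the 20:1 texel ratio, 5:1 overdraw and 4x MSAA figures are the post's illustrative assumptions, not measurements:

```python
# Scaling the raw ROP-limited figure down by assumed texture, overdraw
# and multi-sampling costs, as in the post above.
theoretical_fps = 55_555   # raw 4K figure from the earlier fill-rate post
texel_to_pixel = 20        # assumed texels read per pixel written
overdraw = 5               # assumed pixels written per pixel displayed
msaa = 4                   # 4x multi-sampling

effective_fps = theoretical_fps / (texel_to_pixel * overdraw) / msaa
print(round(effective_fps, 1))   # ~138.9 fps, near the ~137 fps quoted
```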


----------



## efikkan (Jan 13, 2019)

The myth of the ROP bottleneck for Vega must come to an end.
Just compare the GTX 1080 and RTX 2060 with Vega 64, three GPUs which perform similarly. While the GTX 1080 has 102.8 - 110.9 GP/s, the RTX 2060 cut that to 65.5 - 80.5 GP/s (less than Vega 64's 79.8 - 98.9 GP/s) and still maintained performance, even at 4K.
The issue with GCN is not the raw throughput of the various resources; it's the management of those resources.
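Those GP/s ranges follow from ROP count times clock; a sketch using the reference base and boost clocks (treated here as assumptions, since partner cards clock higher):

```python
# Pixel fill rate in GP/s = ROPs x clock (GHz), over base..boost clocks.
cards = {
    "GTX 1080": (64, 1.607, 1.733),   # ROPs, base GHz, boost GHz
    "RTX 2060": (48, 1.365, 1.680),
    "Vega 64":  (64, 1.247, 1.546),
}
for name, (rops, base, boost) in cards.items():
    # e.g. GTX 1080: ~102.8 - ~110.9 GP/s
    print(f"{name}: {rops * base:.1f} - {rops * boost:.1f} GP/s")
```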


----------



## moproblems99 (Jan 13, 2019)

londiste said:


> Of course the demos and showcases for new features will over-emphasize the feature but once it makes it into actual games, it is generally much more toned down. Have you actually played BF5 with DXR?
> Shadow of the Tomb Raider patch (shadows) and Metro Exodus (AO) should be a lot more interesting RTRT use cases once they show up.



I'll never play BF5, but Exodus is a maybe.  All I have seen is that it looks awful in the demos and videos, so what is there to get excited about?  I have nothing against the 20 series; I just don't understand what all the fuss is about.  Are these the future? Probably, but it is not going to matter until consoles take them on, so there is no point getting all hot and bothered about it now.  I hope NV gets up to 1.21 jigga-rays some day, but until then, color me unimpressed.  I am more impressed with the extra performance they squeezed out this gen.  I didn't realize they could beat a horse that well.


----------



## rvalencia (Jan 13, 2019)

FordGT90Concept said:


> AMD ROPs process 4 pixels per clock. In the case of Radeon VII: 64 ROPs * 4 pixels per clock * 1,800,000,000 clocks = 460,800,000,000 pixels  processed per second.  4K is 8,294,400 pixels. Radeon VII has enough ROPs and clocks to handle 4K at 55,555 fps.  ROPs aren't a problem.
> 
> 
> GCNs problem is that the drivers (read: CPU) do more work than Maxwell+ do.  This is why Vega and Fiji do not so great at low resolutions but well at high resolutions (CPU matters less). Vulkan and Direct3D 12 perform much better on GCN than Direct3D 11 because it naturally removes a lot of the underlying CPU burden.
> ...


Wrong.








A Vega 56 at 1710 MHz with *12 TFLOPS* beats a Strix Vega 64 at 1590 MHz with *13 TFLOPS*. This shows that higher clock speed improves the raster hardware, which enables the TFLOPS to be exposed. TFLOPS are useless without the ROPS read-write factors.

For absolute performance, lowering the time to completion in all graphics-pipeline stages should be the priority. Your argument doesn't lead to lowest-latency operation.

Where did you get that a Vega ROPS unit processes 4 color pixels per clock? Each RB unit has four color ROPS and 16 Z-ROPS. For 64 ROPS, that's 16 RB units x 4 color ROPS = 64 color ROPS, hence 64 color pixels per clock.

If AMD is pushing the compute/256-TMU read-write path, why not increase the ROPS count to match the TMU read-write performance?



efikkan said:


> The myth of the ROP bottleneck for Vega must come to an end.
> Just compare GTX 1080, RTX 2060 vs. Vega 64, three GPUs which perform similarly. While GTX 1080 had 102.8 - 110.9 GP/s, RTX 2060 reduced it to 65.52-80.54 GP/s (less than Vega 64's 79.8 - 98.9 GP/s), and still managed to maintain performance, even at 4K.
> The issue with GCN is not the raw throughput of the various resources, it's the management of those resources.


The RTX 2060 (30 SM) has 48 ROPS at a ~1900 MHz stealth overclock. The full TU106 (36 SM) has *4 MB of L2 cache*; the full GP104 has *2 MB*. This is important for NVIDIA's immediate-mode tile-cache render loop and lowers latency, which reduces time-to-completion.


----------



## efikkan (Jan 13, 2019)

rvalencia said:


> Vega 56 at 1710 Mhz with 12 TFLOPS beating Strix Vega 64 at 1590mhz with 13 TFLOPS. This shows higher clock speed improves raster hardware which enable TFLOPS to be exposed. TFLOPS is useless without ROPS read-write factors.


This proves nothing in terms of claiming ROPs are the bottleneck. When using a die with fewer cores and compensating with higher clocks, a lot of things change besides just ROPs per GFLOP. Slightly cut-down chips may have a different resource balance, in scheduling but also in caches and register files. All of these impact performance long before ROPs even come into play.

We see the same thing with the GTX 970, which has higher performance per clock than its big brother, the GTX 980. Why? Because it struck a sweet spot in various resources.


----------



## rvalencia (Jan 13, 2019)

efikkan said:


> You're correct. If ROPs were a major bottleneck, AMD would have solved that by now and unleashed a massive performance gain, but they're not.
> That doesn't mean you can't find an edge-case where more ROPs don't help, as with anything else, but that's beside the point.
> 
> 
> Well, it's true that Direct3D 12 and Vulkan can offload a lot of the management done by the drivers, but not the _truly low-level_ allocation and resource management, which happens inside the GPU, and this is where GCN struggles.


AMD is pushing the compute shader/TMU read-write path.

Refer to the Avalanche Studios lecture on the TMU read-write workaround for ROPS-bound situations.



efikkan said:


> This proves nothing in terms of claiming ROPs are the bottleneck. When using a die with fewer cores and compensating with higher clocks there are a lot of other things than just ROP per GFLOP that changes. Slightly cut down chips may have a different resource balance, both on the scheduling, but also cache and register files. All of these are impacting performance long before ROPs even come into play.
> 
> We see the same thing with GTX 970, which has higher performance per clock than it's big brother GTX 980. Why? Because it struck a sweetspot in various resources.


For AMD GCN, register files are at the CU level. Each CU has its own wavefront scheduling.

The GTX 970 has 56 ROPS with less L2 cache.
The GTX 980 has 64 ROPS.

Vega 56 has the full 64 ROPS and the full 4 MB of L2 cache, like Vega 64. VII's 60 CUs still have access to the full L2 cache and 64 ROPS, like the MI60. A faster clock speed makes the L2 cache and the 64 ROPS faster, i.e. it lessens time to completion.


----------



## FordGT90Concept (Jan 14, 2019)

https://hothardware.com/news/amd-radeon-rx-vega-56-unlocked-vega-64-bios-flash

A Vega 56 running the Vega 64 BIOS is 2% slower than a Vega 64.  That likely has less to do with ROPs and more to do with under-utilization of shaders, as I previously said.  Vega, and Fiji before it, were designed for server farms running compute loads; they were never ideal for gaming.  Polaris, on the other hand, is biased towards gaming: it has 32 ROPs to 36 CUs (8:9, compared to Vega 64's 1:1 or Vega 56's 8:7).  Again, if ROPs were really the bottleneck, AMD would have put more ROPs on it, but they didn't.

Vega and Fiji do exceptionally well in async games like Ashes of the Singularity because all of those shaders aren't so underutilized.


----------



## rvalencia (Jan 14, 2019)

FordGT90Concept said:


> https://hothardware.com/news/amd-radeon-rx-vega-56-unlocked-vega-64-bios-flash
> 
> Vega 56  running Vega 64 BIOS is 2% slower than Vega 64.  That likely has less to do ROPs and more to do with underutilization of shaders as I previous said.  Vega, and Fiji before it, were designed for server farms running compute loads.  They were never ideal for gaming.  Polaris, on the other hand, is biased towards gaming.  It has 32 ROPs to 36 CUs (8:9 compared to Vega 64  1:1 or Vega 56 8:7) .  Again, if ROPs were really the bottleneck, AMD would have put more ROPs on it but they didn't.
> 
> Vega and Fiji do exceptionally well in async games like Ashes of the Singularity because all of those shaders aren't so underutilized.


Async *compute* and sync *compute shaders* have TMU read-write path software optimizations.
Again, read the Avalanche Studios lecture on the TMU read-write workaround for ROPS-bound situations.









Btw: Vega M GH has a 24 CU to 64 ROPS ratio.

For a current Vega 64-type GPU, it's better to trade a lower CU count, which reduces power consumption, for a higher clock speed, which gives higher ROPS/L2 cache/*rasterization* performance. A Vega 56 at 1710 MHz with 12 TFLOPS beats a Strix Vega 64 OC at 1590 MHz with 13 TFLOPS.

*Rasterization* = the hardware that mass-converts floating-point geometry into integer pixels.

Note the four *rasterizer* hardware units; AMD increased this hardware with the R9-290X's introduction, before "Mr TFLOPS" joined AMD in 2013.

At higher clock speeds, classic GPU hardware such as the rasterizers, render back-ends (ROPS) and L2 cache has higher performance, i.e. lower time to completion.

*At the same clock speed and L2 cache size, 88 ROPS at 88 pixels per clock has a lower time to completion than 64 ROPS at 64 pixels per clock. AMD knows about the ROPS-bound problem, hence the marketing push for the compute shader read-write-path workaround and VII's 1800 MHz clock speed with a lower CU count.*

*NVIDIA's GPU designs use higher clock speeds to speed up classic GPU hardware.*

Cryptocurrency mining uses the TMU read-write path instead of the ROPS read-write path.

AMD could have configured a GPU with 48 CUs, 64 ROPS and 1900 MHz.

RX-580... AMD hasn't mastered a 64-ROPS-over-256-bit-bus design, hence the RX-580 is stuck at 32 ROPS with a 256-bit bus. The R9-290X has 64 ROPS with a 512-bit bus, which is a 2X scale-up of the R9-380X/RX-580's design.


----------



## medi01 (Jan 14, 2019)

WCC-level speculation, in an article that in essence highlights what Nvidia would like to highlight about Vega VII vs. the 2080.

How do things of this kind work? Do the authors of these texts simply root for Nvidia, do they come from Huang's headquarters, or are they somehow censored by NV's PR team?
Just curious.


----------



## Assimilator (Jan 14, 2019)

medi01 said:


> WCC level of speculations in an article that in essence highlights what nVidia would like to highlight about VegaVII vs 2080.
> 
> How do things of that kind work, do authors of these texts simply root for nVidia, do they come from Huang's headquarters, are they somehow censored by NV's PR team?
> Just curious.



I'm not even sure what you're trying to say here, although your usual crying about NVIDIA bias is obvious.


----------



## rvalencia (Jan 14, 2019)

medi01 said:


> WCC level of speculations in an article that in essence highlights what nVidia would like to highlight about VegaVII vs 2080.
> 
> How do things of that kind work, do authors of these texts simply root for nVidia, do they come from Huang's headquarters, are they somehow censored by NV's PR team?
> Just curious.


A reminder for AMD: build a GPU with large classic GPU hardware, NOT medium-size GP104-class classic GPU hardware attached to a 13 TFLOPS DSP.


----------



## Manoa (Jan 14, 2019)

I've suspected for some time that augmenting Polaris would have been better than using Vega for games. it looks very "simple" for AMD: just double everything on Polaris and you've got a stellar gaming card


----------



## medi01 (Jan 14, 2019)

Assimilator said:


> ...crying about NVIDIA...


Because "lack of tensor cores" (who the hell needs them in gaming) and elusive "RT stuff" (how many games support it, one?) is so important to highlight when talking about AMD product.


----------



## londiste (Jan 14, 2019)

medi01 said:


> Because "lack of tensor cores" (who the hell needs them in gaming) and elusive "RT stuff" (how many games support it, one?) is so important to highlight when talking about AMD product.


Future-proofing is a large part of the argument for the Radeon VII, mostly the 16 GB of VRAM. A similar argument can be made for RTX and DLSS.


----------



## medi01 (Jan 14, 2019)

londiste said:


> Futureproof is a large argument for Radeon 7, mostly with the 16GB VRAM. Similar argument can be made for RTX and DLSS.


Except we have seen newer games use more RAM (and with consoles beefed up and pushing 4K it will be a given). God knows if RT will ever be more than "nvidia paid us to implement it, so here is that gimmick for ya" until RT can be run by the masses, and as for DLSS, its usefulness is arguable at best.


----------



## londiste (Jan 14, 2019)

VRAM is not that straightforward either. More is always good, but the actual usefulness is not that clear. New consoles are next year (2020) at best, and the current generation sits at 8 GB of RAM total (Xbox One X is an outlier with 12, but that will not change things much). Given GTX 1080 Ti/RTX 2080-level performance, if the Radeon VII performs at the same level it will not really be a 4K card.


----------



## Manoa (Jan 14, 2019)

yeah, DLSS sucks; RT is much better than DLSS, I don't mind RT. I see this limited RT as another rasterization hack: there is no way to do proper reflections in rasterization, so this was the only way, which is why it's used for reflections in Battlefield V (I think). but it shows that even this little usage is so insanely heavy that it crawls everything. if you think about the 2000-series cards you see very little innovation for improved graphics: selective shading is for reducing graphics; RT is the only innovation for increasing graphics. some people already mentioned that nvidia has reached a limit in terms of rasterization, and performance shows that the rasterization cores give very little improvement compared to Pascal. if you think about the increasing silicon problems and costs, engineering (much) better rasterization cores would be expensive. it seems to me that both nvidia and AMD did the same thing: they saved re-engineering costs this round, and I think I know why - it's not worth it with this round of silicon. bigger improvements in silicon, like going from 65 nm to 45 nm back in 2007, made re-engineering worth it because it gave so much; with today's very small improvements it doesn't


----------



## londiste (Jan 14, 2019)

@Manoa , what do you mean by innovation? The RT and tensor stuff is new and unused, but the shaders themselves are based on Volta, not Pascal, and come with a couple of other new things as well. The features they bring are often very close to what Vega brought to the table. RPM seems to be the big one here: Turing has RPM the same as Vega, which accounts for several benchmark wins Vega had over Pascal and no longer has over Turing. To bolster the cache and memory system, Turing's caches were increased even compared to Volta. Mesh shaders are suspected to have some commonalities with primitive shaders, at least in idea if maybe not implementation.

This time around, Radeon VII is the one with less innovation, even less so if we look at gaming. Not sure if 1:2 FP64 and a wider memory bus count as innovation here.

All that is architecturally speaking; 7 nm is a quality of its own.


----------



## Manoa (Jan 14, 2019)

so nvidia did improve the rasterization cores after all. I guess it's because they have so much money and AMD can't afford it; that's the problem.
lol, Volta at $3000 xD


> Not sure if 1:2 FP64 and wider memory bus count as innovation here


it's not :x but I wish it were: if games used high-precision computation it could give better graphics, but I think the cards would burn because of the higher power draw and temperature...


----------



## londiste (Jan 14, 2019)

Manoa said:


> if games used high precision computations it could give better graphics but I think the cards would burn becouse high power will used and temperature...


There does not seem to be any need for higher precision. Even if there were, 1:2 FP64 is the best-case scenario for performance. The current trend is exactly the opposite: using FP16 for calculations that do not need the precision. Its use is very situational and does not bring a great boost, but these days every little bit helps. This is where Rapid Packed Math (RPM) comes in, which runs FP16 at 2:1.
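To illustrate the precision being traded away, a minimal stdlib sketch of what rounding a value to IEEE 754 half precision (the FP16 format RPM operates on) does; the specific sample values are just illustrations:

```python
import struct

# FP16 has a 10-bit mantissa (~3 decimal digits) vs FP32's 23 bits (~7),
# which is why it's reserved for work that tolerates the precision loss.
def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 half precision (struct's 'e' format).
    return struct.unpack('e', struct.pack('e', x))[0]

print(to_fp16(1.0001))  # 1.0 -- the 1e-4 is below FP16's resolution near 1.0
print(to_fp16(0.1))     # ~0.0999755859375, the nearest representable half
```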


----------



## Manoa (Jan 14, 2019)

yeah, that's true. the 780 Ti had 192 FP32 units per SM, Maxwell 96, Pascal 64?
don't you think graphics quality is lower when 16-bit float is used instead of 32-bit?
I mean, in sound this is true: more bits per sample gives more quality.
there are also floating-point textures; I don't really understand why they aren't needed. is it a trade-off, more speed over more quality? or does more accuracy not give more quality at all?
I know GIMP has a floating-point mode of operation, and on the right monitor it really looks good...


----------



## rvalencia (Jan 14, 2019)

Manoa said:


> I suspected for some time that augmenting polaris was better than using vega for games, look verry "simple" for AMD: just double everything on polaris and you got a steallar gaming card


An RX-580 2X wide would have:
4 MB of L2 cache,
12 TFLOPS at 1340 MHz,
64 ROPS at 1340 MHz (a bottleneck problem: Polaris ROPS are not connected to the L2 cache, hence highly dependent on external memory performance compared to Vega 64's ROPS),
8 raster engines at 1340 MHz, equivalent to six raster engines at 1800 MHz,
512 GB/s of memory bandwidth.
It would still have problems with 64 ROPS at a low clock speed.

Ideally, a Vega M GH 2X wide would have:
48 CUs at 1536 MHz, yielding 9.4 TFLOPS,
8 raster engines with 8 shader engines at 1536 MHz,
128 ROPS at 1536 MHz.
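The TFLOPS figures in these hypothetical configs follow from the usual GCN formula (CUs x 64 shaders x 2 FLOPs per clock x clock); a quick check, where "72 CUs" is assumed as double Polaris 10's 36:

```python
# GCN FP32 throughput: CUs x 64 shaders/CU x 2 FLOPs/clock (FMA) x clock.
def gcn_tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000

print(round(gcn_tflops(72, 1.340), 2))  # RX-580 2X wide (72 CUs): 12.35
print(round(gcn_tflops(48, 1.536), 2))  # Vega M GH 2X wide: 9.44
```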



londiste said:


> Futureproof is a large argument for Radeon 7, mostly with the 16GB VRAM. Similar argument can be made for RTX and DLSS.


DLSS is just pixel reconstruction using multiple samples from previous frames, which sounds like the PS4 Pro's pixel-reconstruction process.

https://www.pcgamer.com/nvidia-turing-architecture-deep-dive/

On previous architectures, the* FP cores would have to stop their work while the GPU handled INT instructions*, but now the scheduler can dispatch both to independent paths. This provides a theoretical immediate performance improvement of 35 percent per core.​
https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/

Turing introduces a new processor architecture, the Turing SM, that delivers a dramatic boost in shading efficiency, achieving 50% improvement in delivered performance per CUDA Core compared to the Pascal generation. These improvements are enabled by two key architectural changes. First, the Turing SM adds a new independent integer datapath that can execute instructions concurrently with the floating-point math datapath. *In previous generations, executing these instructions would have blocked floating-point instructions from issuing.*

https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/

Turing Tensor Cores add new INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization and don’t require FP16 precision. Turing Tensor Cores bring new deep learning- based AI capabilities to GeForce gaming PCs and Quadro-based workstations for the first time. A new technique called Deep Learning Super Sampling (DLSS) is powered by Tensor Cores. DLSS leverages a deep neural network to extract multidimensional features of the rendered scene and intelligently combine details from multiple frames to construct a high-quality final image​
VII supports INT8 and INT4 for  deep learning- based AI capabilities.

Refer to Microsoft's DirectML. Read https://www.highperformancegraphics.org/wp-content/uploads/2018/Hot3D/HPG2018_DirectML.pdf


----------



## FordGT90Concept (Jan 14, 2019)

rvalencia said:


> Async *compute *and Sync *compute shaders *has TMU read-write path software optimizations.
> Again, read Avalanche Studios lecture on TMU read-write workaround on ROPS bound situations.
> 
> View attachment 114519


And yet, the numbers say it doesn't really matter:
https://www.techspot.com/review/1762-just-cause-4-benchmarks/








rvalencia said:


> AMD knows ROPS bound problem hence compute shader's read-write path workaround marketing push and VII's 1800Mhz clock speed with lower CU count.


Neither of those things have anything to do with ROPs and everything to do with TSMC 7nm.



rvalencia said:


> NVIDIA's GPU designs has higher clock speeds to speed up classic GPU hardware.


Maxwell and newer utilize long render pipelines, which translates to higher clock speeds.  AMD did something similar with the NCU in Vega.



rvalencia said:


> RX-580... AMD hasn't mastered 64 ROPS over 256 bit bus design, hence RX-580 is stuck at 32 ROPS with 256 bit bus. R9-290X has 64 ROPS with 512 bit bus which is 2X scale over R9-380X/RX-580's design.


And yet RX 580 is faster than R9 290X by a great deal.


----------



## rvalencia (Jan 14, 2019)

medi01 said:


> Because "lack of tensor cores" (who the hell needs them in gaming) and elusive "RT stuff" (how many games support it, one?) is so important to highlight when talking about AMD product.


For the tensor issue, AMD plans to support DirectML.


FordGT90Concept said:


> And yet, the numbers say it doesn't really matter:
> https://www.techspot.com/review/1762-just-cause-4-benchmarks/
> 
> 
> ...


1. Too bad for you: the RTX 2080 has 4 MB of L2 cache while the GTX 1080 Ti has ~3 MB. This is important for tile-cache rendering.




2. Wrong. Even without a memory bandwidth increase, a Vega 56 at 1710 MHz, with its faster ROPs and raster engines, beats a Strix Vega 64 at 1590 MHz. That points to VII's direction: 1800 MHz plus a memory bandwidth increase.

3. The NCU by itself doesn't make up the graphics pipeline.

4. The RX-580 has delta color compression, a 2 MB L2 cache for the TMUs and geometry (not connected to the ROPs), higher clock speeds for the geometry/quad rasterizer units, and an 8 GB VRAM option.

The R9-290X/R9-390X weren't updated with the Polaris IP upgrades. The R9-390X has 8 GB of VRAM.


Only the Xbox One X's 44 CU GPU has the updates. Navi 12 has 40 CUs with an unknown ROP count, 256-bit GDDR6, and a clock speed comparable to VII's.

You selected an NVIDIA GameWorks title with a geometry bias; NVIDIA GPUs have higher clock speeds for their geometry and raster engines.

The RTX 2080 has six GPCs with six raster engines, *4 MB L2 cache*, and 64 ROPs with up to a 1900 MHz stealth overclock. More L2 cache storage means lower latency and fewer external memory accesses.
The GTX 1080 Ti has six GPCs with six raster engines, *3 MB L2 cache*, and 88 ROPs with up to an 1800 MHz stealth overclock.
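As a rough sanity check on those numbers, peak pixel fill rate is just ROPs times clock. A back-of-envelope sketch (the clocks are the "stealth overclock" figures claimed above, not official boost specs):

```python
# Back-of-envelope peak pixel fill rate: ROPs x clock.
# Clocks below are the overclock figures claimed in the post, not
# official boost specs.
def pixel_fillrate_gpix(rops: int, clock_mhz: float) -> float:
    """Peak fill rate in gigapixels per second."""
    return rops * clock_mhz / 1000.0

rtx_2080   = pixel_fillrate_gpix(64, 1900)  # 121.6 GPix/s
gtx_1080ti = pixel_fillrate_gpix(88, 1800)  # 158.4 GPix/s
```

By raw fill rate the GTX 1080 Ti actually comes out ahead with these numbers; any 2080 advantage would have to come from the larger L2 and compression rather than ROP count.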




The R9-390X's 5.9 TFLOPS beats the RX-480's 5.83 TFLOPS,

and even the R9-390 Pro's 5.1 TFLOPS beats the RX-480's 5.83 TFLOPS in games.


----------



## FordGT90Concept (Jan 15, 2019)

rvalencia said:


> Only Xbox One X's 44 CU GPU has the updates. NAVI 12 has 40 CU with unknown ROPS count, 256 bit GDDR6 and clock speed comparable to VII


Xbox One X uses a Polaris design (40 CUs, 2560 shaders, 32 ROPs, 160 TMUs).  Virtually nothing is known about Navi at this point other than that it is coming.



rvalencia said:


> You selected NVIDIA gameworks tile with geometry bias,  NV GPUs has higher clock speed for geometry and raster engines.


You don't know what games Avalanche Studios makes, do you? Hint: I referenced Just Cause 4 for a reason.


----------



## rvalencia (Jan 15, 2019)

FordGT90Concept said:


> Xbox One X uses a Polaris design (40 CUs, 2560 shaders, 32 ROPs, 160 TMUs).  Virtually nothing is known about Navi at this point other than it is coming.
> 
> Don't know what games Avalanche Studios makes, do you? Hint: I referenced Just Cause 4 for a reason.


1. Not 100 percent correct. The X1X GPU's ROPs have a 2 MB render cache, which doesn't exist on Polaris parts like the RX-580. https://gpucuriosity.wordpress.com/...der-cache-size-advantage-over-the-older-gcns/


2. So what? Just Cause 4 is a GameWorks title. Avalanche Studios knows the ROP-bound workaround with TMUs.

Without Gameworks,


----------



## londiste (Jan 15, 2019)

Forza Horizon 4 clearly has some type of problem on Nvidia cards at low resolutions. FH3 initially had a CPU-usage issue on Nvidia cards and, as far as I can see, so does FH4.
By 2160p, AMD cards will drop off far faster than their Nvidia counterparts.


----------



## rvalencia (Jan 15, 2019)

londiste said:


> Forza Horizon 4 clearly has some type of problem for Nvidia cards at low resolutions. FH3 initially had some CPU usage issue for Nvidia cards and as far as I can see, so does FH4.
> By 2160p AMD cards will drop off far faster than Nvidia counterparts.


If there were a CPU-bound issue, the frame-rate results of several GPUs would flatline at a common number.

Vega 64 has inferior delta color compression compared to NVIDIA's version, which is remedied by VII's higher 1 TB/s memory bandwidth.


----------



## FordGT90Concept (Jan 15, 2019)

rvalencia said:


> 1. Not 100 percent correct. X1X GPU's ROPS has 2MB render cache which doesn't exist for Polaris like RX-580. https://gpucuriosity.wordpress.com/...der-cache-size-advantage-over-the-older-gcns/


Vega does have L2 cache for ROPs but it's not clear how much.  Cache is expensive.



rvalencia said:


> 2. So what? Just Cause 4 is a Gameworks title. Avalanche Studios knows ROPS bound workaround with TMUs


Avalanche Studios clearly went out of their way to do optimization work for Vega, yet the gains are small.  Most likely the game was engineered to run on Pascal and it hammered the ROPs.  Transitioning some of the workload off the ROPs solved their problem.  That's not necessarily because of a design flaw in Vega; rather, they were porting their rendering code from Pascal to Vega and had to create a workaround where the architectures differ.  That's what optimization is about in general.



rvalencia said:


> Without Gameworks,


Now you're spamming random benchmarks that in no way prove your point.



Edit: Apparently Radeon VII can use DirectML which in practice can replace DLSS:
https://www.overclock3d.net/news/gp..._supports_directml_-_an_alternative_to_dlss/1

That may imply that Vega 20 has tensor cores.


----------



## mtcn77 (Jan 15, 2019)

Nvidia has DCC, TBR, L2-client ROPs, and double the ROPs of the AMD counterpart.
You cannot compare the overclock range of a memory controller serving L2-client ROPs with one serving discrete ROPs. The memory controller behind L2 ROPs only dispatches batched transactions, so it sees much less traffic; you would need to hit the maximum memory transfer rate to reach its 'actual' clock. The ROPs run at a different frequency than the memory controller.


----------



## FordGT90Concept (Jan 15, 2019)

FordGT90Concept said:


> Edit: Apparently Radeon VII can use DirectML which in practice can replace DLSS:
> https://www.overclock3d.net/news/gp..._supports_directml_-_an_alternative_to_dlss/1
> 
> That may imply that Vega 20 has tensor cores.


I was wrong here, Vega 20 presumably runs DirectML on its compute shaders.  DirectML can use tensor cores (if available), compute shaders (if available), or CPU cores.


----------



## mtcn77 (Jan 15, 2019)

Besides, Nvidia's texture L1 is a client of the L2 (TBR), and DCC compresses any L1 traffic in the L2. Without L1 caching, their texture bandwidth would just be the memory interface bandwidth. With this texture bandwidth amplification, the card cannot write more, but it can read more texture data than AMD's 1 TB/s card.


----------



## rvalencia (Jan 16, 2019)

mtcn77 said:


> Besides, Nvidia has texture L1 client to L2(TBR) and uses DCC to compress any L1 traffic in L2. Normally their texture bandwidth without L1 caching is the memory interface bandwidth. With texture bandwidth amplification, the card does not write more but can read more textures than AMD's 1TB/s card.


The R9-290X at 1 GHz already has 1 TB/s of L2 cache bandwidth, which is what makes GCN cards cryptocurrency-friendly.

Scaling VII to 1.8 GHz would yield about 1.8 TB/s of L2 cache bandwidth. Vega also has compressed texture I/O from external memory.

Don't make me run a CUDA L2 cache benchmark on my GTX 1080 Ti and GTX 980 Ti.



FordGT90Concept said:


> I was wrong here, Vega 20 presumably runs DirectML on its compute shaders.  DirectML can use tensor cores (if available), compute shaders (if available), or CPU cores.


On older DX12 hardware, smaller datatypes are either emulated via the 32-bit datatype or run at the same rate as it, while newer hardware gets a real performance benefit.

DirectML is important because it offers uniform API access to the rapid packed math features in newer hardware while preserving software compatibility with older hardware. DirectML also benefits the next Xbox hardware release.

Polaris GPUs already have a packed-math feature, but using it doesn't increase the TFLOPS rate and it reduces the stream processors available for 32-bit datatypes; Vega fixes this Polaris issue.
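A minimal sketch of what that means in throughput terms (shader counts are the published specs; the doubling on Vega versus no gain on Polaris is the point being made above):

```python
# Peak FP32 throughput: shaders x 2 ops per clock (FMA) x clock in GHz.
def tflops(shaders: int, clock_ghz: float) -> float:
    return shaders * 2 * clock_ghz / 1000.0

vega64_fp32 = tflops(4096, 1.545)   # ~12.7 TFLOPS
vega64_fp16 = 2 * vega64_fp32       # rapid packed math: 2x FP16 rate
rx580_fp32  = tflops(2304, 1.34)    # ~6.2 TFLOPS
rx580_fp16  = rx580_fp32            # Polaris: FP16 runs at the FP32 rate
```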




mtcn77 said:


> Besides, Nvidia has texture L1 client to L2(TBR) and uses DCC to compress any L1 traffic in L2. Normally their texture bandwidth without L1 caching is the memory interface bandwidth. With texture bandwidth amplification, the card does not write more but can read more textures than AMD's 1TB/s card.


The R9-290X at 1 GHz has 1 TB/s of L2 bandwidth, while the GTX 980 Ti has about 600 GB/s (measured via a CUDA app, which disables DCC). The R9-290X's ROPs are not connected to the L2 cache, while the GTX 980 Ti's ROPs are (for the tile-cache render loop).

Scaling the R9-290X's 1 GHz design to VII's 1800 MHz reaches 1.8 TB/s of L2 cache bandwidth.

Vega 56/64 has 4 MB of L2 cache for the TMUs and ROPs.
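The clock-scaling argument above is simple proportionality; a sketch (assuming L2 bandwidth scales linearly with core clock, which is the poster's premise):

```python
# If L2 bandwidth scales linearly with core clock (assumption), a design
# delivering 1 TB/s at 1 GHz delivers 1.8 TB/s at 1.8 GHz.
def l2_bandwidth_tbs(base_tbs: float, base_ghz: float, clock_ghz: float) -> float:
    return base_tbs * (clock_ghz / base_ghz)

r9_290x_l2 = l2_bandwidth_tbs(1.0, 1.0, 1.0)  # 1.0 TB/s
vii_l2     = l2_bandwidth_tbs(1.0, 1.0, 1.8)  # 1.8 TB/s
```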



FordGT90Concept said:


> 1. Vega does have L2 cache for ROPs but it's not clear how much.  Cache is expensive.
> 
> 
> 2. Avalanche Studios clearly went out of their way to do optimization work for Vega yet, the gains are small.  Most likely the game was engineered to run on Pascal and it hammered the ROPs.  Transitioning some of the workload off of the ROPs solved their problem.  That's not necessarily because of a design flaw in Vega, more that they were porting their rendering code from Pascal to Vega and had to create a work around where architecturally they are different.  That's what optimization is about in general.
> ...


1. Vega 56/64 has a 4 MB L2 cache. https://www.tomshardware.com/news/visiontek-radeon-rx-vega-64-graphics-card,35280.html

2. Not complete. Geometry/raster engines are another problem for AMD GPUs, since the NVIDIA counterparts have higher clock speeds. A Vega 56 at 1710 MHz with 12 TFLOPS beating a Strix Vega 64 at 1590 MHz with 13 TFLOPS shows that higher clock speed improves the geometry/raster engines, ROPs, and L2 cache despite the Vega 56's lower TFLOPS.

3. Don't deny GameWorks issues. Hint: geometry and the related rasterization conversion process; NVIDIA's counterparts have higher clock speeds that benefit the classic GPU hardware. I advocate that AMD reduce the CU count (cutting power consumption) and trade it for higher clock speed, e.g. a "Vega 48" in the 1900 MHz to 2 GHz range.
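The 12 vs. 13 TFLOPS figures in point 2 can be reproduced from the shader counts and quoted clocks (a sketch; the clock values are the poster's, the shader counts are official specs):

```python
def tflops_fp32(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS: shaders x 2 ops per clock (FMA) x clock."""
    return shaders * 2 * clock_mhz / 1e6

vega56_oc = tflops_fp32(3584, 1710)  # ~12.3 TFLOPS
vega64_oc = tflops_fp32(4096, 1590)  # ~13.0 TFLOPS
# Despite ~0.8 TFLOPS less compute, the higher-clocked Vega 56 wins in
# games: geometry/raster/ROP/L2 throughput all scale with core clock.
```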


----------



## mtcn77 (Jan 16, 2019)

rvalencia said:


> R9-290X at 1Ghz already has 1TB/s L2 cache bandwidth which makes GCNs cryptocurrency friendly cards
> 
> VII's 1.8Ghz scaling would be about 1.8 TB/s  L2 cache bandwidth. Vega has compressed texture I/O from external memory.
> 
> ...


I concur; however, I was pointing out that the IMC has fewer consequences in a TBR and L2-ROP design. AMD would certainly be able to clock the GPU higher if they integrated TBR, but most of Nvidia's advantage is due to read:write amplification through TBR, not frequency alone. They can only write 616 GB/s, yes, but setup occurs against texture reads at 1.5 TB/s.


----------



## rvalencia (Jan 17, 2019)

mtcn77 said:


> I concur, however I was pointing out that the IMC has less consequences in a TBR & L2-ROP design. AMD would certainly be able to clock the gpu higher in case they integrated TBR, but also most of Nvidia's advantage is due to r:w amplification through TBR, not frequency alone. They can only write 616GB/s, yes, but setup occurs in reference of texture reads at 1.5TB/s.


Running CUDA apps disables delta color compression.
NVIDIA's Maxwell/Pascal/Turing GPUs don't have PowerVR's "deferred tile rendering", but they do have immediate-mode tile-cache rendering.






For my GTX 1080 Ti and 980 Ti GPUs, I can increase L2 cache bandwidth with an overclock.

A Vega 56 at higher clock speed still gains performance without any increase in memory bandwidth, and Vega's ROPs have a multi-MB L2 cache connection like Maxwell's/Pascal's ROP designs.
The Turing GPU with 64 ROPs that VII rivals would be the RTX 2080.

Battlefield-series games are well known for software tiled compute rendering techniques, which play to older AMD GCN parts' L2 cache connection to the TMUs.


From AMD's Vega whitepaper (https://radeon.com/_downloads/vega-whitepaper-11.6.17.pdf):

_Vega uses a relatively small number of tiles, and it operates on primitive batches of limited size compared with those used in previous tile-based rendering architectures. This setup keeps the costs associated with clipping and sorting manageable for complex scenes while delivering most of the performance and efficiency benefits. _


AMD Vega Whitepaper:

The Draw-Stream Binning Rasterizer (DSBR) is an important innovation to highlight. It has been designed to reduce unnecessary processing and data transfer on the GPU, which helps both to boost performance and to reduce power consumption. The idea was to combine the benefits of a technique already widely used in handheld graphics products (tiled rendering) with the benefits of the immediate-mode rendering used in high-performance PC graphics.

Pixel shading can also be deferred until an entire batch has been processed, so that only visible foreground pixels need to be shaded. This deferred step can be disabled selectively for batches that contain polygons with transparency. Deferred shading reduces unnecessary work by reducing overdraw (i.e., cases where pixel shaders are executed multiple times when different polygons overlap a single screen pixel).


PowerVR's deferred tile render is patent heavy.


----------



## mtcn77 (Jan 17, 2019)

rvalencia said:


> Running CUDA apps disables delta color compression.
> NVIDIA Maxwell/Pascal/Turing GPUs doesn't have PowerVR's "deferred tile render" but it has immediate mode tile cache render.
> 
> View attachment 114641
> ...


Don't be difficult; all I'm saying is that Nvidia wins by texture 'read' bandwidth. Vega VII and the 2080 are supposedly equally matched in this regard (if the 2080 Ti has 1.5 TB/s), so I'm saying it is not 'write' concurrency (which LDS shared space solves) that causes it. AMD does not recommend shared memory if the indices don't follow through on reading similar registers. I'm not disputing that you can keep up with an RX 580 by overclocking the 570 (since both's TMUs are ROP-bound), but Vega VII has 3x more fill rate at 4x the bandwidth; there is much more to spare before we start L2 bashing.


----------



## rvalencia (Jan 18, 2019)

mtcn77 said:


> Don't be difficult, all I'm saying is Nvidia wins by texture 'read' bandwidth; Vega VII and 2080 are supposedly equally matched in this regard(if 2080Ti has 1.5TB/s), so I'm saying it is not because of 'write' concurrency - which LDS shared space solves - that causes it. AMD does not recommend shared memory if the indices don't follow through on reading similar registers. I'm not disputing you can keep up to Rx580 by overclocking the 570(since both's TMU's are ROP-bound), but Vega VII has 3x more fillrate at 4x bandwidth - there is much more to spare until we start L2 bashing.


What are you talking about with "since both's TMUs are ROP-bound", when the TMUs are the workaround path for being ROP-bound?

In terms of raw ROP read/write hardware capability:

VII's ROPs (64 ROPs at 1800 MHz, connected to a multi-MB L2 cache) are *2.7X* the RX-580's ROPs (32 ROPs at 1340 MHz, connected to the memory controllers).
VII's 1 TB/s raw memory bandwidth is *4X* the RX-580's 256 GB/s raw memory bandwidth. Vega's DCC is slightly better than the Polaris DCC.
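Those ratios check out arithmetically (a quick sketch using the specs quoted above):

```python
# ROP throughput scales with ROP count x clock; bandwidth ratio is raw.
vii_rops, vii_mhz     = 64, 1800
rx580_rops, rx580_mhz = 32, 1340

rop_ratio = (vii_rops * vii_mhz) / (rx580_rops * rx580_mhz)  # ~2.69x
bw_ratio  = 1000 / 256                                       # ~3.9x
```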


----------



## mtcn77 (Jan 23, 2019)

mtcn77 said:


> I concur, however I was pointing out that the IMC has less consequences in a TBR & L2-ROP design. AMD would certainly be able to clock the gpu higher in case they integrated TBR, but also most of Nvidia's advantage is due to r:w amplification through TBR, not frequency alone. They can only write 616GB/s, yes, but setup occurs in reference of texture reads at 1.5TB/s.


Something has been bothering me ever since... R:W is not the same across the bandwidth spectrum. You need to port accesses for reads in order to gain full throughput; it is not equal, but "read-biased bandwidth".


> In fact, it's horribly slow, because a GCN CU can process 64 float multiply-add instructions per cycle, which is 64×3×4 bytes of input data, and 64×4 bytes of out. Across a large chip like a Vega10, that's 48 KiB worth of data read in a single cycle -- at 1.5 GHz, that's 67 TiB of data you'd have to read in.
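The arithmetic in that quote can be reproduced directly (64 CUs, 64 lanes, 3 FP32 inputs per FMA, 1.5 GHz, all figures taken from the quote):

```python
# Bytes a full Vega10 would read per cycle if every lane's FMA inputs
# came from memory, then scaled to 1.5 GHz (figures from the quote).
CUS, LANES, INPUTS, BYTES_FP32 = 64, 64, 3, 4

read_per_cycle = CUS * LANES * INPUTS * BYTES_FP32   # 49152 B = 48 KiB
read_per_sec_tib = read_per_cycle * 1.5e9 / 2**40    # ~67 TiB/s
```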


----------



## rvalencia (Feb 15, 2019)

mtcn77 said:


> Something has been bothering me ever since... R:W is not the same across the bandwidth spectrum. You need to port accesses for reads in order to gain full throughput - not equal, "read-biased bandwidth".


A real-world shader program is more than a single-line FMA operation, and each CU has local data storage and an L1 cache. A smaller shader loop should be able to fit within the CU's local storage.

At 1 GHz, the R9-290X's 1 MB L2 cache has 1 TB/s of bandwidth, but this bandwidth is not connected to the ROPs until Vega-era IP. The RX-480 and Fury X have 2 MB of L2 cache, and it's not connected to the ROPs either.

The Xbox One X GPU has a 2 MB L2 cache for the TMUs and a 2 MB render cache for the ROPs (a feature missing from the Polaris IP).

For raster graphics, the ROPs are the primary read/write units that expose the GPU's TFLOPS to the L2 cache and external memory bandwidth. AMD has been pushing async compute, which uses the texture units for read/write instead.

*Nvidia's memory compression superiority with  Pascal.*







AMD needs quad-stack HBM2's 1 TB/s of memory bandwidth to rival the RTX 2080's effective memory bandwidth.
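"Effective" bandwidth here just means raw bandwidth multiplied by an average compression ratio. A hedged sketch (the compression ratios below are illustrative assumptions, not measured figures):

```python
# Effective bandwidth = raw bandwidth x average compression ratio.
# The compression ratios are illustrative assumptions only.
def effective_bw_gbs(raw_gbs: float, compression_ratio: float) -> float:
    return raw_gbs * compression_ratio

rtx_2080   = effective_bw_gbs(448, 1.6)   # GDDR6 + strong DCC (assumed ratio)
radeon_vii = effective_bw_gbs(1000, 1.2)  # HBM2 + weaker DCC (assumed ratio)
```

With these assumed ratios, the VII's raw HBM2 advantage more than covers the compression gap, which is the poster's point.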


----------

