
AMD Radeon VII Detailed Some More: Die-size, Secret-sauce, Ray-tracing, and More

Joined: Jun 10, 2014
Messages: 2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
I wonder why AMD is stuck at a maximum of 4096 SPs?
I mean... Fury, Vega (1), Vega II... they are almost identical.

Considering that the new chip is rather small at 331 mm², what stopped them from making a 450 mm² chip, for example, and fitting 72 CUs in it, or 96?!
It would wipe the floor with the 2080 Ti with 6144 SPs (let's say a few are cut for being defective; even with 5760 SPs it would still crush it with raw compute power and that massive 1 TB/s of bandwidth, WHILE BEING A SMALLER CHIP thanks to 7 nm).

Instead, they just shrunk Fury, then shrunk it again without adding anything :(
FordGT90Concept said it's yields, and that's part of it, but the biggest reason is probably resource management. If AMD were to make a GPU with 50% more cores, it would need at least 50% more scheduling resources. Resource management is already the main reason why GCN is inefficient compared to Nvidia, and the reason why the RTX 2060 (1920 cores) manages to match Vega 64 (4096 cores). As we all know, AMD has plenty of theoretical performance that they simply can't utilize properly. Adding 50% more cores would require rebalancing the entire design; otherwise they would risk even lower efficiency. Vega 20 is just a tweaked design with some professional features added.
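To put rough numbers on that underutilization point: a back-of-the-envelope sketch, assuming reference boost clocks of about 1546 MHz for Vega 64 and 1680 MHz for the RTX 2060.

```python
# Rough sketch: theoretical FP32 throughput (cores x 2 FLOPs/clock x clock)
# vs. the fact that these two cards trade blows in games.
# Boost clocks are approximate reference values, not measured.
def tflops(cores, clock_ghz):
    return cores * 2 * clock_ghz / 1000  # 2 FLOPs/clock (FMA), in TFLOPS

vega64 = tflops(4096, 1.546)   # ~12.7 TFLOPS
rtx2060 = tflops(1920, 1.680)  # ~6.5 TFLOPS

# If both deliver similar frame rates, GCN is extracting roughly half as much
# gaming performance per theoretical FLOP:
print(f"Vega 64: {vega64:.1f} TFLOPS, RTX 2060: {rtx2060:.1f} TFLOPS")
print(f"Implied relative utilization: {rtx2060 / vega64:.0%}")  # ~51%
```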
 
Joined: Oct 14, 2017
Messages: 210 (0.08/day)
System Name Lightning
Processor 4790K
Motherboard asrock z87 extreme 3
Cooling hwlabs black ice 20 fpi radiator, cpu mosfet blocks, MCW60 cpu block, full cover on 780Ti's
Memory corsair dominator platinum 2400C10, 32 giga, DDR3
Video Card(s) 2x780Ti
Storage intel S3700 400GB, samsung 850 pro 120 GB, a cheep intel MLC 120GB, an another even cheeper 120GB
Display(s) eizo foris fg2421
Case 700D
Audio Device(s) ESI Juli@
Power Supply seasonic platinum 1000
Mouse mx518
Software Lightning v2.0a
Is that also why async helps it significantly?
 
Joined: Mar 10, 2015
Messages: 3,984 (1.12/day)
System Name Wut?
Processor 3900X
Motherboard ASRock Taichi X570
Cooling Water
Memory 32GB GSkill CL16 3600mhz
Video Card(s) Vega 56
Storage 2 x AData XPG 8200 Pro 1TB
Display(s) 3440 x 1440
Case Thermaltake Tower 900
Power Supply Seasonic Prime Ultra Platinum
Where do you see complaining? And where do you see "don't exist"?

AMD shills? I can respect that, you look like a fighter too. "Fight for your right to game on AMD, kill anyone who looks like they're against AMD"?
But you could use a brain: a game-developer relations program would benefit AMD more than a few more MHz and a few more GB/s of memory bandwidth.
You don't see the advantage of that? For your own good? What does that make you?
How does async compute enabled in all games sound to you? Should I mention how much faster Doom 4 was with async enabled? And that was just one game where it was used... WITHOUT AMD's help... and that's just the beginning. Can you imagine what it could mean if AMD was involved? In all games?
Oh wait, you are a Radeon expert, I'm sorry, you must know more than I do.

U mad bro?

Actually, I would prefer no developer relationships with either company because they are only good for one 'color'. Clearly, one of us needs a brain...
 

FordGT90Concept

"I go fast!1!11!1!"
Joined: Oct 13, 2008
Messages: 26,259 (4.44/day)
Location: IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Is that also why async helps it significantly?
Yes, GCN is an async monster because of underutilization of its hardware resources.
 
Joined: Feb 3, 2017
Messages: 3,810 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Is that also why async helps it significantly?
Yes, GCN is an async monster because of underutilization of its hardware resources.
Is that still true? The latest async testing seems to show both camps benefiting from it, AMD cards more than Nvidia, but the difference is a couple of percent at most.
 
Joined: Apr 10, 2013
Messages: 302 (0.07/day)
Location: Michigan, USA
Processor AMD 1700X
Motherboard Crosshair VI Hero
Memory F4-3200C14D-16GFX
Video Card(s) GTX 1070
Storage 960 Pro
Display(s) PG279Q
Case HAF X
Power Supply Silencer MK III 850
Mouse Logitech G700s
Keyboard Logitech G105
Software Windows 10
I fail to see the missing technology. RTX is usable in one game...and the series is trash. DLSS looks like shit compared to the other available methods. I fail to see what benefits the 2080 has.
Missing tech is missing tech. It isn't in one game, it's in many more, with many more coming. Real-time ray tracing is in these games:
  • Assetto Corsa Competizione
  • Atomic Heart
  • Battlefield V
  • Control
  • Enlisted
  • Justice
  • JX3
  • MechWarrior 5: Mercenaries
  • Metro Exodus
  • ProjectDH
  • Shadow of the Tomb Raider
As for DLSS, the list is longer:
  • Ark: Survival Evolved
  • Anthem
  • Atomic Heart
  • Battlefield V
  • Dauntless
  • Final Fantasy 15
  • Fractured Lands
  • Hitman 2
  • Islands of Nyne
  • Justice
  • JX3
  • Mechwarrior 5: Mercenaries
  • PlayerUnknown’s Battlegrounds
  • Remnant: From the Ashes
  • Serious Sam 4: Planet Badass
  • Shadow of the Tomb Raider
  • The Forge Arena
  • We Happy Few
  • Darksiders III
  • Deliver Us The Moon: Fortuna
  • Fear the Wolves
  • Hellblade: Senua’s Sacrifice
  • KINETIK
  • Outpost Zero
  • Overkill’s The Walking Dead
  • SCUM
  • Stormdivers
Again, the Radeon VII price is just too high. You'll notice I never criticize the product itself - it uses great memory and plenty of it. I like that. What I don't like is the high price and the missing features. At a proper market price of $499-549 it is a winner. To end with a tip: you appear more credible if you post your comments without profanity and with facts to support them.

Data sources: https://www.digitaltrends.com/computing/games-support-nvidia-ray-tracing/ , https://www.kitguru.net/components/...rt-nvidias-ray-tracing-and-dlss-rtx-features/
 
Joined: Mar 10, 2015
Messages: 3,984 (1.12/day)
System Name Wut?
Processor 3900X
Motherboard ASRock Taichi X570
Cooling Water
Memory 32GB GSkill CL16 3600mhz
Video Card(s) Vega 56
Storage 2 x AData XPG 8200 Pro 1TB
Display(s) 3440 x 1440
Case Thermaltake Tower 900
Power Supply Seasonic Prime Ultra Platinum
Phew. Exactly 0 of those games are on my play list, so not a huge loss. That said, thanks for putting the lists up there. Now, what do we do about RTRT making everything look like a mirror? Or DLSS looking like a jaggy mess? I will say that I did go back and look at the comparisons, and DLSS doesn't look as bad as I originally thought. I still wouldn't call either of these technologies earth-shattering. Down the line? Probably. Right now? Nothing special.
 
Joined: Nov 3, 2011
Messages: 695 (0.14/day)
Location: Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H150i Elite LCD XT White
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K FreeSync/GSync DP, LG 32UL950 32in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
Which is why Vega 20 isn't bigger than Vega 10. I think Huang's outburst is because he realizes he made a "big" mistake with Turing. AMD is focusing on where the money is, not on winning performance crowns that mean little in the larger context of things. Turing is substantially larger (and more costly to produce) than even Vega 10.


On topic, Vega 20 doesn't really impress, but it wasn't really intended to impress either. Vega on 7 nm with Fiji-level memory bandwidth.
VII is still Vega with the same 64-ROP bottleneck. AMD should have scaled up from Vega M GH, with its 64 ROPs to 24 CUs ratio.

Missing tech is missing tech. It isn't in one game, it's in many more, with many more coming. [...] Again, the Radeon VII price is just too high. You'll notice I never criticize the product itself - it uses great memory and plenty of it. I like that. What I don't like is the high price and the missing features. At a proper market price of $499-549 it is a winner.
DLSS... promoting a pixel-reconstruction technique is like Sony's pixel-reconstruction marketing for the PS4 Pro. The VII has higher memory bandwidth for MSAA.

Yes, GCN is an async monster because of underutilization of its hardware resources.
It's more of a ROPs read-write bottleneck, with workarounds like the async compute/TMU read-write path.
 

M2B

Joined: Jun 2, 2017
Messages: 284 (0.10/day)
Location: Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
VII is still Vega with the same 64-ROP bottleneck. AMD should have scaled up from Vega M GH, with its 64 ROPs to 24 CUs ratio.


DLSS... promoting a pixel-reconstruction technique is like Sony's pixel-reconstruction marketing for the PS4 Pro. The VII has higher memory bandwidth for MSAA.


It's more of a ROPs read-write bottleneck, with workarounds like the async compute/TMU read-write path.

You have no idea what you are talking about.
It has nothing to do with ROPs; 64 ROPs are completely fine up to 3840×2160 and won't cause any bottleneck.
Do you really think AMD is dumb enough to make GPUs that are bottlenecked by such a relatively simple thing?
 

95Viper

Super Moderator
Staff member
Joined: Oct 12, 2008
Messages: 13,039 (2.21/day)
Stop the retaliatory comments, bickering, baiting, and insulting. You know who you are.
Keep the discussions civil and on topic.

Thank You
 

FordGT90Concept

"I go fast!1!11!1!"
Joined: Oct 13, 2008
Messages: 26,259 (4.44/day)
Location: IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
VII is still Vega with the same 64-ROP bottleneck. AMD should have scaled up from Vega M GH, with its 64 ROPs to 24 CUs ratio.
AMD's ROPs process 4 pixels per clock. In the case of Radeon VII: 64 ROPs × 4 pixels per clock × 1,800,000,000 clocks = 460,800,000,000 pixels per second. 4K is 8,294,400 pixels. Radeon VII has enough ROPs and clock speed to handle 4K at 55,555 fps. ROPs aren't the problem.
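The same fillrate arithmetic as a small Python sketch (note: the 4 pixels per ROP per clock figure is the poster's assumption, and it is contested later in the thread).

```python
# Theoretical pixel fillrate for Radeon VII, following the numbers above.
# The 4 pixels/ROP/clock figure is the poster's assumption, not a verified spec.
rops = 64
pixels_per_rop_per_clock = 4
clock_hz = 1_800_000_000           # ~1.8 GHz boost

fillrate = rops * pixels_per_rop_per_clock * clock_hz  # pixels per second
pixels_4k = 3840 * 2160            # 8,294,400 pixels per 4K frame

print(f"{fillrate:,} pixels/s -> {fillrate / pixels_4k:,.0f} fps at 4K")
# 460,800,000,000 pixels/s -> ~55,556 fps at 4K
```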


GCN's problem is that the drivers (read: the CPU) do more work than they do for Maxwell and later. This is why Vega and Fiji do not-so-great at low resolutions but well at high resolutions (where the CPU matters less). Vulkan and Direct3D 12 perform much better on GCN than Direct3D 11 because they naturally remove a lot of the underlying CPU burden.

I suspect Navi will take lessons learned from Xbox One and PlayStation 4 to produce hardware that needs minimal driver interference.
 
Joined: Jun 10, 2014
Messages: 2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
It has nothing to do with ROPs; 64 ROPs are completely fine up to 3840×2160 and won't cause any bottleneck.
Do you really think AMD is dumb enough to make GPUs that are bottlenecked by such a relatively simple thing?
You're correct. If ROPs were a major bottleneck, AMD would have solved that by now and unleashed a massive performance gain, but they're not.
That doesn't mean you can't find an edge case where more ROPs would help, as with anything else, but that's beside the point.

GCN's problem is that the drivers (read: the CPU) do more work than they do for Maxwell and later. This is why Vega and Fiji do not-so-great at low resolutions but well at high resolutions (where the CPU matters less). Vulkan and Direct3D 12 perform much better on GCN than Direct3D 11 because they naturally remove a lot of the underlying CPU burden.
Well, it's true that Direct3D 12 and Vulkan can offload a lot of the management done by the drivers, but not the truly low-level allocation and resource management, which happens inside the GPU, and that is where GCN struggles.
 

M2B

Joined: Jun 2, 2017
Messages: 284 (0.10/day)
Location: Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
GCN's problem is that the drivers (read: the CPU) do more work than they do for Maxwell and later. This is why Vega and Fiji do not-so-great at low resolutions but well at high resolutions (where the CPU matters less). Vulkan and Direct3D 12 perform much better on GCN than Direct3D 11 because they naturally remove a lot of the underlying CPU burden.

Driver overhead is only a small part of why GCN is behind Maxwell/Pascal/Turing.
Just like Intel's advantage over AMD in gaming, my guess is that it has something to do with much lower cache latencies on Nvidia GPUs.
Of course, this isn't the full story.
 
Joined: Jul 24, 2009
Messages: 1,002 (0.18/day)
I think it might be viable to have a ray-tracing add-on card. Maybe something in SLI/CrossFire style: one regular card, one for ray tracing.
 
Joined: Mar 10, 2015
Messages: 3,984 (1.12/day)
System Name Wut?
Processor 3900X
Motherboard ASRock Taichi X570
Cooling Water
Memory 32GB GSkill CL16 3600mhz
Video Card(s) Vega 56
Storage 2 x AData XPG 8200 Pro 1TB
Display(s) 3440 x 1440
Case Thermaltake Tower 900
Power Supply Seasonic Prime Ultra Platinum
Do you really think AMD is dumb enough to make GPUs that are bottlenecked by such a relatively simple thing?

I mean, in everyone else's defense, they aren't doing anything to change it, so they were dumb enough to have a bottleneck in a place that isn't easily fixed.
 
Joined: Feb 3, 2017
Messages: 3,810 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Now, what do we do about RTRT making everything look like a mirror?
Of course the demos and showcases for new features will over-emphasize the feature, but once it makes it into actual games, it is generally much more toned down. Have you actually played BF5 with DXR?
The Shadow of the Tomb Raider patch (shadows) and Metro Exodus (AO) should be a lot more interesting RTRT use cases once they show up.
 
Joined: Nov 1, 2018
Messages: 584 (0.26/day)
AMD's ROPs process 4 pixels per clock. In the case of Radeon VII: 64 ROPs × 4 pixels per clock × 1,800,000,000 clocks = 460,800,000,000 pixels per second. 4K is 8,294,400 pixels. Radeon VII has enough ROPs and clock speed to handle 4K at 55,555 fps. ROPs aren't the problem.
Those units need to do both texture-element reading and buffer output writing.

It's not just about the final frame, because they aren't drawing the same uniform color 55 thousand times per second.

Every texel that needs to be placed somewhere in a scene also needs to be read. The ratio of texels read to pixels written can vary widely, but considering that modern games put millions of triangles on the screen, that's a whole lot of texture data that has to be read and processed to obtain each final pixel.
Obviously there's CU caching involved as well, but the ratio is still huge.

Let's assume some numbers: if the texel-to-pixel ratio is 20:1, and due to overdraw more pixels are written than are actually displayed, say "only" 5:1 (it might be much higher in complex scenes with lots of translucency, edge anti-aliasing, and more), that is already 100:1 against your number.

55 thousand fps is now just 550.

Add 4x multi-sampling and you're back to a more realistic 550 / 4 = 137 fps, which seems to be about what these cards can do...
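A quick Python sketch of that derating, using the same assumed ratios (20:1 texels per pixel, 5:1 overdraw, 4x MSAA; these are illustrative guesses, not measured values).

```python
# Derate the theoretical 4K fillrate by assumed real-world factors.
theoretical_fps = 55_555      # from the raw ROP fillrate math above

texel_to_pixel = 20           # assumed texel reads per written pixel
overdraw = 5                  # assumed pixels written per pixel displayed
msaa = 4                      # 4x multi-sampling

effective_fps = theoretical_fps / (texel_to_pixel * overdraw * msaa)
print(f"~{effective_fps:.0f} fps")  # ~139 fps, in line with the post's ~137
```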
 
Joined: Jun 10, 2014
Messages: 2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
The myth of the ROP bottleneck on Vega must come to an end.
Just compare the GTX 1080 and RTX 2060 vs. Vega 64, three GPUs which perform similarly. While the GTX 1080 had 102.8-110.9 GP/s of pixel fillrate, the RTX 2060 reduced that to 65.5-80.5 GP/s (less than Vega 64's 79.8-98.9 GP/s) and still managed to maintain performance, even at 4K.
The issue with GCN is not the raw throughput of the various resources; it's the management of those resources.
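For reference, those GP/s ranges follow directly from ROP count × base/boost clock (1 pixel per ROP per clock); a minimal sketch with approximate reference clocks.

```python
# Pixel fillrate = ROPs x clock, at base and boost clocks.
# Clocks are approximate reference values.
cards = {
    "GTX 1080": (64, 1.607, 1.733),
    "RTX 2060": (48, 1.365, 1.680),
    "Vega 64":  (64, 1.247, 1.546),
}
for name, (rops, base_ghz, boost_ghz) in cards.items():
    print(f"{name}: {rops * base_ghz:.1f} - {rops * boost_ghz:.1f} GP/s")
# GTX 1080: 102.8 - 110.9 GP/s
# RTX 2060: 65.5 - 80.6 GP/s
# Vega 64:  79.8 - 98.9 GP/s
```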
 
Joined: Mar 10, 2015
Messages: 3,984 (1.12/day)
System Name Wut?
Processor 3900X
Motherboard ASRock Taichi X570
Cooling Water
Memory 32GB GSkill CL16 3600mhz
Video Card(s) Vega 56
Storage 2 x AData XPG 8200 Pro 1TB
Display(s) 3440 x 1440
Case Thermaltake Tower 900
Power Supply Seasonic Prime Ultra Platinum
Of course the demos and showcases for new features will over-emphasize the feature, but once it makes it into actual games, it is generally much more toned down. Have you actually played BF5 with DXR?
The Shadow of the Tomb Raider patch (shadows) and Metro Exodus (AO) should be a lot more interesting RTRT use cases once they show up.

I'll never play BF5, but Exodus is a maybe. All I have seen so far looks awful in the demos and videos, so what is there to get excited about? I have nothing against the 20 series; I just don't understand what all the fuss is about. Is this the future? Probably, but it's not going to matter until consoles take it on, so there's no point getting all hot and bothered about it now. I hope NV gets up to 1.21 jigga-rays some day, but until then, color me unimpressed. I am more impressed with the extra performance they squeezed out of this gen. I didn't realize they could beat a horse that well.
 
Joined: Nov 3, 2011
Messages: 695 (0.14/day)
Location: Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H150i Elite LCD XT White
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K FreeSync/GSync DP, LG 32UL950 32in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
AMD's ROPs process 4 pixels per clock. In the case of Radeon VII: 64 ROPs × 4 pixels per clock × 1,800,000,000 clocks = 460,800,000,000 pixels per second. 4K is 8,294,400 pixels. Radeon VII has enough ROPs and clock speed to handle 4K at 55,555 fps. ROPs aren't the problem.


GCN's problem is that the drivers (read: the CPU) do more work than they do for Maxwell and later. This is why Vega and Fiji do not-so-great at low resolutions but well at high resolutions (where the CPU matters less). Vulkan and Direct3D 12 perform much better on GCN than Direct3D 11 because they naturally remove a lot of the underlying CPU burden.

I suspect Navi will take lessons learned from Xbox One and PlayStation 4 to produce hardware that needs minimal driver interference.
Wrong.
Vega 56 at 1710 MHz with 12 TFLOPS beats a Strix Vega 64 at 1590 MHz with 13 TFLOPS. This shows that a higher clock speed improves the raster hardware, which lets the TFLOPS be exposed. TFLOPS are useless without the ROPs' read-write factors.

For absolute performance, lowering time-to-completion in all graphics pipeline stages should be the priority. Your argument doesn't lead to the lowest-latency operation.

Where did you get the claim that a Vega ROP unit handles 4 color pixels per clock? Each RB unit has four color ROPs and 16 Z-ROPs. For 64 ROPs, that's 16 RB units × 4 color ROPs = 64 color ROPs, hence 64 color pixels per clock.

If AMD is pushing the compute/256-TMU read-write path, why not increase the ROP count to match the TMUs' read-write performance?
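To check the Vega 56 vs. Vega 64 arithmetic above: a small sketch assuming 64 shaders per CU, 2 FLOPs per clock, and 1 pixel per ROP per clock, with the clocks as stated in the post.

```python
# Compare compute throughput vs. pixel fillrate for the two overclocked Vegas.
def stats(cus, rops, clock_ghz):
    tflops = cus * 64 * 2 * clock_ghz / 1000  # 64 shaders/CU, 2 FLOPs/clock
    fillrate = rops * clock_ghz               # GP/s at 1 pixel/ROP/clock
    return tflops, fillrate

v56 = stats(56, 64, 1.710)  # ~12.3 TFLOPS, ~109.4 GP/s
v64 = stats(64, 64, 1.590)  # ~13.0 TFLOPS, ~101.8 GP/s

# The 56 has fewer TFLOPS but a higher fillrate, which is the poster's point:
print(f"Vega 56 @1710: {v56[0]:.1f} TFLOPS, {v56[1]:.1f} GP/s")
print(f"Vega 64 @1590: {v64[0]:.1f} TFLOPS, {v64[1]:.1f} GP/s")
```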

The myth of the ROP bottleneck on Vega must come to an end.
Just compare the GTX 1080 and RTX 2060 vs. Vega 64, three GPUs which perform similarly. While the GTX 1080 had 102.8-110.9 GP/s of pixel fillrate, the RTX 2060 reduced that to 65.5-80.5 GP/s (less than Vega 64's 79.8-98.9 GP/s) and still managed to maintain performance, even at 4K.
The issue with GCN is not the raw throughput of the various resources; it's the management of those resources.
The RTX 2060 (30 SM) has 48 ROPs at a ~1900 MHz stealth overclock. The full TU106 (36 SM) has 4 MB of L2 cache; the full GP104 has 2 MB. This matters for NVIDIA's immediate-mode tiled-cache render loop and lowers latency, which reduces time-to-completion.
 
Joined: Jun 10, 2014
Messages: 2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Vega 56 at 1710 MHz with 12 TFLOPS beats a Strix Vega 64 at 1590 MHz with 13 TFLOPS. This shows that a higher clock speed improves the raster hardware, which lets the TFLOPS be exposed. TFLOPS are useless without the ROPs' read-write factors.
This proves nothing in terms of claiming ROPs are the bottleneck. When you use a die with fewer cores and compensate with higher clocks, a lot of things other than just ROPs per GFLOP change. Slightly cut-down chips may have a different resource balance, in scheduling but also in cache and register files. All of these impact performance long before the ROPs even come into play.

We see the same thing with the GTX 970, which has higher performance per clock than its big brother, the GTX 980. Why? Because it struck a sweet spot across various resources.
 
Joined: Nov 3, 2011
Messages: 695 (0.14/day)
Location: Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H150i Elite LCD XT White
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K FreeSync/GSync DP, LG 32UL950 32in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
You're correct. If ROPs were a major bottleneck, AMD would have solved that by now and unleashed a massive performance gain, but they're not.
That doesn't mean you can't find an edge case where more ROPs would help, as with anything else, but that's beside the point.


Well, it's true that Direct3D 12 and Vulkan can offload a lot of the management done by the drivers, but not the truly low-level allocation and resource management, which happens inside the GPU, and that is where GCN struggles.
AMD is pushing the compute-shader/TMU read-write path.

Refer to the Avalanche Studios lecture on the TMU read-write workaround for ROPs-bound situations.

This proves nothing in terms of claiming ROPs are the bottleneck. When you use a die with fewer cores and compensate with higher clocks, a lot of things other than just ROPs per GFLOP change. Slightly cut-down chips may have a different resource balance, in scheduling but also in cache and register files. All of these impact performance long before the ROPs even come into play.

We see the same thing with the GTX 970, which has higher performance per clock than its big brother, the GTX 980. Why? Because it struck a sweet spot across various resources.
For AMD GCN, register files are at the CU level. Each CU has its own warp scheduling.

The GTX 970 has 56 ROPs and less L2 cache.
The GTX 980 has 64 ROPs.

Vega 56 has the full 64 ROPs and the full 4 MB of L2 cache, like Vega 64. VII's 60 CUs still have access to the full L2 cache and 64 ROPs, like the MI60. A faster clock speed speeds up the L2 cache and the 64 ROPs, i.e. lessens time-to-completion.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined: Oct 13, 2008
Messages: 26,259 (4.44/day)
Location: IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
https://hothardware.com/news/amd-radeon-rx-vega-56-unlocked-vega-64-bios-flash

A Vega 56 running the Vega 64 BIOS is 2% slower than a Vega 64. That likely has less to do with ROPs and more to do with underutilization of shaders, as I said before. Vega, and Fiji before it, were designed for server farms running compute loads. They were never ideal for gaming. Polaris, on the other hand, is biased towards gaming: it has 32 ROPs to 36 CUs (8:9, compared to Vega 64's 1:1 or Vega 56's 8:7). Again, if ROPs were really the bottleneck, AMD would have put more ROPs on it, but they didn't.
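The ROP:CU ratios above reduce as stated; a trivial check using Python's fractions module (using the 36-CU Polaris 10 as the Polaris example).

```python
from fractions import Fraction

# ROPs : CUs for each chip, reduced to lowest terms.
for name, rops, cus in [("Polaris 10", 32, 36), ("Vega 64", 64, 64), ("Vega 56", 64, 56)]:
    r = Fraction(rops, cus)
    print(f"{name}: {r.numerator}:{r.denominator}")
# Polaris 10: 8:9, Vega 64: 1:1, Vega 56: 8:7
```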

Vega and Fiji do exceptionally well in async games like Ashes of the Singularity because all of those shaders aren't so underutilized.
 
Joined: Nov 3, 2011
Messages: 695 (0.14/day)
Location: Australia
System Name Eula
Processor AMD Ryzen 9 7900X PBO
Motherboard ASUS TUF Gaming X670E Plus Wifi
Cooling Corsair H150i Elite LCD XT White
Memory Trident Z5 Neo RGB DDR5-6000 64GB (4x16GB F5-6000J3038F16GX2-TZ5NR) EXPO II, OCCT Tested
Video Card(s) Gigabyte GeForce RTX 4080 GAMING OC
Storage Corsair MP600 XT NVMe 2TB, Samsung 980 Pro NVMe 2TB, Toshiba N300 10TB HDD, Seagate Ironwolf 4T HDD
Display(s) Acer Predator X32FP 32in 160Hz 4K FreeSync/GSync DP, LG 32UL950 32in 4K HDR FreeSync/G-Sync DP
Case Phanteks Eclipse P500A D-RGB White
Audio Device(s) Creative Sound Blaster Z
Power Supply Corsair HX1000 Platinum 1000W
Mouse SteelSeries Prime Pro Gaming Mouse
Keyboard SteelSeries Apex 5
Software MS Windows 11 Pro
https://hothardware.com/news/amd-radeon-rx-vega-56-unlocked-vega-64-bios-flash

A Vega 56 running the Vega 64 BIOS is 2% slower than a Vega 64. That likely has less to do with ROPs and more to do with underutilization of shaders, as I said before. Vega, and Fiji before it, were designed for server farms running compute loads. They were never ideal for gaming. Polaris, on the other hand, is biased towards gaming: it has 32 ROPs to 36 CUs (8:9, compared to Vega 64's 1:1 or Vega 56's 8:7). Again, if ROPs were really the bottleneck, AMD would have put more ROPs on it, but they didn't.

Vega and Fiji do exceptionally well in async games like Ashes of the Singularity because all of those shaders aren't so underutilized.
Async compute and sync compute shaders have TMU read-write path software optimizations.
Again, read the Avalanche Studios lecture on the TMU read-write workaround for ROPs-bound situations.

[Attached image: compute shader with TMUs.png]


[Attached image: Vega RenderL2-640x317.jpg]

Btw: Vega M GH has a 24 CU to 64 ROPs ratio.

For a current Vega 64-type GPU, it's better to trade a lower CU count (which reduces power consumption) for a higher clock speed, which gives higher ROPs/L2-cache/rasterization performance. Vega 56 at 1710 MHz with 12 TFLOPS beats a Strix Vega 64 OC at 1590 MHz with 13 TFLOPS.

Rasterization = the hardware that converts floating-point geometry into masses of integer pixels.

[Attached image: Rasterisation hardware.png]


[Attached image: Fury X block.jpg]

Note the four rasterizer hardware units; AMD increased this hardware with the R9 290X's introduction, before "Mr. TFLOPS" joined AMD in 2013.

At higher clock speeds, classic GPU hardware such as the rasterizers, render back-ends (ROPs) and L2 cache performs better, i.e. has lower time-to-completion.

At the same clock speed and L2 cache size, 88 ROPs at 88 pixels per clock have a lower time-to-completion than 64 ROPs at 64 pixels per clock. AMD knows about the ROPs-bound problem, hence the marketing push for the compute shader read-write path workaround and VII's 1800 MHz clock speed with a lower CU count.
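To illustrate the clocks-versus-width trade-off being argued here, a minimal sketch comparing theoretical fillrates at 1 pixel per ROP per clock (the 88-ROP, ~1545 MHz part is a stand-in for a wide Turing die; both clocks are assumptions).

```python
# Fillrate scales with ROPs x clock, so a narrower part can close the gap on
# a wider one purely through clock speed. Clocks here are illustrative.
def gpixels_per_s(rops, clock_ghz):
    return rops * clock_ghz  # 1 pixel per ROP per clock

wide   = gpixels_per_s(88, 1.545)  # assumed wide, 88-ROP die
narrow = gpixels_per_s(64, 1.800)  # Radeon VII: fewer ROPs, higher clock

print(f"88 ROPs @1545 MHz: {wide:.1f} GP/s")    # ~136.0 GP/s
print(f"64 ROPs @1800 MHz: {narrow:.1f} GP/s")  # ~115.2 GP/s
```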

NVIDIA's GPU designs use higher clock speeds to speed up this classic GPU hardware.


Cryptocurrency mining uses the TMU read-write path instead of the ROPs read-write path.

AMD could have configured a GPU with 48 CUs, 64 ROPs and 1900 MHz.

RX 580... AMD hasn't mastered a 64-ROP design over a 256-bit bus, hence the RX 580 is stuck at 32 ROPs on a 256-bit bus. The R9 290X has 64 ROPs on a 512-bit bus, which is a 2x scale-up of the R9 380X/RX 580 design.
 
Joined: Jul 9, 2015
Messages: 3,413 (0.99/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
WCC-level speculation in an article that, in essence, highlights what Nvidia would like to highlight about Vega VII vs. the 2080.

How do things of that kind work? Do the authors of these texts simply root for Nvidia, do they come from Huang's headquarters, or are they somehow censored by NV's PR team?
Just curious.
 