AMD "Vega 20" with 32 GB HBM2 3DMark 11 Score Surfaces

ValenOne · Aug 31, 2018

Vya Domus said:
That's not how it works, ALUs are ALUs. Vega 10 is GP100 class.

Wrong. ALUs' main read/write operations go through the TMUs or ROPs.

The compute shader path uses the TMUs' read/write functions.
The pixel shader path uses the ROPs' read/write functions.

Vega 64 has GP104-class classic GPU hardware. GP102 has more rasterization and ROP hardware than Vega 64. Vega GPUs gained ROPs connected to the L2 cache, as in the Maxwell/Pascal designs.

Vega 64 has GP102-class FP32 compute hardware.

Both Vega 64 and GP104 have quad rasterization/quad GPC units and 64 ROPs.

Vya Domus · Aug 31, 2018

rvalencia said:
ALUs' main read/write operations go through the TMUs or ROPs.

What are you talking about? Vertex, primitive, and fragment processing all happen on the unified shaders, aka the FP32 ALUs. That's also where the kernels run if you use CUDA/OpenCL; a shader is the analog of a compute kernel, and they use exactly the same resources.

TMUs and ROPs are not, by any stretch of the imagination, the only classifiers as far as graphics processing goes. That's the reason they were decoupled from the graphics pipeline more than a decade ago, and now shaders far outnumber TMUs and ROPs. It used to be that for each shader you would have one ROP and one TMU, but that proved wasteful, and it turned out you can extract more performance by adding more shaders, since rasterization does not require the same scaling in terms of performance.

The ratio of ROPs to shaders on the new RTX 2080 Ti is 1 to 49; that's how important they are to graphics performance.
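
As a rough check on that ratio, here is a minimal Python sketch. The shader and ROP counts are public spec-sheet figures assumed for illustration, not numbers taken from this post, and `shaders_per_rop` is a hypothetical helper name.

```python
# Shader-to-ROP ratios from public spec-sheet figures
# (assumed for illustration, not quoted from the thread).
GPUS = {
    "RTX 2080 Ti": {"shaders": 4352, "rops": 88},
    "GTX 1080 Ti": {"shaders": 3584, "rops": 88},
    "Vega 64": {"shaders": 4096, "rops": 64},
}


def shaders_per_rop(shaders: int, rops: int) -> float:
    """Return how many shader ALUs there are for each ROP."""
    return shaders / rops


for name, spec in GPUS.items():
    ratio = shaders_per_rop(spec["shaders"], spec["rops"])
    print(f"{name}: 1 ROP per {ratio:.1f} shaders")
```

With these figures the RTX 2080 Ti works out to a little over 49 shaders per ROP, which is where the "1 to 49" comes from.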

Seriously, ROPs and TMUs are the only hardware that dictates traditional GPU performance? I feel like I would have heard that 20 years ago.

ValenOne · Sep 1, 2018

Vya Domus said:
What are you talking about? Vertex, primitive, and fragment processing all happen on the unified shaders, aka the FP32 ALUs. That's also where the kernels run if you use CUDA/OpenCL; a shader is the analog of a compute kernel, and they use exactly the same resources.

TMUs and ROPs are not, by any stretch of the imagination, the only classifiers as far as graphics processing goes. That's the reason they were decoupled from the graphics pipeline more than a decade ago, and now shaders far outnumber TMUs and ROPs. It used to be that for each shader you would have one ROP and one TMU, but that proved wasteful, and it turned out you can extract more performance by adding more shaders, since rasterization does not require the same scaling in terms of performance.

The ratio of ROPs to shaders on the new RTX 2080 Ti is 1 to 49; that's how important they are to graphics performance.

Seriously, ROPs and TMUs are the only hardware that dictates traditional GPU performance? I feel like I would have heard that 20 years ago.

That's an argument from ignorance. OpenCL/compute shaders don't use the ROPs as their primary read/write units. Note why AMD is pushing for compute-shader-based optimizations.

The GTX 1080 Ti has more rasterization (6 GPCs) and ROP (88 ROPs at 1600 MHz to 1800 MHz) power than Vega 64, with its 4-GPC equivalent and 64 ROPs at 1550 MHz to 1600 MHz.
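
To make the ROP comparison concrete, here is a back-of-the-envelope sketch in Python. It assumes each ROP writes one pixel per clock, so it gives a theoretical peak only and ignores memory bandwidth and rasterizer limits. The ROP counts and clock ranges are the ones quoted above; `peak_fill_rate_gpix` is a hypothetical helper name.

```python
# Theoretical peak pixel fill rate = ROPs x clock,
# assuming one pixel per ROP per clock (a simplification).
def peak_fill_rate_gpix(rops: int, clock_mhz: int) -> float:
    """Peak pixel fill rate in Gpixels/s for a ROP count and a clock in MHz."""
    return rops * clock_mhz / 1000


# ROP counts and clock ranges (MHz) as quoted in the post.
CARDS = {
    "GTX 1080 Ti": (88, (1600, 1800)),
    "Vega 64": (64, (1550, 1600)),
}

for name, (rops, (low, high)) in CARDS.items():
    print(
        f"{name}: {peak_fill_rate_gpix(rops, low):.1f} to "
        f"{peak_fill_rate_gpix(rops, high):.1f} Gpix/s"
    )
```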

With AMD GPUs, the pixel and compute shader read/write paths are different, i.e. common ALUs with different read/write paths. AMD's pixel shader read/write path is bottlenecked by the 64 ROP units.
RTG changed their CU/TMU-to-ROP ratio with the recent Vega M GH, i.e. 24 CUs (96 TMUs) vs. 64 ROPs (64 pix/clk).
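
The shift in ratio can be sketched as follows. This assumes 4 TMUs per CU, which is consistent with the 24 CU / 96 TMU figure above. Vega 64's 64 CUs is a public spec-sheet figure, not stated in this post, and `tmus_per_rop` is a hypothetical helper name.

```python
# TMU-to-ROP ratio, assuming 4 TMUs per CU
# (consistent with 24 CUs -> 96 TMUs quoted in the post).
TMUS_PER_CU = 4


def tmus_per_rop(cus: int, rops: int) -> float:
    """Return the number of TMUs per ROP for a given CU and ROP count."""
    return cus * TMUS_PER_CU / rops


# Vega M GH figures are from the post; Vega 64's 64 CUs is an assumed spec-sheet value.
print("Vega M GH:", tmus_per_rop(24, 64), "TMUs per ROP")
print("Vega 64:", tmus_per_rop(64, 64), "TMUs per ROP")
```

Under these assumptions the smaller part has far more ROP throughput relative to its texture and ALU resources than Vega 64 does.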

NVIDIA's Maxwell/Pascal GPUs still have split TMU and ROP read/write paths with common shader ALUs.

The RT cores in the RTX 2070/2080/2080 Ti mark the return of non-unified shader/compute cores.
