
AMD "Vega 20" with 32 GB HBM2 3DMark 11 Score Surfaces

That's not how it works, ALUs are ALUs. Vega 10 is GP100 class.
Wrong. The ALUs' main read/write operations go through the TMUs or ROPs.

The compute shader path uses the TMUs' read/write functions.
The pixel shader path uses the ROPs' read/write functions.

Vega 64 has GP104-class classic GPU hardware. GP102 has more rasterization and ROP hardware than Vega 64. With Vega, the ROPs became connected to the L2 cache, as in the Maxwell/Pascal designs.

Vega 64 has GP102-class FP32 compute hardware.

Both Vega 64 and GP104 have quad rasterization/quad GPC units and 64 ROPs.
 
The ALUs' main read/write operations go through the TMUs or ROPs.

What are you talking about? Vertex, primitive and fragment processing all happen on the unified shaders, aka FP32 ALUs. At the same time, that's where the kernels run if you use CUDA/OpenCL; a shader is the analog of a compute kernel, and they use exactly the same resources.

TMUs and ROPs are not, by any stretch of the imagination, the only classifiers as far as graphics processing goes. That's the reason they were decoupled from the graphics pipeline more than a decade ago, and shaders now far outnumber TMUs and ROPs. It used to be that for each shader you would have one ROP and one TMU, but that proved to be wasteful, and it turned out you can extract more performance by adding more shaders, since rasterization does not require the same scaling in terms of performance.

The ROP-to-shader ratio of the new RTX 2080 Ti is about 1 to 49; that's how important they are to graphics performance.
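
For anyone who wants to check that ratio, here is a rough sketch of the arithmetic in Python. The unit counts are assumptions taken from public spec sheets rather than from this thread (4352 shaders and 88 ROPs for the RTX 2080 Ti, 4096 shaders and 64 ROPs for Vega 64), and `shaders_per_rop` is just an illustrative helper name.

```python
# Unit counts are assumptions from public spec sheets, not measurements.
GPUS = {
    "RTX 2080 Ti": {"shaders": 4352, "rops": 88},
    "Vega 64": {"shaders": 4096, "rops": 64},
}


def shaders_per_rop(shaders: int, rops: int) -> float:
    """Return how many shader ALUs share one ROP."""
    return shaders / rops


for name, units in GPUS.items():
    ratio = shaders_per_rop(units["shaders"], units["rops"])
    print(f"{name}: 1 ROP per {ratio:.1f} shaders")
```

The RTX 2080 Ti comes out a little over 49 shaders per ROP, which matches the 1 to 49 figure above.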

Seriously, ROPs and TMUs are the only hardware that dictates traditional GPU performance? That sounds like something I would have heard 20 years ago.
 
What are you talking about? Vertex, primitive and fragment processing all happen on the unified shaders, aka FP32 ALUs. At the same time, that's where the kernels run if you use CUDA/OpenCL; a shader is the analog of a compute kernel, and they use exactly the same resources.

TMUs and ROPs are not, by any stretch of the imagination, the only classifiers as far as graphics processing goes. That's the reason they were decoupled from the graphics pipeline more than a decade ago, and shaders now far outnumber TMUs and ROPs. It used to be that for each shader you would have one ROP and one TMU, but that proved to be wasteful, and it turned out you can extract more performance by adding more shaders, since rasterization does not require the same scaling in terms of performance.

The ROP-to-shader ratio of the new RTX 2080 Ti is about 1 to 49; that's how important they are to graphics performance.

Seriously, ROPs and TMUs are the only hardware that dictates traditional GPU performance? That sounds like something I would have heard 20 years ago.
That's an argument from ignorance. OpenCL/compute shader code doesn't use the ROPs as its primary read/write units. Consider why AMD is pushing for compute-shader-based optimizations.

The GTX 1080 Ti has more rasterization power (6 GPCs) and more ROP power (88 ROPs at 1600 MHz to 1800 MHz) than Vega 64, with its 4-GPC equivalent and 64 ROPs at 1550 MHz to 1600 MHz.
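
A back-of-the-envelope sketch of what those ROP counts and clocks mean for theoretical peak pixel fill rate. It assumes one pixel per ROP per clock and ignores memory bandwidth and blending, so real throughput will be lower; `peak_fill_gpix` is a hypothetical helper name.

```python
def peak_fill_gpix(rops: int, clock_mhz: float) -> float:
    """Theoretical peak fill rate in Gpixels/s, assuming 1 pixel per ROP per clock."""
    return rops * clock_mhz / 1000.0


# (ROPs, low clock in MHz, high clock in MHz) as quoted above.
CARDS = {
    "GTX 1080 Ti": (88, 1600, 1800),
    "Vega 64": (64, 1550, 1600),
}

for name, (rops, low, high) in CARDS.items():
    low_rate = peak_fill_gpix(rops, low)
    high_rate = peak_fill_gpix(rops, high)
    print(f"{name}: {low_rate:.1f} to {high_rate:.1f} Gpix/s")
```

Under that assumption the GTX 1080 Ti lands at roughly 141 to 158 Gpix/s against roughly 99 to 102 Gpix/s for Vega 64.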

With AMD GPUs, the pixel and compute shader read/write paths are different, i.e. common ALUs with different read/write paths. AMD's pixel shader read/write path is bottlenecked by the 64 ROP units.
RTG changed their CU/TMU vs ROPs ratio with the recent Vega M GH, i.e. 24 CUs (96 TMUs) vs 64 ROPs (64 pix/clk).
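
To make the ratio change concrete, a small sketch comparing TMUs per ROP. The Vega M GH numbers are the ones given above; the Vega 64 configuration (64 CUs, 64 ROPs) is an assumption from public specs, and `tmus_per_rop` is a made-up helper for illustration.

```python
TMUS_PER_CU = 4  # assumption: 4 TMUs per compute unit (24 CUs -> 96 TMUs)


def tmus_per_rop(cus: int, rops: int) -> float:
    """Return the number of TMUs for every ROP in a given configuration."""
    return cus * TMUS_PER_CU / rops


vega_m_gh = tmus_per_rop(cus=24, rops=64)  # figures quoted above
vega_64 = tmus_per_rop(cus=64, rops=64)    # assumed Vega 64 configuration

print(f"Vega M GH: {vega_m_gh} TMUs per ROP")
print(f"Vega 64:   {vega_64} TMUs per ROP")
```

That works out to 1.5 TMUs per ROP for Vega M GH against 4 for Vega 64, so the smaller part carries proportionally much more ROP hardware.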


[Attached images: Vega Final Presentation-33.png, Vega RenderL2.jpg]

NVIDIA Maxwell/Pascal GPUs still have split TMU and ROP read/write paths with common shader ALUs.

The RT cores in the RTX 2070/2080/2080 Ti mark the return of non-unified shader/compute cores.
 