No, you are mixing things up. AMD vs Nvidia shows that a TFLOPS metric from one architecture cannot 100% model the performance characteristics of another architecture. It does not, however, contradict my statement that TFLOPS are in general a good tool to approximate or predict performance. Please tell me how many 5 TFLOPS GPUs you can find that consistently beat, let's say, an 8 TFLOPS GPU. Go ahead, find any such pair, any generation, any manufacturer. Then tell me how many you found that don't.
Important ? Yes. A good indicator of performance ? No, it's absolutely worthless.
GP104 : 64 ROPs, 8.8 TFLOPS
TU106 : 64 ROPs, 7.5 TFLOPS
TU104 : 64 ROPs, 10 TFLOPS
If we go by TFLOPS the ranking list should be :
TU104 > GP104
TU104 > TU106
GP104 > TU106
We know in fact it's more like :
TU104 > GP104
TU104 > TU106
GP104 < TU106
2 out of 3, a pretty damn good estimation considering I knew absolutely nothing more than the theoretical FLOPS.
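For reference, the theoretical figures quoted above come straight from a one-line formula: shader cores × 2 FLOPs per clock (one FMA) × boost clock. A minimal sketch, assuming the reference-card core counts and boost clocks for the cards these chips shipped in (GTX 1080, RTX 2070, RTX 2080); partner boards clock differently, so treat the inputs as approximations:

```python
# Theoretical FP32 throughput: cores x 2 (FMA counts as two FLOPs) x clock (GHz)
# gives GFLOPS; divide by 1000 for TFLOPS.
def tflops(cores, boost_ghz):
    return cores * 2 * boost_ghz / 1000.0

# Assumed reference specs (cores, boost clock in GHz)
gpus = {
    "GP104 (GTX 1080)": (2560, 1.733),
    "TU106 (RTX 2070)": (2304, 1.620),
    "TU104 (RTX 2080)": (2944, 1.710),
}

for name, (cores, clock) in gpus.items():
    print(f"{name}: {tflops(cores, clock):.1f} TFLOPS")
```

That lands within rounding of the 8.8 / 7.5 / 10 TFLOPS figures quoted above.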
What does the ROP count tell us here ? Absolutely nothing, nada, zero. The one Vega example does seem to be limited by its ROPs, and that's an assumption by the way, because neither of us knows it for sure; it's known that GCN has other limitations and peculiarities. But there must be a point where you need to realize TFLOPS are the dominant factor in all this, not the ROPs. It's a bizarre argument you've got here; at the end of the day you can be limited by anything. You could extend this notion and claim that memory bandwidth is the most important thing, because you can have as many execution ports and ROPs as you want, but without the memory bandwidth it's all for nothing.
Here is a much more sensible explanation for why Vega performs better at higher clocks : unlike other GPU architectures, GCN has scalar ports used for instructions that don't need multiple SIMD lanes, and we know GCN can be very inefficient with its 64-wide wavefront. It would make sense that certain shaders, ones with a lot of instructions that cannot be efficiently scheduled across a wavefront, would run a lot quicker if the clocks are higher.
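To make that concrete, here's a toy model (my own assumption for illustration, not measured data, and a big simplification of real GCN scheduling): a wavefront issues one instruction per cycle regardless of how many of its 64 lanes are active, so a divergent or scalar-heavy shader's runtime tracks the clock while its achieved FLOPS sit far below the theoretical peak:

```python
# Toy model: instruction issue rate is one per cycle per wavefront,
# independent of active lane count (the key simplifying assumption).
def runtime_us(instructions, clock_ghz):
    # cycles to retire the stream, converted to microseconds
    return instructions / (clock_ghz * 1e3)

def achieved_gflops(instructions, active_lanes, clock_ghz):
    # useful FLOPs performed divided by wall time (ns) -> GFLOPS
    flops = instructions * active_lanes
    return flops / (runtime_us(instructions, clock_ghz) * 1e3)

# Same instruction stream at 1.5 GHz:
full   = achieved_gflops(1_000_000, 64, 1.5)  # fully packed wavefront
sparse = achieved_gflops(1_000_000,  4, 1.5)  # divergent, 4 active lanes

# The sparse shader achieves 1/16 the FLOPS, yet raising the clock 20%
# cuts its runtime by the same 20% - its performance follows clock, not width.
```

In this model the divergent shader was never anywhere near its TFLOPS limit, so adding more SIMD width does nothing for it, while clock speed helps directly.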
Fact: ROP counts don't change that much generation to generation meanwhile shaders do, a lot.
I've read dozens of papers on graphics and compute, and not once were ROP counts quoted as indicators of performance; people always use GFLOPS, theoretical or measured. It really boggles my mind why you all insist on this, it's simply not the case.
That's a funny statement, because the only bits left in a GPU that are DSP-like are in fact things like the ROPs or TMUs.