That's not entirely true.
Vega scales really badly with higher shader counts past a certain point.
To run an 80 CU Vega (14nm) GPU at reasonable power levels you'd probably have to clock it somewhere in the 1.1-1.15GHz range, which is just stupid and unbalanced for a gaming GPU. The worst part is that instead of a ~480 mm² die you'd have a much more expensive ~600 mm² die - and at the end of the day all you'd get is maybe 5-10% better gaming performance.
Just look at how well the Radeon VII performs with fewer shaders compared to the Vega 64; the extra memory bandwidth helps of course, but the roughly 300MHz higher clock is the primary reason.
Please explain how lower clocks make a GPU "unbalanced" for gaming, because physics and real performance data significantly disagree with you. If that were indeed the case, how does an RTX 2080 Max-Q (with very low clocks) at 80W perform ~50% faster than an RTX 2060 (mobile, non-Max-Q, with higher clocks) at the same power draw? Again, you seem to be echoing that baseless and false Mark Cerny argument. Wide and slow is nearly always more performant than narrow and fast in the GPU space.
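To put some rough numbers behind that, here's a toy back-of-the-envelope sketch using the standard dynamic power relation (power scaling roughly with V² × f). The unit counts, clocks and voltages below are made up purely for illustration, not measurements of any real GPU:

```python
# Toy comparison of a "narrow and fast" vs a "wide and slow" GPU configuration.
# Assumes dynamic power ~ units * V^2 * f and perfect scaling across units.
# All numbers are illustrative, not taken from any real card.

def rel_power(units, freq_ghz, volts):
    # Relative dynamic power, arbitrary units
    return units * volts ** 2 * freq_ghz

def rel_perf(units, freq_ghz):
    # Relative throughput, assuming work scales across all units
    return units * freq_ghz

configs = {
    "narrow/fast": dict(units=40, freq_ghz=1.60, volts=1.10),  # clocks pushed, high voltage
    "wide/slow":   dict(units=64, freq_ghz=1.05, volts=0.85),  # more units, near the sweet spot
}

for name, c in configs.items():
    print(f"{name:12s} power: {rel_power(**c):6.1f}  perf: {rel_perf(c['units'], c['freq_ghz']):6.1f}")

# narrow/fast  power:   77.4  perf:   64.0
# wide/slow    power:   48.6  perf:   67.2   -> more performance at roughly a third less power
```

Obviously real workloads don't scale perfectly across units, but the direction of the result is the point.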
CU count was not the reason why Vega lagged so far behind Nvidia; it was purely down to an efficiency deficit.
Please explain how the "efficiency deficit" is somehow unrelated to clocks being pushed far past the ideal operating range of this architecture on the node in question - because if you're disagreeing with me (which it seems you are), that must somehow be the case. After all, I did say:
AMD couldn't [increase the number of CUs] and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost.
This is directly due to the hard CU limit in GCN. And because voltage and power draw increase nonlinearly with frequency, even small clock speed drops can lead to dramatic improvements in efficiency. A cursory search for people experimenting with downclocking and undervolting their Vega cards will show you that these cards can be much, much more efficient even with minor clock speed drops. Of course one could argue that undervolting results are unrealistic when speaking of a theoretical mass-produced card, but the counterargument is that the existing cards are effectively factory overvolted: the chips are pushed so far up their clock/voltage curve that a lot of extra voltage is needed to ensure sufficient yields, as a significant portion of GPUs would otherwise fail to reach the necessary speeds. Dropping clocks by even 200MHz would allow for quite dramatic voltage drops. 200MHz would be a clock drop of about 13%, but would likely lead to a power drop of ~25-30%, which would in turn allow for ~25-30% more CUs within the same power budget (if that were architecturally possible), increasing performance far beyond what is lost from the clock speed drop. That is how you make the best-performing GPU within a given power budget: make it as wide as possible (while ensuring it has the memory bandwidth to keep it fed) and keep clock speeds around peak efficiency levels.
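To sanity-check those percentages: assuming power scales roughly with V² × f, and picking some plausible (but entirely assumed) voltages for a Vega 64-class card, the math works out something like this:

```python
# Rough check of the "13% clock drop -> ~25-30% power drop" numbers.
# Assumes dynamic power ~ V^2 * f; both voltages are assumptions for
# illustration, not measured Vega values.

base_clock, base_volt = 1550, 1.15   # ~Vega 64 boost clock (MHz), voltage assumed
new_clock,  new_volt  = 1350, 1.05   # 200 MHz lower, with an assumed ~0.1 V undervolt

clock_drop = 1 - new_clock / base_clock                                       # ~13%
power_drop = 1 - (new_volt ** 2 * new_clock) / (base_volt ** 2 * base_clock)  # ~27%

# Reinvest the saved power into more CUs (taking power as roughly linear in
# CU count), using the conservative ~27% extra-CU figure from the text above.
more_cus = 0.27
net_perf = (1 + more_cus) * (new_clock / base_clock) - 1   # ~ +11% at equal power

print(f"clock drop: {clock_drop:.0%}, power drop: {power_drop:.0%}, "
      f"net perf at equal power: {net_perf:+.0%}")
```

So even with a conservative CU increase, the wider, lower-clocked chip comes out ahead at the same power, versus the ~13% you'd lose from the clock drop alone on a fixed-width die.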
Now, this is of course not to say that Vega doesn't also have an architectural efficiency disadvantage against Pascal and Turing - it definitely does - but pushing it way past its efficiency sweet spot just compounded that issue, making it far worse than it might have been.
And of course this would also lead to increased die sizes - but then they would have had some actual performance gains when moving to a new node, rather than the outright flop that was the Radeon VII. Now, that GPU was never meant for gaming at all, but it nonetheless performs terribly for what it is - a full node shrink and then some over its predecessor. Why? Again, because the architecture didn't allow them to build a wider die, meaning the only way of increasing performance was pushing clocks as high as they could go. Yes, the VII has 60 CUs rather than 64, but that is solely down to it being a short-term niche card made to salvage faulty chips, with all fully enabled dies going to the datacenter/HPC market where this GPU actually had some merit.
If AMD could have moved past 64 CUs with Vega, that lineup would have been
much more competitive in terms of absolute performance. It wouldn't have been cheap, but it would have been better than what we got - and we could have gotten a proper full lineup rather than the two GPUs AMD ended up making. Luckily it looks like this is happening with RDNA 2 now that the 64 CU limit is gone.
So, tl;dr: the 64 CU architectural limit of GCN has been a
major problem for AMD GPUs ever since they maxed it out back in 2015 - it left them with no way forward outside of sacrificing efficiency at every turn.