Ok... but why is a power of 2 more efficient? My apologies here for being dense...
Again, it shouldn't have that 970 issue. The back end ROPs (read: the math) seem to all jibe to me?
Basically, it's all to do with addressing and building the infrastructure for it inside the chip. I'm going to assume that you're familiar with base 2 (binary) and number bases in general here.
To make for a really simple example, imagine that you have a memory chip with just 4 locations. These will take 2 bits to address, ie a 2-bit address bus. The value of the bottom (first) address will be zero (00 binary) and the last (top) address 3 (11 binary).
Now imagine a lopsided memory chip with just 3 locations. You still need a 2-bit address bus, since the top bit is still set for address 2 (10 binary), so the address decoding infrastructure is the same as for 4 locations. The top address of 3 (11 binary) points nowhere and likely has to be masked off to avoid a crash. Hence the chip carries the same addressing overhead as a 4-location chip, but doesn't actually have that extra location in it, so it's not an optimal design. Of course, what you get back is that the storage circuitry for the 4th location is missing, which saves space, hence making for a compromise.
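To put numbers on the 3-location versus 4-location example above, here's a rough Python sketch. The function names (address_bits, unused_codes, is_valid) are just ones I've made up for illustration, and the bounds check only stands in for whatever masking logic a real chip would use. It models the addressing arithmetic, not actual hardware.

```python
def address_bits(locations):
    """Smallest number of address bits that can select `locations` slots."""
    return max(1, (locations - 1).bit_length())


def unused_codes(locations):
    """Bit patterns the address bus can express that point at no real slot."""
    return (1 << address_bits(locations)) - locations


def is_valid(address, locations):
    """Bounds check standing in for the masking logic (an assumption)."""
    return 0 <= address < locations


# Walk every bit pattern the bus can carry for a 4-slot and a 3-slot chip.
for n in (4, 3):
    bits = address_bits(n)
    for a in range(1 << bits):
        state = "ok" if is_valid(a, n) else "points nowhere"
        print(f"{n} locations: address {a:0{bits}b} -> {state}")
```

Both chips come out needing 2 address bits, but only the 3-location one has a bit pattern (11 binary) that has to be caught and rejected.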
You have a similar situation regardless of what you're addressing, whether it's CUDA units and the number of bits they each handle in a GPU, or the number of CUDA units in the GPU, or whatever aspect of a digital circuit.
The problem in the real world, of course, is that building a perfect power of 2 chip causes the number of transistors and the physical size of that chip to roughly double each time it's expanded, ie to grow exponentially, which is unsustainable.
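The doubling is easy to see if you just list the sizes. A tiny Python sketch, where the starting size of 1024 units is an arbitrary number I picked for illustration and power_of_two_sizes is a made-up helper. It counts addressable units only, not transistors or die area.

```python
def power_of_two_sizes(start, steps):
    """Sizes you pass through if every expansion adds one more address bit."""
    return [start << i for i in range(steps + 1)]


sizes = power_of_two_sizes(1024, 3)  # 1024 is an arbitrary starting point
for smaller, larger in zip(sizes, sizes[1:]):
    # Each step up is a full doubling, with no power of 2 size in between.
    print(f"{smaller} -> {larger} (x{larger // smaller})")
```

Every step is a full x2, so a pure power of 2 design has no in-between sizes to fall back on.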
When you get to the large sizes of modern GPUs with their billions of transistors, a pure power of 2 design would tend to quickly outgrow the manufacturing capabilities of current technology. Or if not for a particular design, it would just be excessively large, such as being, for example, 40 millimeters on a side, which is impractical for a commercial product that's supposed to make a profit.
No doubt it would also use a tremendous amount of power and emit a correspondingly tremendous amount of heat, making things difficult. Therefore, we see the lopsided GPUs of today to avoid this fate, or at least reduce its impact. Think of the GTX 480 and the tremendous amount of power and heat it used, despite being such a lopsided design. It's a shame and I really don't like this lopsidedness, but there's no choice for a real world GPU.
If you're curious, check out the designs of older entry level GPUs, where you'll see that quite often everything is a perfect power of 2, eg data bus, CUDA cores etc, since it's practical to do so at the smaller sizes.
The 970 memory issue came about because NVIDIA nibbled a bit off the GPU, giving rise to a compartmentalized memory addressing design in which that last 500MB is much slower to access than the rest. They didn't declare this, leading to the scandal.
When I saw the 1080 Ti with its weird 11GB RAM and crippled GPU, it reminded me that NVIDIA could potentially have the same design issue. However, whether this happens or not really depends on the details of the design, and we'll know soon enough once the official reviews are out. I doubt they'd repeat the same mistake, especially on their flagship product.
@efikkan back there thinks I'm "completely wrong" about a power of 2 chip being optimal, but I'm not, as I've explained above. He just didn't quite understand what I was saying.
Oh and you asked for it - check my sig!