What you (and Dalelaroy) are saying makes sense and I am not arguing against the logic. However, "they" don't keep the same number of ROPs from the mid range to the high end, that's just the point: the HD 5850 does not have the same amount of ROPs as the HD 5870 even though they both have the same memory bus. Why? Because, as I said, each card has its market segment, and they also have different SP counts, because there are links between TMUs, SPs and ROPs; the 5850 with its lesser SP count is given
First of all, the HD5850 and HD5870 do have the same amount of ROPs. Second, you are right regarding the links: there is probably a close limit in the relation between ROPs and memory bandwidth, and personally I think this limit is mostly on z/stencil. The two main purposes of ROPs are to calculate z/stencil and to blend final pixels, and either one requires writing to memory, so there is a strong relation between the two.
I stand to be corrected in what follows, as I'm in no way an expert, but it's what I understand from the things I do know or have heard about. Let's explain it with an example, taking the HD5870 numbers from the chart in the OP.
Memory bandwidth: 153.6 GB/s == 1228.8 Gb/s
Pixel fillrate: 27.2 GPixel/s
Z/stencil: 108.8 GSamples/s
Now for stencil the most commonly used value is one byte per pixel, while the Z sample in modern games is either 24 bit or 32 bit, because 16 bit creates artifacts.
Thus the average bit length of the samples is going to be between 8 and 24/32 bits; let's settle on 16-bit samples. Simple math from the specs tells us that 108.8 Gsamples/s x 16 bit per sample = 1740.8 Gb/s.
As you can see, the bandwidth required to write in z/stencil-only scenarios already exceeds the memory bandwidth limitation, and it's worse in the cases where a Z test is also being done. Of course the ROPs also have to write out pixels, which I understand is less taxing and makes up for the difference: typical HDR pixels are 16 bit wide (per channel), so 27.2 GPixel/s x 16 bit* = 435.2 Gb/s, and the output of current games is 32 bit, so 870.4 Gb/s.
* Here I have to admit I don't know whether the pixels are blended and written separately per channel or all together. In the latter case, the figure jumps to 1740.8 Gb/s (64 bit x 27.2 GPixel/s) again, and may actually reflect the relation better, as the average of the 32 bit and 64 bit outputs is 1305.6 Gb/s, quite similar to the actual memory bandwidth.
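For what it's worth, the arithmetic above can be checked with a quick script. The 16-bit average sample size and the 50/50 mix of 32-bit and 64-bit outputs are my own simplifying assumptions, as stated:

```python
# Back-of-envelope check of the HD5870 write-bandwidth figures above.
mem_bw = 153.6 * 8               # 153.6 GB/s -> 1228.8 Gb/s
z_stencil_bw = 108.8 * 16        # 108.8 Gsamples/s * assumed 16-bit avg sample
pix_32 = 27.2 * 32               # 27.2 GPixel/s * 32-bit output
pix_64 = 27.2 * 64               # same fill rate, 64-bit HDR pixels
pix_avg = (pix_32 + pix_64) / 2  # naive 50/50 mix of both output formats

print(mem_bw)        # memory bandwidth, 1228.8 Gb/s
print(z_stencil_bw)  # z/stencil writes alone need about 1740.8 Gb/s
print(pix_avg)       # mixed pixel output, roughly 1305.6 Gb/s
```

The point of the numbers: even the crude pixel-output average lands close to the actual memory bandwidth, while pure z/stencil already overshoots it.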
As you might have guessed already, doubling (or even increasing) the ROPs is not going to yield any substantial gains, even with the 25% higher speed of 7 GT/s GDDR5 modules, especially considering that the above numbers cover only write operations (and not all of them) and you still have to take read operations into account.
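To put the faster-memory point in numbers (using the 25% figure quoted above and my earlier 16-bit sample estimate, both of which are rough):

```python
current_bw = 1228.8             # Gb/s, HD5870 memory bandwidth
faster_bw = current_bw * 1.25   # assumed 25% GDDR5 speed-up
z_stencil_need = 1740.8         # Gb/s, from the 16-bit sample estimate

print(faster_bw)                    # roughly 1536 Gb/s
print(faster_bw < z_stencil_need)   # True: still short of z/stencil writes alone
```

So even the faster memory doesn't cover the theoretical z/stencil write demand, before counting pixel writes or any reads.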
That being said, the above only covers theoretical throughput and the effective balance. In practice, I think 32 ROPs are more than enough for the kind of performance we can expect from Cayman, and putting in 64 would be a waste of die area for little or no gain (something I could see Nvidia doing** but not AMD). 48 would be ideal, I guess, but I don't think AMD is willing to use non-power-of-two counts, or they would have done it in the past, with crippled 256 bit parts instead of making them 128 bit...
** This is another story, but the reason Nvidia "wastes" die area on 384 bit / 48 ROPs is that they are critical in the professional Quadro/Tesla cards, not because they bring any dramatic improvement or necessity on the desktop cards.