If only these things worked so linearly...
OK, let's think logically: all these new shaders have to be connected to the L2 cache, and those connections take up space. Then you'll probably need a larger (and/or faster) L2, unless you want to leave all those shiny new cores starved for data. Then the back-end with the 50% more TMUs will need to be rewired, and all of these changes will need to be tested over and over and over... I dunno man, you make it sound a lot easier than it actually is.
Nothing is wired the way you describe. Fermi is 100% modular, and every stage was designed so that you can add anything in LEGO fashion, from SIMDs to SMs to complete GPCs. Buffers and pooled buses sit between every stage for that purpose, and the penalty Fermi suffers in performance per SP compared with G80/G92/GT200 supposedly comes from this rearrangement. The trade-off was made (just like when ATI created R600); now it's time to add the components that actually do the work.
Yeah, maybe it's not as easy as I made it out to be, but it certainly isn't as difficult as a completely new chip. It's been 6+ months since GF104 was finished (not released). 6 months is more than enough to make that thing and then some. Besides, forget about release dates; it's the internal timelines we have to look at, and these are unknown. Release dates for GF104, 106 and 108 were not based on when the design was finished, but on "when can I make enough of them for a proper release without eating into production of the chips that make me the most money", that is, the higher-end ones. Bottom line: the Fermi derivatives were probably almost finished even before GF100 cards were released. That's enough time for anything.
EDIT: And no, you don't need more L2. Fermi has much more L2 than any GPU will ever need for gaming. The reason is GPGPU (GF100 is and will always be the GPGPU chip, just like G80 was always the GPGPU part, while G92 existed only as a gaming chip). Is GF104 showing any decrease in performance due to less L2 per SP? No, not even 1%. And that's with 50% more SPs per SM. Adding another 16 SPs per SM, a 33% increase, is not going to change that either.
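For anyone checking the percentages, here is a quick sketch of the arithmetic. The per-SM counts (32 for GF100, 48 for GF104) are inferred from the "50% more" and "another 16 SP = 33%" figures above, and the 64-SP SM is purely the hypothetical part being discussed, not a confirmed design.

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100

# SPs per SM. Inferred from the percentages in the post (assumption).
gf100_sp_per_sm = 32
gf104_sp_per_sm = 48
# Hypothetical SM with another 16 SPs added (not a confirmed design).
hypothetical_sp_per_sm = gf104_sp_per_sm + 16

print(f"GF100 -> GF104: +{percent_increase(gf100_sp_per_sm, gf104_sp_per_sm):.0f}% SPs per SM")
print(f"GF104 -> hypothetical: +{percent_increase(gf104_sp_per_sm, hypothetical_sp_per_sm):.0f}% SPs per SM")
```

So the step being proposed is smaller, in relative terms, than the one GF104 already took over GF100.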
Also:
If only these things worked so linearly...
They don't indeed, but it's actually the opposite of what you are suggesting, and absolutely in favor of the "3/2 GF104-GF110":
- Doubling execution units almost never doubles transistor count or die area, especially die area. And you waste less area in "margins" (I know there's a term for that). E.g.:
ATI
Redwood = 627 million
Juniper = 1040 million
Cypress = 2150 million; more than twice, yes, but it doesn't count because it has at least one massive difference: it supports 64-bit (double precision), while Juniper and below don't.
RV730 = 514 million (remember 320 SP)
RV740 = 826 million (640 SP)
RV770 = 956 million (800 SP)
Nvidia
GF108 = 585 million
GF106 = 1170 million
GF104 = 1950 million
- Power requirement increases are almost always lower than the actual active transistor increase.
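To put numbers on the first point, here is a rough sketch using only the RV7xx figures listed above, since those are the ones given with SP counts. Keep in mind that total transistor counts also include memory controllers, ROPs and so on, so transistors-per-SP is a crude measure, not a real area model. The function and variable names are just for illustration.

```python
# Transistor counts (millions) and SP counts as quoted in the post.
chips = {
    "RV730": {"transistors_m": 514, "sp": 320},
    "RV740": {"transistors_m": 826, "sp": 640},
    "RV770": {"transistors_m": 956, "sp": 800},
}

def scaling(base, bigger):
    """Return (SP ratio, transistor ratio) of `bigger` relative to `base`."""
    b, g = chips[base], chips[bigger]
    return g["sp"] / b["sp"], g["transistors_m"] / b["transistors_m"]

def transistors_per_sp(name):
    """Crude metric: total transistors (millions) divided by SP count."""
    c = chips[name]
    return c["transistors_m"] / c["sp"]

for name in ("RV740", "RV770"):
    sp_ratio, tr_ratio = scaling("RV730", name)
    print(f"{name}: {sp_ratio:.2f}x the SPs of RV730 for {tr_ratio:.2f}x the transistors")
```

With these numbers, RV740 doubles RV730's SPs for roughly 1.6x the transistors, and RV770 has 2.5x the SPs for under 1.9x the transistors.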