Here's the way I'd look at it. The architecture is designed around the full chip. When you start cutting bits off, the balance between the processors/schedulers/registers/controllers/etc. changes, and, as NV says, the same happens with the interconnects. The architecture groups certain areas into blocks, each with its own components, buses, etc., and cutting away one of those blocks, or part of one, gimps the performance of that block. The crossbars partially solve this by allowing some intercommunication between the different parts of the chip. However, there is a tradeoff to be made: more crossbars mean more cost, and they aren't utilized as much in a full chip configuration. One way to potentially alleviate this is similar to what Intel does with its extremely large server chips, namely a ring bus configuration. However, this has (AFAIK) not been implemented on a chip that has neither the size (talking GM200/GK110/GF110 here) nor the bandwidth requirements, so it could end up even worse than the current configuration because of the extra die area such a system needs. And I don't think the performance benefit would justify a ring bus architecture on the smaller chips.
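To put a rough shape on that crossbar vs. ring bus tradeoff, here's a toy Python model. It's only a sketch under simplified assumptions, not how NV or Intel actually lay anything out: a full crossbar gives every client (an SM cluster, say) a direct link to every memory partition, so the link count grows with the product of the two; a ring puts everything on one shared loop, so the stop count only grows with the sum, but traffic now has to travel several hops. The client/partition counts below are hypothetical.

```python
def crossbar_links(clients: int, partitions: int) -> int:
    # Full crossbar: a dedicated path from every client to every memory partition.
    return clients * partitions


def ring_stops(clients: int, partitions: int) -> int:
    # Ring bus: each client and each partition is a single stop on a shared loop.
    return clients + partitions


def ring_avg_hops(stops: int) -> float:
    # Bidirectional ring: average shortest-path distance from one stop to every other stop.
    total = sum(min(d, stops - d) for d in range(1, stops))
    return total / (stops - 1)


if __name__ == "__main__":
    # Hypothetical sizes, not real chip configurations.
    for clients, partitions in [(4, 2), (8, 4), (16, 8)]:
        stops = ring_stops(clients, partitions)
        print(
            f"{clients} clients x {partitions} partitions: "
            f"crossbar = {crossbar_links(clients, partitions)} links (1 hop), "
            f"ring = {stops} stops (~{ring_avg_hops(stops):.2f} hops on average)"
        )
```

Even in this crude model you can see the tradeoff: the crossbar's wiring cost balloons as the chip grows while the ring stays cheap, but the ring pays for it in latency and in sharing one loop's bandwidth, which is exactly why it only starts to make sense on really big, bandwidth-hungry designs.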
What it comes down to is architecture-level optimizations and tradeoffs that they will almost certainly have taken into account when designing the full and cut chips. They have spent way more R&D time on it than we have, and I'm sure they have a lot more resources to use too, so I don't feel we are in a position to question HOW they lay out their architecture. I'm also quite sure that these kinds of issues exist with almost any architecture, especially with cut dies, CPU and GPU alike (or any other processor for that matter).
BUT, and here is a big but (and it's underlined too, guess that makes it an important but...), I have to question NVIDIA's way of marketing this. OK, sure, there are 4GB of accessible memory, and the memory bus operates at the stated speed, but I feel there should at least be a side note that not all of the memory is accessed at the stated speed. Then again, that complicates things for the less tech-savvy and results in more confusing numbers.
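Here's a quick Python sketch of why a single headline number can mislead once part of the memory sits behind a slower path. The segment sizes and bandwidths are made-up illustrative values, and the model assumes (as a simplification) that the fast segment fills first and that segments are read one after the other rather than in parallel, so it's a pessimistic toy, not a measurement of any real card.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    size_gb: float        # capacity of this memory segment
    bandwidth_gbs: float  # bandwidth available when reading this segment


def transfer_time_s(segments: list[Segment], gb_needed: float) -> float:
    # Assumed policy: fill the fastest segment first, spill into slower ones.
    remaining = gb_needed
    elapsed = 0.0
    for seg in sorted(segments, key=lambda s: s.bandwidth_gbs, reverse=True):
        used = min(remaining, seg.size_gb)
        elapsed += used / seg.bandwidth_gbs  # segments read serially (simplification)
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("working set is larger than total memory")
    return elapsed


def effective_bandwidth(segments: list[Segment], gb_needed: float) -> float:
    # Average bandwidth seen when touching gb_needed of data once.
    return gb_needed / transfer_time_s(segments, gb_needed)


if __name__ == "__main__":
    # Hypothetical split: a large fast segment plus a small slow one, 4 GB total.
    card = [Segment(3.5, 200.0), Segment(0.5, 25.0)]
    for working_set in (2.0, 3.5, 4.0):
        print(f"{working_set} GB -> {effective_bandwidth(card, working_set):.1f} GB/s")
```

With these made-up numbers, anything that fits in the fast segment sees the full 200 GB/s, but touching all 4 GB drops the average to roughly 107 GB/s. "4GB" and "200 GB/s" are both technically true on the box, which is exactly why a side note would have been nice.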