I wonder why the GeForce boys did that. It doesn't make sense to me, but I'm no expert.
I don't remember every specific difference, but the TMUs in Fermi are different. I do remember two important ones, though there were a few more.
1- The first one isn't a difference in the TMUs themselves but in the much improved caches, which make texture utilization (efficiency) better. In the past, caches were local, so if different threads running in completely different shader clusters needed the same texture, that texture had to be loaded twice, or as many times as required. In Fermi the L2 cache is unified and shared by every streaming multiprocessor, so a texture only needs to be loaded from memory once.
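To make the "load it once" point concrete, here's a toy model in Python. It's a big simplification and my own assumption, not Fermi's actual cache hierarchy: caches have unlimited capacity, and each request is just a (cluster, texture) pair. It counts how many times a texture has to come from video memory with private per-cluster caches versus one shared cache.

```python
# Toy model (assumption, not real hardware): unlimited-capacity caches,
# each request is (cluster_id, texture_id).

def loads_with_local_caches(requests):
    """Each cluster has its own private cache; nothing is shared."""
    caches = {}  # cluster_id -> set of texture_ids already cached
    loads = 0
    for cluster, texture in requests:
        cache = caches.setdefault(cluster, set())
        if texture not in cache:
            cache.add(texture)  # miss: fetch from video memory
            loads += 1
    return loads


def loads_with_shared_cache(requests):
    """One cache visible to every cluster."""
    cache = set()
    loads = 0
    for _cluster, texture in requests:
        if texture not in cache:
            cache.add(texture)  # miss: fetch from video memory
            loads += 1
    return loads


# Four clusters all sample "brick"; cluster 3 also samples "grass".
requests = [(0, "brick"), (1, "brick"), (2, "brick"), (3, "brick"), (3, "grass")]
print(loads_with_local_caches(requests))  # 5
print(loads_with_shared_cache(requests))  # 2
```

With private caches every cluster pays for its own copy of "brick"; with the shared cache it's fetched once and reused.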
2- The second difference is of lesser importance now, but it's still important for shadows and some other effects: hardware acceleration of jittered sampling in Fermi's texture units. It's basically the ability to fetch four texels in one texture operation (DirectX 11 exposes this as Gather4). Future games should benefit more from this feature than current ones do, but there is a small improvement even today.
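Here's a rough Python sketch of why four texels per operation helps shadows. The `Texture` class, `fetch`, `gather4` and `pcf_2x2` are hypothetical names I made up; each method call counts as one texture operation. A 2x2 percentage-closer filter (PCF) needs four shadow-map depths: one-at-a-time fetching costs four operations, a gather costs one. The texels come back in simple row-major order here, which is not necessarily the order real hardware returns them in.

```python
class Texture:
    """Hypothetical texture: a 2D list of depth values plus an op counter."""

    def __init__(self, texels):
        self.texels = texels
        self.ops = 0  # number of texture operations issued

    def fetch(self, x, y):
        """Return one texel per operation."""
        self.ops += 1
        return self.texels[y][x]

    def gather4(self, x, y):
        """Return the 2x2 block with top-left corner (x, y) in one operation."""
        self.ops += 1
        t = self.texels
        return (t[y][x], t[y][x + 1], t[y + 1][x], t[y + 1][x + 1])


def pcf_2x2(depths, fragment_depth):
    """Fraction of the four shadow-map samples that are lit (not occluded)."""
    lit = sum(1 for d in depths if fragment_depth <= d)
    return lit / 4


shadow_map = [[0.5, 0.9], [0.9, 0.9]]

a = Texture(shadow_map)
four = [a.fetch(0, 0), a.fetch(1, 0), a.fetch(0, 1), a.fetch(1, 1)]
print(pcf_2x2(four, 0.7), a.ops)  # 0.75 4

b = Texture(shadow_map)
print(pcf_2x2(b.gather4(0, 0), 0.7), b.ops)  # 0.75 1
```

Same shadow result, a quarter of the texture operations. With jittered sampling you take several of these footprints per pixel, so the saving adds up.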