The great thing about the 8-GPU setup is that a healthy shader overclock across 8 cores means decent gains in overall computational power.
Stock shader clock: 1500 MHz x 128 SP x 2 FLOPs (MADD) x 2 (because there are 2 GPUs) = 768 GFLOPS; the peak theoretical figures in brackets count 3 FLOPs per SP per clock (MADD 2 FLOPs + MUL 1 FLOP).
GeForce 9800 GX2 -------------- 768 GFLOPS [1152 peak theoretical]
GeForce 9800 GX2 SLI (x4) ----- 1536 GFLOPS [2304 peak theoretical]
GeForce 9800 GX2 SLI (x8) ----- 3072 GFLOPS [4608 peak theoretical]
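For anyone who wants to double-check the arithmetic, here's a quick Python sketch of the same formula (the gflops function is just something I knocked up for illustration, not from any of the sources):

def gflops(shader_mhz, sps, flops_per_clock, gpus=1):
    # shader clock (MHz) x stream processors x FLOPs per SP per clock x GPUs, result in GFLOPS
    return shader_mhz * sps * flops_per_clock * gpus / 1000.0

# 9800 GX2 at stock: 1500 MHz shader clock, 128 SPs per GPU, 2 GPUs on the card
print(gflops(1500, 128, 2, gpus=2))  # 768.0   (MADD only, conservative)
print(gflops(1500, 128, 3, gpus=2))  # 1152.0  (MADD + MUL, peak theoretical)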
An overclock of the shader units to somewhere in the order of ~1750 MHz, which I would argue is modest on a G92, would bring you these figures:
GeForce 9800 GX2 -------------- 896 GFLOPS [1344 peak theoretical]
GeForce 9800 GX2 SLI (x4) ----- 1792 GFLOPS [2688 peak theoretical]
GeForce 9800 GX2 SLI (x8) ----- 3584 GFLOPS [5376 peak theoretical]
So we can see that a modest overclock here brings you half a teraflop just like that, not bad. And a shader overclock of 2000 MHz would give you this nice figure on this system:
GeForce 9800 GX2 SLI (x8) ----- 4096 GFLOPS [6144 peak theoretical]
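Plugging the overclocked shader speeds into the same sketch from above gives the quad-card (8 GPU) numbers:

print(gflops(1750, 128, 2, gpus=8))  # 3584.0  (conservative)
print(gflops(1750, 128, 3, gpus=8))  # 5376.0  (peak theoretical)
print(gflops(2000, 128, 2, gpus=8))  # 4096.0
print(gflops(2000, 128, 3, gpus=8))  # 6144.0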
Also, interestingly, I've made some figures based on the current GTX 280 estimates.
1296 MHz x 240 SP x 2 FLOPs (MADD) x 4 (because we would have four in this system) = 2488 GFLOPS [3732 peak theoretical, counting the extra MUL (1 FLOP) per SP]
now overclocking....
~1500 MHz x 240 SP x 2 FLOPs (MADD) x 4 GPUs = 2880 GFLOPS [4320 peak theoretical]
~1750 MHz x 240 SP x 2 FLOPs (MADD) x 4 GPUs = 3360 GFLOPS [5040 peak theoretical]
~2000 MHz x 240 SP x 2 FLOPs (MADD) x 4 GPUs = 3840 GFLOPS [5760 peak theoretical]
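And the same sketch once more for four of the rumoured GTX 280s (240 SPs each; the clocks are estimates, as above):

print(gflops(1296, 240, 2, gpus=4))  # 2488.32  (conservative)
print(gflops(1296, 240, 3, gpus=4))  # 3732.48  (peak theoretical)
print(gflops(1500, 240, 2, gpus=4))  # 2880.0
print(gflops(2000, 240, 3, gpus=4))  # 5760.0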
Just imagine four GX2-280s... a man can dream...
The Tech Report
One thing I should note: I've changed the FLOPS numbers for the GeForce cards compared to what I used in past reviews. I decided to use a more conservative method of counting FLOPS per clock, and doing so reduces theoretical GeForce FLOPS numbers by a third. I think that's a more accurate way of counting for the typical case.
Wikipedia
For example the GeForce 8800 GTX has 518.43 GigaFLOPs theoretical performance given the fact that there are 128 stream processors at 1.35 GHz with each SP being able to run 1 Multiply-Add and 1 Multiply instruction per clock [(MADD (2 FLOPs) + MUL (1 FLOP))×1350MHz×128 SPs = 518.4 GigaFLOPs][4]. This figure may not be correct because the Multiply operation is not always available[5] giving a possibly more accurate performance figure of (2×1350×128) = 345.6 GigaFLOPs.
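That Wikipedia 8800 GTX example drops straight into the same sketch too, which shows where the conservative vs peak theoretical split in my figures comes from:

print(gflops(1350, 128, 3))  # 518.4  (MADD + MUL, the quoted theoretical figure)
print(gflops(1350, 128, 2))  # 345.6  (MADD only, the more conservative count)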