Hydra receives orders from the CPU, splits them into workloads suitable for the different GPUs, then sends them to the GPUs. When the GPUs finish their jobs they send the results back to Hydra. Hydra combines them, then sends the result to the primary GPU for display. Why wouldn't this add latency?
Also, NF200 does not do the same thing.
Current xfire/SLI:
1. Get draw commands
2. Split commands to both cards (if not doing AFR, then divide the image so each GPU renders its own part).
3. Send data to both cards
4. Both cards render their part
5. Card 2 sends its data to card 1 for combining before sending to the VDU.
Hydra:
1. Get draw commands
2. Detect different load capabilities on cards
3. Split commands to both cards (this will work similarly to AFR, I believe, so each card does a whole frame instead of tiled/split frames like Super AA can do).
4. Send data to both cards
5. Both cards render their part
6. Card 2 sends its data to Hydra, which redirects it to card 1 (a simple connection, no real latency), and the frame is inserted between the frames card 1 generates for the VDU (again very simple, no real latency).
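Step 6 can be sketched roughly like this. It's only an illustration of slotting card 2's frames in between card 1's by frame number, assuming whole-frame (AFR-like) splitting as guessed above. The `interleave` function and the frame lists are made up for the example and say nothing about how the Hydra hardware really does it.

```python
import heapq

def interleave(card1_frames, card2_frames):
    """Merge two streams of (frame_number, source) tuples into display order.

    Each input must already be sorted by frame number, which holds here
    because each card finishes its own frames in order.
    """
    return list(heapq.merge(card1_frames, card2_frames))

# Hypothetical split: card 1 is twice as fast, so it renders two of every three frames.
card1 = [(1, "card 1"), (2, "card 1"), (4, "card 1"), (5, "card 1")]
card2 = [(3, "card 2"), (6, "card 2")]

display_order = interleave(card1, card2)
for frame, source in display_order:
    print(f"Frame {frame} from {source}")
```

No part of a frame has to be stitched together here; whole frames just get put in order.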
With both of these setups the overall latency is going to be so close I'd say it will be indistinguishable. The Hydra chip is meant to split the DirectX commands between the cards, which are allowed to render the image using their own methods (if different). The image is then simply sent to be inserted between the frames from card 1. Because Hydra splits the workload properly, you don't have to worry about rejoining parts of the same frame from different cards, or about assuming the cards run at a similar/same speed and syncing the frames between the two (which is the cause of a lot of the overhead in current setups).
To split the data the Hydra chip doesn't have to do much work - once it knows the relative capabilities of each card it can simply direct the DirectX commands between them with no extra work needed. For example, if card 1 is twice as fast as card 2 you just do:
Draw 1 -> card 1
Draw 2 -> card 1
Draw 3 -> card 2
Draw 4 -> card 1
Draw 5 -> card 1
Draw 6 -> card 2
This isn't too expensive, as you just direct each command down the appropriate PCI-E connection.
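The 2:1 pattern above generalises to any ratio. Here is a minimal sketch, assuming the relative speeds are already known as small whole numbers. `build_pattern`, `dispatch` and the `weights` dict are names invented for the illustration, not anything from Lucid's driver.

```python
from itertools import cycle

def build_pattern(weights):
    """Expand {card: relative_speed} into one repeating dispatch pattern."""
    pattern = []
    for card, weight in weights.items():
        pattern.extend([card] * weight)
    return pattern

def dispatch(draws, weights):
    """Pair each draw command with the card that should receive it."""
    return list(zip(draws, cycle(build_pattern(weights))))

# Assumed ratio from the example: card 1 is twice as fast as card 2.
weights = {"card 1": 2, "card 2": 1}
draws = [f"Draw {i}" for i in range(1, 7)]

assignment = dispatch(draws, weights)
for draw, card in assignment:
    print(f"{draw} -> {card}")
```

The routing decision for each command is just a lookup in a short repeating pattern, which is why the splitting step itself should be cheap.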