Thursday, April 24th 2008
ATI Radeon HD 4800 Series Video Cards Specs Leaked
Thanks to TG Daily, we can now talk about the soon-to-be-released ATI Radeon HD 4800 series of graphics cards in more detail. One week ahead of the presumed release date, general specifications of the new cards have been revealed. All Radeon HD 4800 cards will use the TSMC-produced 55nm RV770 GPU, which includes over 800 million transistors, 480 stream processors or shader units (96+384), 32 texture units, 16 ROPs, a 256-bit memory controller (512-bit for the Radeon HD 4870 X2) and native GDDR3/4/5 support, as reported before. At first, AMD's graphics division will launch three new cards, the Radeon HD 4850, 4870 and 4870 X2:
- ATI Radeon HD 4850 - 650MHz/850MHz/1140MHz core/shader/memory clock speeds, 20.8 GTexel/s (32 TMU x 0.65 GHz) fill-rate, available with 256MB/512MB of GDDR3 memory or 512MB of GDDR5 memory clocked at 1.73GHz
- ATI Radeon HD 4870 - 850MHz/1050MHz/1940MHz core/shader/memory clock speeds, 27.2 GTexel/s (32 TMU x 0.85 GHz) fill-rate, available in a 1GB GDDR5 version only
- ATI Radeon HD 4870 X2 - unknown core/shader clock speeds, available with 2048MB of GDDR5 memory clocked at 1730MHz
Source: TG Daily
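The fill-rate figures in the list follow from a simple product: texture units times core clock. Here is a minimal Python sketch of that arithmetic. The dictionary and function names are made up for illustration, the numbers are the leaked ones quoted above rather than confirmed specs, and it assumes one texel per TMU per clock.

```python
# Peak texel fill rate = number of texture units (TMUs) x core clock.
# Values are the leaked figures from the article, not confirmed specs.
LEAKED_CARDS = {
    "Radeon HD 4850": {"tmus": 32, "core_mhz": 650},
    "Radeon HD 4870": {"tmus": 32, "core_mhz": 850},
}


def texel_fill_rate_gtexels(tmus: int, core_mhz: float) -> float:
    """Peak fill rate in GTexel/s, assuming one texel per TMU per clock."""
    return tmus * core_mhz / 1000.0


for name, spec in LEAKED_CARDS.items():
    rate = texel_fill_rate_gtexels(spec["tmus"], spec["core_mhz"])
    print(f"{name}: {rate:.1f} GTexel/s")
```

This is a theoretical peak only; it says nothing about how the cards perform in real workloads.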
278 Comments on ATI Radeon HD 4800 Series Video Cards Specs Leaked
Example: the G80 GTX has 128 SPs. Imagine you want to calculate vertex data. Vertices are represented by x, y and z coordinates, and each one is a floating-point variable. Let's say vertex 1 is V1(x1, y1, z1), vertex 2 is V2(x2, y2, z2) ... vertex n is Vn(xn, yn, zn). In the SP core (of 128), each dimension can be calculated in 1 ALU, which belongs to 1 SP. (There's controversy here, as Nvidia said each SP is capable of 2 operations per clock, but it seems it can't.)
It works like this:
clock cycle 1 : sp1 runs x1 - sp2 runs y1 - sp3 z1 - sp4 x2 - sp5 y2 - ... - sp127 x43 - sp128 y43 <<< as you can see V43 is not finalized yet, but it doesn't matter because:
clock cycle 2 : sp1 z43 - sp2 x44 - ...
And so on. Now imagine we have a core with 64 SPs running at 2x the speed. The resulting throughput (GFlops) is exactly the same, and thus the code is going to be calculated just as fast. Same if we have 256 SPs running at half the speed. There won't be a spare SP at any time, unless:
A: It can't fetch enough data from the memory pool or the frame buffer, whatever the reason for this: other units are slow, not enough data is sent by the CPU...
B: The unit that has to continue the work, i.e. the ROPs, can't keep up and has ordered the SPs to stop working because the frame buffer is full of unprocessed data.
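To make the packing idea concrete, here is a rough Python sketch of it. It is a toy model of the scheme described in this post, not how the real hardware scheduler works: it assumes one scalar operation per SP per clock and no stalls of type A or B, and the function names are made up for the example.

```python
from fractions import Fraction

COMPONENTS = ("x", "y", "z")  # one scalar operation per vertex coordinate


def schedule(num_sps: int, num_vertices: int):
    """Pack scalar ops onto SPs back to back.

    Returns a list of clock cycles; each cycle is a list of labels such as
    'x1', in SP order (sp1 first).
    """
    ops = [f"{c}{v}" for v in range(1, num_vertices + 1) for c in COMPONENTS]
    return [ops[i:i + num_sps] for i in range(0, len(ops), num_sps)]


def ops_per_time_unit(num_sps: int, relative_clock) -> Fraction:
    """Scalar ops finished per unit of time, assuming no SP ever idles."""
    return num_sps * Fraction(relative_clock)


cycles = schedule(num_sps=128, num_vertices=100)
print(cycles[0][:5], "...", cycles[0][-2:])  # last SPs hold a partial vertex
print(cycles[1][:2])                         # which is finished next cycle

# Fewer SPs at a higher clock, or more SPs at a lower clock: same throughput.
for sps, clock in [(128, 1), (64, 2), (256, Fraction(1, 2))]:
    print(f"{sps} SPs at {clock}x clock -> {ops_per_time_unit(sps, clock)} ops")
```

Because 128 is not a multiple of 3, a vertex always straddles the cycle boundary, and the leftover component is simply picked up by sp1 in the next cycle.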
You can mix data types in the above example too, as long as they don't belong to the same cluster (I think). G80 and G92 have clusters of 16 SPs: the GTX and the G92 GTS have 8 clusters (8x16=128), the GT has 7. I don't think different data types are allowed within the same cluster, but I wouldn't bet a leg on it either...
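The cluster arithmetic, as a quick Python sketch. It only multiplies out the cluster counts given in this post; the dictionary keys are informal labels and the names are hypothetical.

```python
SPS_PER_CLUSTER = 16  # cluster size quoted for G80 and G92

# Cluster counts as stated in the post.
CLUSTERS = {"G80 GTX": 8, "G92 GTS": 8, "G92 GT": 7}


def total_sps(clusters: int, sps_per_cluster: int = SPS_PER_CLUSTER) -> int:
    """Total stream processors for a chip with the given cluster count."""
    return clusters * sps_per_cluster


for card, n in CLUSTERS.items():
    print(f"{card}: {n} x {SPS_PER_CLUSTER} = {total_sps(n)} SPs")
```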
forums.techpowerup.com/showthread.php?p=794688#post794688
Let's hope Nvidia's releases get as many arguments.
:toast: