Thursday, April 24th 2008

ATI Radeon HD 4800 Series Video Cards Specs Leaked

Thanks to TG Daily, we can now talk about the soon-to-be-released ATI Radeon HD 4800 series of graphics cards in more detail. One week ahead of the presumed release date, general specifications of the new cards have been revealed. All Radeon HD 4800 cards will use the 55nm TSMC-produced RV770 GPU, which includes over 800 million transistors, 480 stream processors or shader units (96+384), 32 texture units, 16 ROPs, a 256-bit memory controller (512-bit for the Radeon HD 4870 X2) and native GDDR3/4/5 support, as reported before. At first, AMD's graphics division will launch three new cards - Radeon HD 4850, 4870 and 4870 X2:
  • ATI Radeon HD 4850 - 650MHz/850MHz/1140MHz core/shader/memory clock speeds, 20.8 GTexel/s (32 TMU x 0.65 GHz) fill-rate, available in 256MB/512MB of GDDR3 memory or 512MB of GDDR5 memory clocked at 1.73GHz
  • ATI Radeon HD 4870 - 850MHz/1050MHz/1940MHz core/shader/memory clock speeds, 27.2 GTexel/s (32 TMU x 0.85 GHz) fill-rate, available in a 1GB GDDR5 version only
  • ATI Radeon HD 4870 X2 - unknown core/shader clock speeds, available with 2048MB of GDDR5 memory clocked at 1730MHz
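
The fill-rate figures in the list follow from multiplying the texture unit count by the core clock. A minimal sketch of that arithmetic, assuming one texel per TMU per core clock (the function name `texel_fill_rate_gtexels` is made up for illustration):

```python
def texel_fill_rate_gtexels(tmus: int, core_clock_mhz: float) -> float:
    """Peak texel fill rate in GTexel/s, assuming one texel per TMU per clock."""
    return tmus * core_clock_mhz / 1000.0


# TMU counts and core clocks taken from the leaked spec list above.
cards = {
    "Radeon HD 4850": (32, 650),
    "Radeon HD 4870": (32, 850),
}

for name, (tmus, core_mhz) in cards.items():
    rate = texel_fill_rate_gtexels(tmus, core_mhz)
    print(f"{name}: {rate:.1f} GTexel/s")
```

This reproduces the 20.8 and 27.2 GTexel/s values quoted for the 4850 and 4870. The 4870 X2 is left out because its core clock is unknown.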
The 4850 256MB GDDR3 version will arrive as the successor to the 3850 256MB, priced in the sub-$200 range. The 4850 512MB GDDR3 should retail for $229, while the 4850 512MB GDDR5 will set you back about $249-269. The 1GB GDDR5-powered 4870 will retail for $329-349. The flagship Radeon HD 4870 X2 will ship later this year for $499.
Source: TG Daily

278 Comments on ATI Radeon HD 4800 Series Video Cards Specs Leaked

#276
DarkMatter
lemonadesoda wrote:

[Diagram: "Traditional" vs. "Unified Shader" pipeline]

If you had a "screen render" that fitted into the existing pipeline "4 cycles", single pass for each cycle in the rendering stage... as shown in the diagram, then increasing the number of shaders doesn't change anything. The spare capacity doesn't help. A low FSAA, AA, 1280x1024 scene can "fit in" the "4 cycle" path, with a single pass for each stage.

If you have a scene that is 1920x1200 with 16x, 16x, then a screen render will require more than one pass through each stage.

In instance A, clock speed will get you faster FPS. Shaders don't help much.

In instance B, increasing the shaders means more can be done in each pass, meaning fewer passes, ultimately getting to just one single pass through each stage. Here, gains are from increased shaders in addition to increased clocks.

That's how I've always understood it. If there is a fallacy with the logic... let me know.
No, no, no... you understood it wrong. In your image, where it says shader core, it's not one shader processor, it's the entire shader array. The next stage can be calculated in any available ALU within the core. To explain this simply I will use G80 as an example, since its SPs are fully scalar. R600 is more complicated because it needs some pre-arrangement, but it works the same way in the sense that the next stage of the same fragment, or the next fragment within the same stage, can be calculated in the next available unit. That just means you can either do A -> B -> C -> D for one fragment, or calculate several pixels in stage A together and then continue. The second approach is how they work nowadays.

Example: G80 GTX has 128 SPs. Imagine you want to calculate vertex data. Vertices are represented by x, y and z coordinates, and each one is a floating-point variable. We are going to say vertex 1 is V1(x1, y1, z1), vertex 2 is V2(x2, y2, z2)... vertex n is Vn(xn, yn, zn). In the SP core (of 128), each dimension can be calculated in one ALU, which belongs to one SP. (There's controversy here, as Nvidia said each SP is capable of 2 per clock, but it seems it can't.)

It works like this:

clock cycle 1 : sp1 runs x1 - sp2 runs y1 - sp3 z1 - sp4 x2 - sp5 y2 - ... - sp127 x43 - sp128 y43 <<< as you can see V43 is not finalized yet, but it doesn't matter because:

clock cycle 2 : sp1 z43 - sp2 x44 - ...
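
The cycle-by-cycle assignment can be sketched as a small simulation. This is only an illustrative model of the description here, assuming each SP takes exactly one scalar component per clock in strict x, y, z order; the names `component_stream` and `schedule` are made up, and real hardware scheduling is more involved. With 128 SPs and three components per vertex, the first 42 vertices fill 126 slots, so the first cycle ends partway through vertex 43.

```python
from itertools import islice


def component_stream():
    """Yield scalar work items in order: x1, y1, z1, x2, y2, z2, ..."""
    vertex = 1
    while True:
        for axis in "xyz":
            yield f"{axis}{vertex}"
        vertex += 1


def schedule(num_sps: int, cycles: int):
    """Hand the next pending component to each SP, one cycle at a time."""
    stream = component_stream()
    return [list(islice(stream, num_sps)) for _ in range(cycles)]


plan = schedule(num_sps=128, cycles=2)
print("cycle 1 starts:", plan[0][:5])   # x1, y1, z1, x2, y2
print("cycle 1 ends:  ", plan[0][-2:])  # x43, y43 (vertex 43 unfinished)
print("cycle 2 starts:", plan[1][:2])   # z43, x44
```

No SP sits idle at the cycle boundary: the leftover component of the split vertex simply becomes the first work item of the next cycle.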

And so on. Imagine we have a core with 64 SPs running at 2x the speed. The result: the throughput (GFlops) is exactly the same, and thus the code is going to be calculated just as fast. Same if we have 256 SPs running at half the speed. There won't be any spare SP at any time, unless:

A: It can't fetch enough data from the memory pool or the frame buffer, whatever the reason for this is: other units are slow, not enough data is sent by the CPU...

B: The unit that has to continue the work, i.e. the ROPs, can't keep up and has ordered the SPs not to continue, as the frame buffer is full of unprocessed data.
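
The throughput argument, together with the two stall cases, can be written as a rough bottleneck model. This is a sketch under stated assumptions, not a description of real hardware: one scalar operation per SP per clock, a hypothetical 1 GHz base clock chosen only to keep the numbers round, and stalls A and B modelled as simple caps (`feed_rate`, `drain_rate`) on the operation rate.

```python
def peak_ops_per_second(num_sps: int, clock_hz: float) -> float:
    """Peak scalar throughput, assuming one operation per SP per clock."""
    return num_sps * clock_hz


def effective_ops_per_second(num_sps, clock_hz,
                             feed_rate=float("inf"),
                             drain_rate=float("inf")):
    """Throughput capped by the slowest of: the SP array itself,
    the data supply (case A) and the downstream units (case B)."""
    return min(peak_ops_per_second(num_sps, clock_hz), feed_rate, drain_rate)


BASE_CLOCK_HZ = 1.0e9  # hypothetical clock, for illustration only

# Same product of SP count and clock in all three configurations.
configs = [(64, 2 * BASE_CLOCK_HZ), (128, BASE_CLOCK_HZ), (256, BASE_CLOCK_HZ / 2)]
for sps, clock in configs:
    print(f"{sps} SPs @ {clock / 1e9:.1f} GHz -> "
          f"{peak_ops_per_second(sps, clock) / 1e9:.0f} Gops/s")

# Case B: ROPs that drain only 100 Gops/s worth of work limit every configuration.
print(effective_ops_per_second(128, BASE_CLOCK_HZ, drain_rate=100e9) / 1e9)
```

In this model all three configurations peak at the same rate, and extra SPs or clock speed only help until the feed or drain cap becomes the limit.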

You can mix data types in the above example too, as long as they don't belong to the same cluster (I think). G80 and G92 have clusters of 16 SPs: GTX and G92 GTS have 8 clusters (8x16=128), GT has 7 clusters. I don't think different data types are allowed within the same cluster, but I wouldn't bet a leg on it either...
#278
HAL7000
And to think, after all is said and done... we still need to wait and see. Good conversation on everyone's part. A post of the good, the bad and the ugly... lol.

Let's hope Nvidia's releases get as many arguments.

:toast: