ATI Radeon HD 4800 Series Video Cards Specs Leaked

Joined
Aug 30, 2006
Messages
7,221 (1.08/day)
System Name ICE-QUAD // ICE-CRUNCH
Processor Q6600 // 2x Xeon 5472
Memory 2GB DDR // 8GB FB-DIMM
Video Card(s) HD3850-AGP // FireGL 3400
Display(s) 2 x Samsung 204Ts = 3200x1200
Audio Device(s) Audigy 2
Software Windows Server 2003 R2 as a Workstation now migrated to W10 with regrets.
Bump mapping has nothing to do with geometry. You are still connecting geometry to a T&L unit, which doesn't exist anymore in modern GPUs. It's emulated on the shaders.

Please note the word "If", meaning that, in the situation where you might be calling bump mapping geometry effects (which they are)... then all well and true. I did not SAY geometry = bump mapping.

As for the second statement I made (if "geometry" = "more complex objects", then no, shaders won't help, and = not so great for CAD): YES, I withdraw that statement. It is wrong for the unified shader architecture of DirectX 10 / Shader Model 4.0. It is only true for previous-generation GPUs.
 

DarkMatter

New Member
Joined
Oct 5, 2007
Messages
1,714 (0.27/day)
Processor Intel C2Q Q6600 @ Stock (for now)
Motherboard Asus P5Q-E
Cooling Proc: Scythe Mine, Graphics: Zalman VF900 Cu
Memory 4 GB (2x2GB) DDR2 Corsair Dominator 1066Mhz 5-5-5-15
Video Card(s) GigaByte 8800GT Stock Clocks: 700Mhz Core, 1700 Shader, 1940 Memory
Storage 74 GB WD Raptor 10000rpm, 2x250 GB Seagate Raid 0
Display(s) HP p1130, 21" Trinitron
Case Antec p180
Audio Device(s) Creative X-Fi PLatinum
Power Supply 700W FSP Group 85% Efficiency
Software Windows XP
[Images: "Traditional" pipeline diagram and "Unified Shader" pipeline diagram]


If you had a "screen render" that fitted into the existing pipeline in "4 cycles", with a single pass for each cycle in the rendering stage, as shown in the diagram, then increasing the number of shaders doesn't change anything. The spare capacity doesn't help. A render at 1280x1024 with low FSAA and AA can "fit in" the "4 cycle" path, with a single pass for each stage.

If you have a scene that is 1920x1200 with 16x, 16x, then a screen render will require more than one pass through each stage.

In instance A, a higher clock speed will get you more FPS. More shaders don't help much.

In instance B, increasing the shaders means more can be done in each pass, meaning fewer passes, ultimately getting to just one single pass through each stage. Here, gains are from increased shaders in addition to increased clocks.

That's how I've always understood it. If there is a fallacy with the logic... let me know.

No, no, no... you understood it wrong. In your image, where it says shader core, it's not 1 shader processor, it's the entire shader array. The next stage can be calculated in any available ALU within the core. To explain this simply I will use G80 as an example, since its SPs are fully scalar. R600 is more complicated because it needs some pre-arrangement, but it works the same way in the sense that the next stage of the same fragment, or the next fragment within the same stage, can be calculated in the next available unit. That just means you can do A -> B -> C -> D for one fragment, or calculate several pixels in stage A together and then continue. The latter is how they work nowadays.

Example: the G80 GTX has 128 SPs. Imagine you want to calculate vertex data. Vertices are represented by x, y and z coordinates, and each one is a floating-point variable. We are going to say vertex 1 is V1(x1, y1, z1), vertex 2 is V2(x2, y2, z2)... vertex n is Vn(xn, yn, zn). In the SP core (of 128), each dimension can be calculated in 1 ALU, which belongs to 1 SP. (There's controversy here, as Nvidia said each SP is capable of 2 per clock, but it seems it can't.)

It works like this:

clock cycle 1 : sp1 runs x1 - sp2 runs y1 - sp3 z1 - sp4 x2 - sp5 y2 - ... - sp127 x43 - sp128 y43 <<< as you can see, V43 is not finished yet, but it doesn't matter because:

clock cycle 2 : sp1 z43 - sp2 x44 - ...
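As a rough illustration of the clock-cycle walk-through above, here is a minimal Python sketch that hands out scalar vertex components to SPs in order. The function name `schedule` and the simple in-order assignment policy are assumptions made for illustration; they are not a description of how Nvidia's hardware scheduler actually works.

```python
# Hypothetical sketch: names and policy are illustrative, not Nvidia's scheduler.
NUM_SPS = 128                  # G80 GTX shader processor count, from the post
COMPONENTS = ("x", "y", "z")   # one scalar per vertex coordinate


def schedule(num_vertices, num_sps=NUM_SPS):
    """Return a list of clock cycles; each cycle is a list of (sp, label) pairs.

    Scalar work items are handed to SPs in order, so a vertex may be split
    across two cycles when the SP count is not a multiple of 3.
    """
    # Flatten all vertices into scalar work items: x1, y1, z1, x2, y2, ...
    work = [f"{c}{v}" for v in range(1, num_vertices + 1) for c in COMPONENTS]
    cycles = []
    for start in range(0, len(work), num_sps):
        chunk = work[start:start + num_sps]
        # SPs are numbered from 1 to match the post's sp1..sp128 naming.
        cycles.append([(sp + 1, label) for sp, label in enumerate(chunk)])
    return cycles


if __name__ == "__main__":
    cycles = schedule(num_vertices=100)
    print("cycle 1, last two SPs:", cycles[0][-2:])
    print("cycle 2, first two SPs:", cycles[1][:2])
```

With 128 SPs and 3 components per vertex, the first cycle covers 42 whole vertices plus x and y of the 43rd, and the leftover z component is simply picked up by sp1 in the next cycle.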

And so on. Imagine we have a core with 64 SPs running at 2x the speed. The result: the throughput (GFLOPS) is exactly the same, and thus the code is going to be calculated just as fast. Same if we have 256 SPs running at half the speed. There won't be any spare SP at any time, unless:

A: It can't fetch enough data from the memory pool, the frame buffer, or wherever, whatever the reason for this: other units are slow, not enough data sent by the CPU...

B: The unit that has to continue the work, i.e. the ROPs, can't keep up and has ordered the SPs to stop working because the frame buffer is full of unprocessed data.
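The throughput argument above can be checked with a few lines of arithmetic. This is a simplified model, assuming one scalar operation per SP per clock; the clock values and the `utilization` parameter are hypothetical stand-ins, the latter for the two stall cases A and B.

```python
# Illustrative model only: the clock values and the `utilization` knob are
# assumptions for this sketch, not measured hardware behavior.

def peak_ops_per_second(num_sps, clock_mhz, ops_per_clock=1):
    """Peak scalar operations per second for num_sps ALUs at clock_mhz."""
    return num_sps * clock_mhz * 1_000_000 * ops_per_clock


def effective_ops_per_second(num_sps, clock_mhz, utilization=1.0):
    """Scale peak throughput by the fraction of cycles the SPs have work.

    utilization < 1.0 stands in for cases A and B: SPs starved of input
    data, or stalled because a later unit such as the ROPs cannot keep up.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_ops_per_second(num_sps, clock_mhz) * utilization


# (SP count, clock in MHz): half the SPs at 2x clock, baseline, 2x SPs at half clock
CONFIGS = [(64, 2700), (128, 1350), (256, 675)]

if __name__ == "__main__":
    for sps, mhz in CONFIGS:
        gops = peak_ops_per_second(sps, mhz) / 1e9
        print(sps, "SPs @", mhz, "MHz ->", gops, "G ops/s")
    stalled = effective_ops_per_second(128, 1350, utilization=0.5) / 1e9
    print("128 SPs @ 1350 MHz, busy half the time ->", stalled, "G ops/s")
```

All three configurations come out at the same peak rate in this model, which is the point of the post; only when the SPs sit idle does the count-versus-clock trade-off stop being neutral.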

You can mix data types in the above example too, as long as they don't belong to the same cluster (I think). G80 and G92 have clusters of 16 SPs: the GTX and the G92 GTS have 8 clusters (8x16=128), the GT has 7. I don't think different data types are allowed within the same cluster, but I wouldn't bet a leg on it either...
 

HAL7000

New Member
Joined
Jul 28, 2007
Messages
263 (0.04/day)
Location
Nashville TN
Processor AMD Athlon 64 X2 6000+ Windsor 3.0GHz
Motherboard ECS KA3 MVP
Cooling stock
Memory Mushkin eXtreme Performance 2GB (2 x 1GB)
Video Card(s) X1900GT
Storage SAMSUNG SpinPoint P Series SP2004C 200GB
Display(s) Acer AL2051W 20" 8ms DVI Widescreen LCD Monitor
Case IN WIN IW-F430.RL Red
Audio Device(s) Creative Audigy gamers edition
Power Supply PC Power & Cooling Silencer 750 Quad - Copper
Software XP Pro w/SP3
And to think, after all is said and done... we still need to wait and see. Good conversation on everyone's part. A post of the good, the bad and the ugly... lol.

Let's hope Nvidia's releases get as many arguments.

:toast:
 