DarkMatter
Processor | Intel C2Q Q6600 @ Stock (for now) |
---|---|
Motherboard | Asus P5Q-E |
Cooling | Proc: Scythe Mine, Graphics: Zalman VF900 Cu |
Memory | 4 GB (2x2GB) DDR2 Corsair Dominator 1066 MHz 5-5-5-15 |
Video Card(s) | GigaByte 8800GT, stock clocks: 700 MHz core, 1700 MHz shader, 1940 MHz memory |
Storage | 74 GB WD Raptor 10,000 rpm, 2x250 GB Seagate RAID 0 |
Display(s) | HP p1130, 21" Trinitron |
Case | Antec P180 |
Audio Device(s) | Creative X-Fi Platinum |
Power Supply | 700W FSP Group, 85% efficiency |
Software | Windows XP |
@Darkmatter
- Only GT200 can dual-issue MAD and MUL ops all the time. G8x/G9x-generation chips can't do it all the time; there are only a select few scenarios where they can dual-issue MAD and MUL ops.
- I didn't: 1375 MHz * 2 flops * 32 shaders = 88 GFLOPS
- You are wrong about it being SIMD. ATI's shader involves a MIMD 5-way vector unit, MIMD meaning (unlike SIMD) that several different instructions can be processed in parallel. The compiler tries to bundle simple operations to fill the 5-wide MIMD unit, but those 5 instructions cannot be dependent on each other. So even a single shader can process different instructions at a time, let alone a whole cluster!
In my real-life calculation I assumed that only 3 instructions per shader can be issued on average, because of less-than-optimal code and other inefficiencies.
So basically your conclusion is wrong!
Using my real-life calculation (it's just a simulation; see the sketch after this list for the arithmetic):
- 9800 GTX: 432 GFLOPS
- HD3870: 248 GFLOPS
- HD4670: 240 GFLOPS
- 9600 GT: 208 GFLOPS
- 9500 GT: 88 GFLOPS
- HD3650: 87 GFLOPS
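To put that arithmetic in one place, here is a minimal Python sketch (an editorial addition, not from the thread) of the clock x ops-per-clock x shader-count model. Only the 9500GT inputs come from the post above; any other card's clock and shader count would be your own assumption.

```python
# Toy version of the "clock x ops-per-clock x shader-count" arithmetic used above.
# Only the 9500GT figures below come from the post itself (1375 MHz shader clock,
# 2 flops/clock for a MAD, 32 shaders); any other card's numbers are assumptions.

def gflops(shader_clock_mhz: float, flops_per_clock: float, shaders: int) -> float:
    """Theoretical shader throughput in GFLOPS."""
    return shader_clock_mhz * flops_per_clock * shaders / 1000.0

print(gflops(1375, 2, 32))  # -> 88.0, matching the 9500GT figure quoted above
```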
If you check out the Crysis scores I posted previously, things start to make sense.
Now I know the HD4670 won't beat the 9600GT in Crysis because of many factors, but what ATI has basically done is slap the HD3870's shader engine into it, plus the R700-generation architectural improvements.
Nvidia, by contrast, has just die-shrunk G84 and clocked it higher.
(Please read my previous posts before you reply.)
Sorry, but you are wrong. Well, in some way you could say it's MIMD, because R600/R700 is composed of SIMD arrays of 5-wide superscalar shader processors controlled through VLIWs. BUT the MULTIPLE instruction part is INSIDE each shader, meaning that each ALU within a shader can process a different instruction, BUT every SP in the SIMD array has to share the same instruction word. My claim still remains true.
http://www.techreport.com/articles.x/12458/2
http://www.techreport.com/articles.x/14990/4
> These stream processor blocks are arranged in arrays of 16 on the chip, for a SIMD (single instruction multiple data) arrangement, and are controlled via VLIW (very long instruction word) commands. At a basic level, that means as many as six instructions, five math and one for the branch unit, are grouped into a single instruction word. This one instruction word then controls all 16 execution blocks, which operate in parallel on similar data, be it pixels, vertices, or what have you.
And then there still remains the question of whether the drivers can take the usually linear code of games (linear in the sense that, AFAIK, they calculate different data types at different times instead of everything being calculated concurrently) and effectively blend different types of instructions into one VLIW instruction in real time. "Real time" being the key. R600/R700 was developed with GPGPU in mind, and there it can be used effectively, so the inclusion of VLIW makes sense. But IMO that is fundamentally impossible for the most part in real-time rendering. Probably, when the shaders are doing vertex calculations, the other 2 ALUs remain unused, and it's even worse if the operation needs fewer ALUs.
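To make that packing constraint concrete, here is a toy sketch (purely illustrative, not how AMD's actual compiler or driver works): independent scalar operations can share one 5-slot VLIW word, but an operation that depends on a result from the current word has to start a new one, so dependency chains leave slots empty.

```python
# Toy VLIW packer: greedily fills 5-slot instruction words with scalar ops and
# starts a new word whenever an op depends on a result produced in the current
# word. Purely illustrative; the real R600/R700 compiler is far more sophisticated.

def pack_vliw(ops, width=5):
    """ops: list of (name, dependency_names) pairs. Returns a list of VLIW words."""
    words, current = [], []
    for name, deps in ops:
        if len(current) == width or deps & set(current):
            words.append(current)
            current = []
        current.append(name)
    if current:
        words.append(current)
    return words

# Five independent operations pack into a single full 5-slot word...
print(pack_vliw([("op%d" % i, set()) for i in range(5)]))

# ...but a dependency chain gets one op per word, leaving four slots empty each time.
print(pack_vliw([("a", set()), ("b", {"a"}), ("c", {"b"}), ("d", {"c"})]))
```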
On the MAD+MUL you are probably right, but Nvidia DID claim they had fixed it in the 9 series.
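For reference, here is a rough sketch of why the MAD+MUL question matters for the headline number. The 128 shaders and 1688 MHz shader clock are assumed 9800 GTX specs that I'm supplying here, not figures from the thread; only the 432 GFLOPS result appears in the list above.

```python
# Where the two common "theoretical" figures for one card come from. Shader count
# and clock are assumed 9800 GTX specs (128 SPs at 1688 MHz), not numbers taken
# from the thread.

shaders, shader_clock_mhz = 128, 1688

mad_only     = shaders * shader_clock_mhz * 2 / 1000.0  # MAD only: ~432 GFLOPS
mad_plus_mul = shaders * shader_clock_mhz * 3 / 1000.0  # MAD + MUL: ~648 GFLOPS peak

print(round(mad_only), round(mad_plus_mul))  # 432 648
```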
88 GFLOPS: I thought you were talking about the 9600 GT, for some reason. Probably because candle mentioned it. But TBH, arguing about shader power to compare graphics card performance is pointless. The card could be capable of 10 TFLOPS, but if it maintained only the same 8 render back-ends, it would still perform similarly to any other card with 8 ROPs and similar clocks.
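A quick sketch of that ROP point: pixel fill rate scales with ROP count times core clock, so extra shader FLOPS don't raise it at all. The 8 ROPs come from the paragraph above; the 650 MHz core clock is just an example value I'm assuming, not a figure from the thread.

```python
# Pixel fill rate depends on ROP count and core clock, not on shader GFLOPS.
# The 8 ROPs figure is from the post above; 650 MHz is only an example core clock.

def fill_rate_gpixels_per_s(rops: int, core_clock_mhz: float) -> float:
    return rops * core_clock_mhz / 1000.0

print(fill_rate_gpixels_per_s(8, 650))  # 5.2 Gpixels/s, however many FLOPS the shaders have
```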
Ah, about Crysis: nonsense. The HD3870 is not faster than the 9600 GT, let alone a massively crippled one (if you insist on comparing the HD3870 with the HD4670).