Benetanegia
New Member
- Joined
- Sep 11, 2009
- Messages
- 2,680 (0.48/day)
- Location
- Reaching your left retina.
Ok now I see what you mean. So basically, if I got what you're saying, the built-in scheduler sucks and does some of the calculations multiple times, thus wasting SP cycles?
NO! Not at all. There's nothing wrong with the scheduler and I never suggested anything of the sort. But the more parallel an architecture is, the more inefficient it is. That's inherent to that kind of architecture, not a flaw. You will never reach the same efficiency (efficiency as actual perf/raw perf) as a less parallel architecture, and that's why you can't compare AMD's flops with Nvidia's flops.
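To put some numbers on that last point, here's a toy comparison. All the figures are made up for illustration (they're not real specs of any card); the point is just that raw GFLOPS times utilization is what actually matters:

```python
# Illustrative (made-up) numbers: a wide, very parallel chip with lower
# per-flop utilization vs a narrower chip with higher utilization.
raw_gflops = {"wide_parallel": 1200, "narrow_scalar": 700}
utilization = {"wide_parallel": 0.55, "narrow_scalar": 0.90}

def effective_gflops(chip: str) -> float:
    """Effective throughput = raw flops scaled by how well they get used."""
    return raw_gflops[chip] * utilization[chip]

for chip in raw_gflops:
    print(f"{chip}: {effective_gflops(chip):.0f} effective GFLOPS")
# wide_parallel: 660 effective GFLOPS
# narrow_scalar: 630 effective GFLOPS
```

So a chip with almost double the raw flops can land in the same real-world ballpark, which is why the paper flops of the two vendors aren't directly comparable.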
Regarding AMD cards doing the same calculations multiple times, bear in mind that's just my own speculation, and I wasn't saying it in a bad way; it's just my view of how it probably works and a way to explain how AMD uses twice as many flops to do the same thing. Oftentimes something is calculated for a shader that is going to be used later, and in that case it's usual to store it either in the cache or in VRAM (imagine some lighting data that is going to be used as an input for HDR, bloom, color correction and whatever other filter). My idea is that sometimes (especially if that data has to be moved to VRAM) it could be better/faster on a chip like AMD's to recalculate some of those things when they are required again, storing them in the SPs' registers or L1 cache just to use them on the next ready clock cycle, instead of reading the old result stored in L2/VRAM, because those memories take more cycles and AMD has spare SPs most of the time anyway.

Its architecture favors that kind of brute-force programming, while Nvidia's architecture, with its bigger and faster registers and caches, favors the other method. That's not to say Nvidia's method is better, as both architectures have been trading blows depending on the generation. ATI's method lets them pack more raw GFLOPS into the same die area or transistor budget, but those flops can't be used as efficiently; Nvidia's are efficient, but take more space. As I said, looking back at the performance-to-transistor ratio, both have been pretty close. Take into account that although G92 and GT200 had more transistors than RV670 and RV770, the competing products GTX 260 and 8800 GT had many clusters disabled. If Nvidia had made them that size, i.e. 216 SPs/28 ROPs instead of 240/32, it would have been pretty much the same size as RV770.
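The recompute-vs-reuse trade-off I'm describing can be sketched with a toy cycle-cost model. Every number here is an assumption for illustration, not a real latency from either vendor's hardware; the shape of the argument is what matters:

```python
# Toy cycle-cost model for the trade-off described above: is it cheaper to
# redo a calculation on spare SPs, or to re-read the stored result from a
# slower memory? All latencies below are illustrative assumptions.

RECOMPUTE_ALU_CYCLES = 8     # assumed cost to redo the math in the shader
FETCH_LATENCY = {            # assumed cost to re-read the old result
    "l2": 20,
    "vram": 400,
}

def cheaper_to_recompute(stored_in: str, spare_alu_slots: bool) -> bool:
    """True if recalculating beats fetching the previously stored result."""
    # On a very wide chip with idle SPs, the recompute barely costs anything
    # in throughput terms; on a narrower chip it competes with other work,
    # so we penalize it (factor of 4 is an arbitrary illustration).
    effective = RECOMPUTE_ALU_CYCLES if spare_alu_slots else RECOMPUTE_ALU_CYCLES * 4
    return effective < FETCH_LATENCY[stored_in]

# A wide chip with spare units prefers brute-force recomputation over a
# VRAM round trip, while a narrower chip with fast caches prefers reuse:
print(cheaper_to_recompute("vram", spare_alu_slots=True))   # True
print(cheaper_to_recompute("l2", spare_alu_slots=False))    # False
```

That's the whole idea in a nutshell: when ALUs are abundant and memory round trips are expensive, brute-force recomputation wins; when registers and caches are big and fast, storing and reusing wins.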