Friday, August 29th 2008
NVIDIA Could Ready HD 4670 Competitor
GPU Café published information on future competition lineups, which shows the entry of a "GeForce 9550 GT" stacked up against the Radeon HD 4670. Sources in the media have pointed to the possibility that the RV730-based HD 4670 from ATI outperforms NVIDIA's current lineup in the segment where the GeForce 9500 GT sits. The HD 4650 could exchange a few blows with the GeForce 9500 GT, offering equal or better performance, while the HD 4670 surpasses it.
The entry of a GeForce 9550 GT suggests the 9500 GT cannot compete with the HD 4650. The chart also indicates a new price point of around $129, and suggests the HD 4650's lead over the 9500 GT is significant enough that ATI could comfortably ask $20 more than what the 9500 GT commands in that range. GPU Café reports that the 9550 GT would be a toned-down (and shrunk) G94, namely the 55 nm G94b, featuring 64 shader processors and a 192-bit memory bus (and presumably memory configurations such as 384 MB or 768 MB of GDDR3 memory).
Source:
GPU Café
58 Comments on NVIDIA Could Ready HD 4670 Competitor
I think nVidia is just going overboard with adding the 9550GT. They should have just left the 9500GT at $20 cheaper and let their partners pre-overclock the cards to make up the difference in performance and price.
I am in agreement with you though; the 9600GSO can be had for $90 even with free shipping right now from Newegg. So, IMO, these lower-class cards aren't worth saving the $10-20. The 9600GSO is even cheaper if you consider rebates; they can be had for $80.
With the 8 Series:
8400, 8500, 8600, 8800
With the 7 series:
7100/7200, 7300, 7600, 7800/7900
With the 6 series:
6200-TC, 6200, 6600, 6800
With the 5 series:
5200, 5500, 5600/5700, 5800/5900
Though, in today's market, I don't see a place for the extreme low end anymore.
For starters, the 5500 came out after the 5200 to replace the 5200 Ultra, which was more expensive to produce.
The 6200TC is in the same low-end generation as the normal 6200; the normal PCIe 6200 was just there so they had something in that slot, and the 6200TC replaced it.
The 7100/7200 line, granted, was lower end, though the 7100GS was faster than the 7200GS, and the 7200GS was just a way to get rid of NV44 cores. Though it started here.
Personally I want it simple again.
Geforce MX for low end
Geforce TI for high end.
GeforceMX 220, GeforceMX 240, GeforceMX 260
GeforceTI 220, GeforceTI 240, GeforceTI 260,
That would simplify life enough for me.
The 9500 is basically a higher clocked 8600.
The HD4600 is basically an HD3870 with a 128-bit bus (plus a faster AA unit).
Considering that these cards are mainly used by people with 19" monitors (~1280x1024), the low memory bandwidth won't be a major issue.
For reference, an HD3850 is around 2x faster than an 8600GTS @ 1280x1024.
The HD4670 will have 480 GFlops (peak) and the 9500GT has around 132 GFlops (peak, depending on the model). You can't close that gap with an overclock.
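For reference, a rough sketch of where those peak figures come from (assuming the commonly quoted stock specs: 320 stream processors at 750 MHz for the HD4670, and 32 scalar shaders at ~1375 MHz for the 9500GT, with a MADD counted as 2 flops and NVIDIA's co-issued MUL as a third):

```python
# Back-of-the-envelope peak GFLOPS. Unit counts and clocks below are assumed
# stock figures, not official confirmation.

def peak_gflops(alus, flops_per_alu_per_clock, shader_clock_mhz):
    # peak = ALUs x flops issued per ALU per clock x shader clock (GHz)
    return alus * flops_per_alu_per_clock * shader_clock_mhz / 1000.0

print(peak_gflops(320, 2, 750))   # HD4670: 320 SPs, MADD = 2 flops   -> 480.0
print(peak_gflops(32, 3, 1375))   # 9500GT: 32 SPs, MADD + MUL = 3    -> 132.0
```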
The HD4670 is just an overclocked HD4650. All the information we have seen says the 9500GT matches the HD4650, so an overclocked 9500GT should be able to match an HD4670.
And the FLOPS rating of either card doesn't matter one bit and has no real effect on graphical performance. If it did, we wouldn't see the 9600GT, rated at 208 GFLOPS, outperforming the HD3870, rated at 496 GFLOPS.
So for the 4650: 64 units x 750 MHz = 48,000.
Now take the 9500GT's 32x16x8 core config. The shader count is 32 and only 32, but all of those will be used, unlike the extra ALUs on the 4650.
So 32 x 1400 = 44,800.
The numbers are fairly close on shader ops per second for most games, actually.
So you tell me, can the 9500GT keep up?
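Putting that arithmetic into a quick sketch (the clocks and the one-issue-per-unit-per-clock counting are the post's assumptions, not measured figures): counting issues per unit makes the two look close, while counting every ALU would put the 4650 far ahead on paper.

```python
# Millions of instruction issues per second, counting one issue per unit per
# clock as in the post above. Clocks are assumed stock values.
hd4650_issues = 64 * 750     # 64 five-wide VLIW units at 750 MHz   -> 48,000
g96_issues    = 32 * 1400    # 32 scalar shaders at 1400 MHz        -> 44,800
print(hd4650_issues, g96_issues)

# Counting individual ALUs instead (5 per VLIW unit) flips the picture:
print(320 * 750, 32 * 1400)  # 240,000 vs 44,800
```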
What ATI needs is a 9600GT killer; the RV670 is supposed to stop production soon, leaving nothing to compete there.
ATI and nVidia use a runtime compiler and this compiler tries to make best use of the shaders available. I don't think there is any situation where the compiler is that inefficient.
@newtekie1 about the 9600GT vs HD3870:
While GFlops are NOT the only factor that makes a chip perform a certain way, they are for sure very important. It's just crazy to say they don't matter one bit.
Using only the 9600GT vs HD3870 as a reference and concluding that is wrong.
The problem lies in the fact that the HD3870 has very high shader power while the other units are not that powerful. That's why you get a skewed view when using 'older' games.
To give an example, check out these numbers from Crysis - 'very high setting' (extremely shader heavy):
8600GTS - 4.3
HD3650 - 6.4
9600GT - 14.9
HD3870 - 16.1
9800GTX - 21.9
You can immediately see that the HD3870 is faster than the 9600GT, but even more important is the fact that the 9800GTX is 47% faster. GFlops don't matter? They matter now and even more in the future.
www.techpowerup.com/reviews/Galaxy/GeForce_9500_GT_Overclocked/9.html
The HD3650 doesn't outperform the 8600GTS in Crysis, despite the nearly 100 GFLOP advantage the HD3650 has. Face it, GFLOPS can't be used to determine gaming performance. We will have to wait until the HD4650 is released and see. However, judging by the performance of the HD3650, which is about 60% of the 9500GT, and the fact that the HD4650 appears to be the HD3650 with everything on the core doubled, I think the two will be very close in the end.
Well, I also should have mentioned that since ATI and nVidia use completely different architectures, it's hard to compare their GFlops. But within one brand it's easy to see that GFlops do matter, and that's why I was pointing to the 9600GT and 9800GTX comparison.
Your techpowerup review of Crysis has one flaw:
We tested the DX9 version with graphics set to "High", which is the highest non-DX10 setting in the game.
ComputerBase uses DX10 and 'Very High'. This setting is much more shader demanding!
BTW in this same review the 9500GT scores 7.0fps and that's only 9% more than a HD3650.
If you go here:
www.computerbase.de/artikel/hardware/grafikkarten/2008/test_ati_radeon_hd_4870_x2/23/#abschnitt_performancerating => these are the results of all games combined.
Here you see that the 9500GT scores only 19% more on average than a HD3650.
Enough talking and let's just wait a month.
If the card has some shader power left now (assuming that is true, which I don't think), then the card is bottlenecked by the other parts. That will not change in the future, and it only means that while the 9500GT drops to 5fps from the 10fps it renders today, the HD card will maintain a framerate close to that 10. Woohoo! Big deal. The same happens with the X1000 family: now they are like 50%+ faster than their GF7 counterparts, but always at higher settings and thus at unplayable framerates.
I have said this like hundreds of times: ever since the X1000 series, ATI seems more concerned with how its cards could perform in the future than with making the best card it can for the present.
ATI: 5 units can do MADD (or ADD or MUL)
The 5th (and complex) unit is a special unit. It can also do transcendentals like SIN, COS, LOG, EXP. That's it.
1 MADD (=Multiply-Add) = 2 Flops
1 ADD or MUL = 1 Flop
And these are all usable. The developer doesn't need to program for this; the compiler takes care of it. A real-life scenario with some bad code could be something like 2 MADD + 1 MUL per unit, i.e. 5 Flops per clock. Averaged over the 64 units at 750 MHz, that would give 240 GFlops.
nVidia: basically each scalar unit can do 2 Flops per clock. That would result in a real life performance of around 90GFlops.
So on shader performance ATI will win hands down.
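A minimal sketch of that "real life" estimate, assuming the compiler fills 3 of the 5 slots per unit on average (2 MADD + 1 MUL = 5 flops per clock), that each G96 scalar unit retires one MADD (2 flops) per clock, and shader clocks of 750 MHz and 1400 MHz:

```python
# "Real life" GFLOPS estimate as described above. The 3-of-5 slot fill rate
# and the shader clocks are illustrative assumptions.

def gflops(units, flops_per_unit_per_clock, clock_mhz):
    return units * flops_per_unit_per_clock * clock_mhz / 1000.0

# HD4670: 64 VLIW units, 2 MADD + 1 MUL filled = 2*2 + 1 = 5 flops per clock
print(gflops(64, 2 * 2 + 1, 750))   # 240.0

# 9500GT: 32 scalar units, 1 MADD = 2 flops per clock
print(gflops(32, 2, 1400))          # 89.6 (~90)
```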
Considering how close the HD4870 performs to the GTX 280 and how much more texel fillrate and bandwidth the GTX has, then it seems to me that shader performance is darn important these days.
- Theoretically both Ati and Nvidia shaders can do MADD+MUL. What you quoted above was about the G80; it has long been fixed in later releases. Assuming Ati can do both at a time while Nvidia can't is stupid, considering Ati doesn't outperform Nvidia by that much even on shader-specific benchmarks...
- You so conveniently forgot that Nvidia shaders run at double the speed when calculating the "real life" performance...
- R600 and R700 are SIMD for each cluster and VLIW for each shader. This means that the instructions for all 5 units in the shader have to be written at the same time at compilation (Very Long Instruction Word), and that all 80 shaders (R600 = 80x4, R700 = 80x10) in each cluster must calculate the same instruction. By contrast, Nvidia's are scalar and also organized in SIMD arrays, but only 16 or 24 wide (G80/9x and GT200 respectively).
This has two effects:
1. VLIW means that even if shaders (5 ALUs) are superscalar for the programmer or the drivers in this case, each shader IS a vector unit.
2. SIMD over such large arrays means that if a state change occurs, you have to calculate it in a different cluster, potentially losing a complete cluster or even the entire chip in that clock.
That's why Ati is comparable to Nvidia when it comes to "real life" shader power.
And the image is not from a slide.
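To illustrate the VLIW point with a toy example (the operations, dependency rule, and packing scheme here are made up purely for illustration; a real shader compiler is far more sophisticated): a VLIW design needs the compiler to find up to five independent operations per shader per clock at compile time, so dependent code leaves slots empty, while a scalar design just issues one operation per unit per clock regardless.

```python
# Toy VLIW bundler: greedily pack independent ops into bundles of up to 5 slots.
# Everything here (op format, dependency handling) is invented for illustration.

def pack_vliw(ops, width=5):
    """ops: list of (name, inputs, output). An op can't share a bundle with an
    op that produced one of its inputs, so dependencies force a new bundle."""
    bundles, current, produced = [], [], set()
    for name, inputs, output in ops:
        if len(current) == width or inputs & produced:
            bundles.append(current)
            current, produced = [], set()
        current.append(name)
        produced.add(output)
    if current:
        bundles.append(current)
    return bundles

shader_ops = [
    ("mul r0, a, b",    {"a", "b"},      "r0"),
    ("add r1, r0, c",   {"r0", "c"},     "r1"),  # depends on r0 -> new bundle
    ("mad r2, d, e, f", {"d", "e", "f"}, "r2"),
    ("mad r3, g, h, i", {"g", "h", "i"}, "r3"),
]

print(pack_vliw(shader_ops))
# [['mul r0, a, b'], ['add r1, r0, c', 'mad r2, d, e, f', 'mad r3, g, h, i']]
# Only 1 of 5 slots used in the first bundle and 3 of 5 in the second, which is
# why peak VLIW flops rarely translate directly into real shader throughput.
```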
- Only GT200 can dual-issue MADD and MUL ops all the time. G8x/G9x generation chips can't do it all the time. There are a select few scenarios where you can dual-issue MAD and MUL ops.
- I didn't: 1375 MHz * 2 Flops * 32 shaders = 88 GFlops
- You are wrong about it being SIMD. ATI's shader involves a MIMD 5-way vector unit, MIMD signifying (contrary to SIMD) that several different instructions can be processed in parallel. The compiler is going to try to assemble simple operations in order to fill the MIMD 5D unit. But these 5 instructions cannot be dependent on each other. So even one shader can process different instructions at a time, let alone one cluster!
I simulated that only 3 instructions/shader can be done on average in my real life calculation because of less than optimal code and inefficiencies.
So basically your conclusion is wrong!
Using my real life calculation (it's just a simulation):
9800GTX 432GFlops
HD3870 248
HD4670 240
9600GT 208
9500GT 88
HD3650 87
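For what it's worth, all six of those figures fall out of the same simple formula (unit counts and shader clocks below are the values quoted in the posts above or commonly listed stock specs; the "3 of 5 slots filled" assumption from earlier is what gives the ATI parts 5 flops per unit per clock):

```python
# Reproducing the list above. Unit counts, clocks and the fill-rate assumption
# come from the earlier posts / commonly quoted stock specs.

def gflops(units, flops_per_unit_per_clock, shader_clock_mhz):
    return round(units * flops_per_unit_per_clock * shader_clock_mhz / 1000.0)

cards = {
    # NVIDIA: scalar units, 1 MADD = 2 flops per clock
    "9800GTX": (128, 2, 1688),
    "9600GT":  (64,  2, 1625),
    "9500GT":  (32,  2, 1375),
    # ATI: 5-wide VLIW units, assumed 2 MADD + 1 MUL = 5 flops per clock
    "HD3870":  (64,  5, 775),
    "HD4670":  (64,  5, 750),
    "HD3650":  (24,  5, 725),
}

for name, spec in cards.items():
    print(name, gflops(*spec))   # 432, 208, 88, 248, 240, 87
```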
If you check out the Crysis scores I posted previously, things start to make sense.
Now I know the HD4670 won't beat the 9600GT in Crysis because of many factors, but what ATI has done is basically slap the HD3870 shader engine into it. Add the RV700-generation architectural improvements.
nVidia on the contrary has made a die shrink of G84 and clocked it higher.
(pls read my previous posts before you reply)
www.techreport.com/articles.x/12458/2
www.techreport.com/articles.x/14990/4 And then there still remains the question whether the drivers can take the usually linear code of games (linear in the sense that, AFAIK, they calculate different data types at different times, instead of everything being calculated concurrently) and effectively blend different types of instructions into one VLIW instruction in real time. "Real time" being the key. R600/700 was developed with GPGPU in mind, and there it can be used effectively; the inclusion of VLIW then makes sense. But IMO that is fundamentally impossible for the most part in real-time calculations. Probably if the shaders are doing vertex calculations the other 2 ALUs remain unused, and it's even worse if the operation requires fewer ALUs.
On the MADD+MUL you are probably right, but Nvidia DID claim they had fixed it on the 9 series.
88 GFlops: I thought you were talking about the 9600GT, for some reason. Probably because candle mentioned it. But TBH, arguing about shader power to compare graphics card performance is pointless. The card could be capable of 10 TFlops, but if it maintained only the same 8 render back-ends, it would still perform similarly to any other card with 8 ROPs and similar clocks.
Ah, about Crysis. Nonsense. The HD3870 is not faster than the 9600 GT, let alone a massively crippled one (if you insist on comparing the HD3870 with the HD4670).