
NVIDIA GeForce 4XX Series Discussion

What does that mean to the uninitiated like myself?

First of all, it means that it complies with the latest IEEE precision standard (IEEE 754-2008). That's good for scientific applications.

In games it will probably mean nothing, I guess, or not much. In CUDA/OpenCL/DX11 Compute it means that it can do 64-bit operations at just 1/2 the speed it does 32-bit ones. The previous generation did that at 1/10 the speed. The HD58xx does it at 1/5 the speed.
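To put a ratio like "1/2 rate" in context, here is a minimal, untested micro-benchmark sketch (entirely my own illustration, not anything from NVIDIA; the kernel and iteration counts are made up) that times the same multiply-add loop in float and in double. On a 1/2-rate FP64 chip the double run should take roughly twice as long; on a 1/10-rate chip, roughly ten times:

```cuda
// Hedged sketch: compare FP32 vs FP64 arithmetic throughput with two
// otherwise identical FMA-heavy kernels.
#include <cstdio>
#include <cuda_runtime.h>

template <typename T>
__global__ void fma_loop(T *out, T a, T b, int iters)
{
    T x = a;
    for (int i = 0; i < iters; ++i)
        x = x * b + a;                      // one multiply-add per iteration
    out[threadIdx.x + blockIdx.x * blockDim.x] = x;
}

template <typename T>
float time_kernel(T *buf, int iters)
{
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventRecord(t0);
    fma_loop<T><<<256, 256>>>(buf, (T)1.0001, (T)0.9999, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    cudaEventDestroy(t0); cudaEventDestroy(t1);
    return ms;
}

int main()
{
    const int n = 256 * 256, iters = 100000;
    float  *f; cudaMalloc(&f, n * sizeof(float));
    double *d; cudaMalloc(&d, n * sizeof(double));
    time_kernel(f, iters);                  // warm-up launch
    float ms32 = time_kernel(f, iters);
    float ms64 = time_kernel(d, iters);
    printf("FP32 %.2f ms, FP64 %.2f ms, ratio %.1fx\n", ms32, ms64, ms64 / ms32);
    cudaFree(f); cudaFree(d);
    return 0;
}
```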
 
OK, listen: I'll be around when y'all have problems with any board, new or old, and I'll try to help out. So good luck.
 
http://www.nvidia.com//content/flash/SWFs/Fermi/Jen-Hsun_FermiLaunch_640x360.swf
And Fermi pics
[images: GT300/Fermi card photos]
 
Sadly, ATI lied about how well their card did, saying it (5xxx series) did 200%+ better than the GTX295... well that was not true. So when benchmarks come out for the GT3xx, don't panic when it says 3000%+ better than the 5xxx.
 
Sadly, ATI lied about how well their card did, saying it (5xxx series) did 200%+ better than the GTX295... (...)

I know, but do you see how sexy it looks?! Chrome!!
 
Gfx cards could look like a turd for all I care, it's about performance :)
 
The GPU has been announced, the cards have been shown, all the GPGPU features were thoroughly discussed...

...and yet no performance numbers for gaming were even remotely mentioned. This doesn't bode well for gamers' expectations.

This card seems like an obvious competitor to Larrabee and not RV870... But isn't nVidia forgetting where the real money is?


I mean, is there a market profitable enough to live off of overly expensive computing cards, or is nVidia going to turn into a Matrox (graphics-cards-wise)?
 
The GPU has been announced, the cards have been shown, all the GPGPU features were thoroughly discussed... and yet no performance numbers for gaming were even remotely mentioned. (...)

Exactly what I was thinking... Is this actually their gaming card, or is this something different? I'm not sure this card is meant for gamers. Do they have another card sitting back just for gaming? Is this card just supposed to be better than Cypress in processing power?
 
Exactly what I was thinking... Is this actually their gaming card, or is this something different? (...)

With 512 SPs compared to the 240 of the previous generation, plus all the innovations, it is supposed to be better. Entering into suppositions is risky, but consider that AMD doubled up almost everything, while Nvidia more than doubled most things (512/240 ≈ 2.13x, in fact). HD5xxx cards, performance-wise, have not delivered 2x the performance, and we don't know if Nvidia's cards will. Anyhow, if both "fail" to deliver 2x the performance by the same amount, Nvidia still has that extra ~13%. It's not much, but it would mean the gap widens a little bit compared to the previous generation.
 
Exactly what I was thinking... Is this actually their gaming card, or is this something different? (...)



I don't think they would have the resources to design two distinct state-of-the-art GPUs for different markets.

Besides, nVidia is trying to convince gamers (responsible for 95% of its income) that they need more computing power than their CPUs have to offer... which they don't, at all.


I do believe the gaming cards will be somewhat different (6GB for a gaming GPU would make as much sense as those 1GB GeForce 9400s). They'll have at least 2 video outputs.




Honest opinion? It's a dead end for nVidia.
They should have bought/merged with VIA when they had the chance, instead of trying to change the world all by themselves.
Now they're the only player without a complete CPU+GPU system for x86, which makes them a non-player in the long run.

Good thing they've invested in the ARM business, though. At least they have somewhere to run when things get ugly in the x86 market.
 
With 512 SPs compared to the 240 of the previous generation, plus all the innovations, it is supposed to be better. (...)

The RV870 didn't double the memory bandwidth, which could become a real problem when the GF100 comes out.

Even more with the stupidly high resolutions that the Eyefinity setups allow for gamers.


Furthermore, nVidia has actually decreased some functionalities in the GF100 in relation to the GT200, in order to get a better performance/transistor ratio.
 
The RV870 didn't double the memory bandwidth, which could become a real problem when the GF100 comes out. (...)

It has not decreased anything at all. Everything has been increased. And what is more important, IMO, is that the scheduling and threading capabilities have been improved a lot, which is IMO one of the problems in RV870, apart from the memory bandwidth, if that is a problem at all.

Nowhere has it been said that ATI improved the ability of their scheduling or dispatching engine for RV870. So it could be that they have twice the SPs, but not enough power to keep them fed all the time. According to Real World Technologies, that was a problem in GT200.

Nvidia has increased that astonishingly in GT300: from the ability to run 16 different kernels, to the 10x faster context switching, passing through the fact that they doubled the schedulers present in each SP cluster and made the entire cluster homogeneous (unlike in GT200, where they were 3 groups of 8), finishing with the MIMD nature of those. All that makes GT300 much more efficient in the use of its resources. There's no way we can know how that will affect performance, but it definitely won't make it worse.
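For the CUDA-inclined, here's a rough sketch of what that "16 different kernels" figure means in practice (my own illustration, not taken from NVIDIA's whitepaper): independent launches placed into separate streams, which a GT300/Fermi-class part is free to overlap, while a GT200-class part would run them one after another:

```cuda
// Hedged sketch: launch independent kernels into separate CUDA streams.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busywork(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 1000; ++k)
            data[i] = data[i] * 1.0001f + 0.0001f;
}

int main()
{
    const int n = 1 << 16, kernels = 16;    // 16 = Fermi's concurrent-kernel limit
    float *buf[kernels];
    cudaStream_t stream[kernels];
    for (int k = 0; k < kernels; ++k) {
        cudaMalloc(&buf[k], n * sizeof(float));
        cudaStreamCreate(&stream[k]);
        // No dependencies between launches, so the hardware may overlap them.
        busywork<<<(n + 255) / 256, 256, 0, stream[k]>>>(buf[k], n);
    }
    cudaDeviceSynchronize();
    for (int k = 0; k < kernels; ++k) {
        cudaStreamDestroy(stream[k]);
        cudaFree(buf[k]);
    }
    printf("launched %d kernels in %d streams\n", kernels, kernels);
    return 0;
}
```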
 
It has not decreased anything at all. Everything has been increased. (...)

+1 and thank you for posting here.

One thing to remember about memory bandwidth is that a 256-bit and a 512-bit bus are ridiculously different. I've seen some posts above talking about a memory bandwidth bottleneck, so I'm just adding some bits of information on the subject. The width of the bus is like the lanes of a highway, and how fast the cars move is the RAM speed. The fun thing is that your GPU's memory chips are not the only thing using the GPU's memory bus! Your PC has to send instructions to the GPU, so those are also carried along this bus, but those are really freaking fast... just remember that it's faster to have more cars reach their destination simultaneously and move into a buffer than to have fewer cars arrive and move into the buffer at their destination.
 
Really smells expensive. Must be another "league of its own" nvidia card.
 
It has not decreased anything at all. Everything has been increased. (...)

Not everything. Quoting myself, nVidia did decrease some things in order to increase the overall chip efficiency.
One of them would be decreasing the memory bus from 512-bit to 384-bit, which is compensated for by using GDDR5.

And another example would be the limit of simultaneous threads, which was 30720 for the GT200 (30 SMs × 1024 threads each) and is now down to 24576 for the GF100 (16 SMs × 1536 threads each).



Trimming down some functions between generations doesn't necessarily mean that it'll be worse. Most of the time it's just a matter of optimization, saving those precious transistors for something else.


And what it is more important IMO is that the scheduling and threading capabilities have been improved a lot, which is IMO one of the problems in RV870, apart from the memory bandwidth, if that is a problem at all.


Scheduling and threading are a lot less efficient with ATI's DX10/11 approach, yes. However, that's compensated by the sheer amount of ALUs.
And although this sounds like a less elegant solution, the performance/transistor ratio has proven to be a lot better on the red team's side.


The memory bandwidth bottleneck in the HD5870 is easy to prove. Take the comparison results with the HD4870X2, for example:
Even if we imagine 100% scaling across the two RV770s, the HD5870 is higher clocked, with the same functional units as the HD4870X2 (ROPs, TMUs, shader processors, etc.).
However, the HD4870X2 beats the HD5870 in many situations.
Specs-wise, the HD4870X2 only has higher theoretical bandwidth, so that's the only possible cause.
 
Not everything. Quoting myself, nVidia did decrease some things in order to increase the overall chip efficiency.
One of them would be decreasing the memory bus from 512-bit to 384-bit, which is compensated for by using GDDR5.
OK, the memory bus decreased, but bandwidth increased. Normally, when you talk about specs, you have to take the practical ones; bandwidth is practical and bus width is not. And what's more important, bandwidth is much higher in GT300 than in RV870, so it doesn't matter if RV870 is bottlenecked, which I think it's not.

GT200 -> 512 bit / 8 × 2500 MT/s = 160 GB/s
GT300 -> 384 bit / 8 × 4800 MT/s = 230 GB/s
RV770 -> 256 bit / 8 × 3600 MT/s = 115 GB/s
RV790 -> 256 bit / 8 × 3900 MT/s = 124 GB/s
RV870 -> 256 bit / 8 × 4800 MT/s = 153 GB/s

As you can see, even the GTX285 had higher memory bandwidth than the HD5870, but that doesn't mean the GTX was bottlenecked, not at all.
Also, the jump from RV770 to RV870 is 153/115 = 1.33, or 33% more bandwidth. From RV790 to RV870 it's 153/124 = 1.23, or 23%.
Meanwhile, the jump for Nvidia is 230/160 = 1.44, a 44% improvement. Nothing points to GT300 being memory bottlenecked, far from it.
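The arithmetic behind that list is just bus width in bytes times effective memory clock. A throwaway host-only snippet (my own check, nothing official) that reproduces the table:

```cuda
// bandwidth [GB/s] = (bus width [bits] / 8) * effective clock [MT/s] / 1000
#include <cstdio>

static double bandwidth_gbs(int bus_bits, int effective_mts)
{
    return (bus_bits / 8.0) * effective_mts / 1000.0;
}

int main()
{
    struct { const char *chip; int bus, clk; } gpus[] = {
        {"GT200", 512, 2500}, {"GT300", 384, 4800},
        {"RV770", 256, 3600}, {"RV790", 256, 3900}, {"RV870", 256, 4800},
    };
    for (auto &g : gpus)
        printf("%s: %.0f GB/s\n", g.chip, bandwidth_gbs(g.bus, g.clk));
    return 0;
}
```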

And another example would be the limit of simultaneous threads, which was 30720 for the GT200 and now is down to 24576 for the GF100.

Yeah, I knew about that decrease, but it's important to note that GT200 never, ever reached anything close to that number (I mean 24576), and GT300 can. Peak <insert spec> means very little unless you are talking about the exact same architecture. It's like how RV770 had 1.2 TFLOPS and GT200 only had 622 GFLOPS, or 933 with dual issue (which almost never happens), yet GT200 is able to use them much better and is faster.
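For reference, those peak figures fall out of shader count × flops per SP per clock × shader clock. A quick check (my arithmetic, assuming the stock shader clocks I believe these parts shipped with):

```cuda
// peak GFLOPS = shader count * flops per SP per clock * shader clock [GHz]
#include <cstdio>

int main()
{
    // RV770: 800 SPs, MAD = 2 flops per clock, ~750 MHz
    printf("RV770:           %.0f GFLOPS\n", 800 * 2 * 0.750);
    // GT200 (GTX 280): 240 SPs at ~1296 MHz, MAD only...
    printf("GT200 (MAD):     %.0f GFLOPS\n", 240 * 2 * 1.296);
    // ...or MAD+MUL dual issue (3 flops per clock), which rarely happens
    printf("GT200 (MAD+MUL): %.0f GFLOPS\n", 240 * 3 * 1.296);
    return 0;
}
```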

Scheduling and threading are a lot less efficient with ATI's DX10/11 approach, yes. However, that's compensated by the sheer amount of ALUs.
And although this sounds like a less elegant solution, the performance/transistor ratio has proven to be a lot better on the red team's side.

Not true. http://forums.techpowerup.com/showpost.php?p=1575651&postcount=188

We don't know the clocks. The limitations they encountered in clocking GT200 higher may not apply now. GT200 was 65nm and RV770 was 55nm; now both are 40nm, so they can achieve similar clocks. They might not, but the possibility is higher than in the previous generation.

The truth is that, at least since the DX10 cards, the performance/transistor ratio has been roughly constant in almost every chip. If you add G92 and RV670 to the calculations I made in the link, it becomes even more apparent: G92, with its 756 million transistors, more than competes with the low end of RV770, and the 667-million-transistor RV670 does the same with G92.

The memory bandwidth bottleneck in the HD5870 is easy to prove. (...)

The X2 also has two schedulers, one per chip, so that could be the cause, as I said.

The only way memory bottlenecking can be proven in RV870 is to take the HD5870 and downclock the chip while leaving the memory as-is. If you downclock the core and performance is maintained, then the card was bottlenecked; if performance drops, it was not.

Anyway, specs don't tell the whole story. My latest assumptions are not based on specs (only); they are based on all the aspects of the chip covered in the white paper and in architecture previews like the one at Real World Technologies. GT300 has improved in almost every practical aspect, while RV870 is basically the same chip with twice the units and DX11 support. Just because the latter has not scaled well doesn't mean the former will not scale well. For instance, Nvidia has scaled much better in the past: they went from 128 SPs to 240 SPs, 1.875x the amount, while at the same time AMD went from 320 SPs to 800 SPs, 2.5x. Neither of them reached that amount of improvement, but Nvidia got much closer. Looked at that way, it's no surprise that RV870 didn't scale that well; RV770 didn't either, after all. And now Nvidia is doing a 2.13x increase, so chances are good that they'll do much better.
 
I think it needs to be said that doubling, or even slightly more than doubling, the shader count hasn't been enough to beat the dual cards of the prior gens. The GTX 280 traded blows with the 9800 GX2; it didn't beat it outright. The 5870 trades blows with the 4870 X2. Odds are the GTX 380 will not outright beat the GTX 295, which means it might not be as far ahead of the 5870 as some are thinking...

I'm thinking now that a 5870 at 1035MHz (see the news page) may even break even with a 295/380, which may be relevant when all the cards are on the field and prices settle.
 
Not true. http://forums.techpowerup.com/showpost.php?p=1575651&postcount=188

We don't know the clocks. (...)
Quoting yourself suddenly makes your argument true?
Maybe you should get some of your facts right before you post a statement.
The GTX 285 in your post is also 55nm; that doesn't mean it can clock at the same frequency as the 4890.
Also, the 4870 in your post is 512MB while the GTX 280 and 285 are both 1024MB, and the frame buffer makes a significant difference @ 2560x1600.

When a company creates a SKU they have to consider the cost, heat output, and power consumption of the product.
The GT200b, if clocked at the same frequency as the 4890 (850MHz), would have poor heat output and power efficiency.
The higher transistor count and architecture yield better performance per clock,
but at the same time they also increase the power consumption and heat output significantly.
As a result, the higher transistor count also limits how high the chip is able to clock.

Maybe you should compare the RV790 and the GT200b, which are both 55nm.
[chart: relative performance summary @ 1920x1200]
 