Friday, February 10th 2012
NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery
NVIDIA is set to kick-start its line of graphics processors competing with AMD's Southern Islands Radeon HD 7000 series with the GeForce Kepler 104 (GK104). We are learning through reliable sources that NVIDIA will implement a radically different design (by NVIDIA's standards, anyway) for its CUDA core machinery, while retaining a basic hierarchy of GPU components similar to Fermi's. The new design would ensure greater parallelism. The latest version of GK104's specifications looks like this:
Source: 3DCenter.org
SIMD Hierarchy
- 4 Graphics Processing Clusters (GPC)
- 4 Streaming Multiprocessors (SM) per GPC = 16 SM
- 96 Stream Processors (SP) per SM = 1536 CUDA cores
- 8 Texture Units (TMU) per SM = 128 TMUs
- 32 Raster Operation Units (ROPs)
- 256-bit wide GDDR5 memory interface
- 2048 MB (2 GB) of memory as standard
- 950 MHz core/CUDA core (no hot-clocks)
- 1250 MHz actual (5.00 GHz effective) memory, 160 GB/s memory bandwidth
- 2.9 TFLOP/s single-precision floating point compute power
- 486 GFLOP/s double-precision floating point compute power
- Estimated die area: 340 mm²
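The headline figures in the list follow from the per-unit counts. The Python sketch below reproduces that arithmetic under assumptions that are not stated in the source: each CUDA core retires one fused multiply-add (2 FLOPs) per clock, double precision runs at 1/6 of the single-precision rate, and GDDR5 moves 4 transfers per actual memory clock (which is how 1250 MHz becomes 5.00 GHz effective). The constant and function names are hypothetical.

```python
# Back-of-the-envelope check of the rumored GK104 spec list.
# Assumptions (not from NVIDIA): 2 FLOPs (one FMA) per CUDA core per clock,
# double precision at 1/6 of the single-precision rate, GDDR5 at 4 transfers/clock.

GPCS = 4
SMS_PER_GPC = 4
CORES_PER_SM = 96
TMUS_PER_SM = 8
CORE_CLOCK_MHZ = 950            # no hot-clock: CUDA cores run at the core clock
MEM_BUS_BITS = 256
MEM_CLOCK_MHZ = 1250            # actual GDDR5 clock
TRANSFERS_PER_MEM_CLOCK = 4     # assumption: 1250 MHz actual -> 5000 MT/s effective
FLOPS_PER_CORE_PER_CLOCK = 2    # assumption: one fused multiply-add per clock
DP_RATE = 1 / 6                 # assumption: DP throughput as a fraction of SP


def derive_specs() -> dict:
    """Derive unit counts, peak FLOP/s and memory bandwidth from the per-unit figures."""
    sms = GPCS * SMS_PER_GPC
    cores = sms * CORES_PER_SM
    tmus = sms * TMUS_PER_SM
    # cores * MHz * FLOPs per clock = MFLOP/s; divide by 1000 for GFLOP/s.
    sp_gflops = cores * CORE_CLOCK_MHZ * FLOPS_PER_CORE_PER_CLOCK / 1000
    dp_gflops = sp_gflops * DP_RATE
    effective_mts = MEM_CLOCK_MHZ * TRANSFERS_PER_MEM_CLOCK
    # MT/s * bytes per transfer (bus width / 8) = MB/s; divide by 1000 for GB/s.
    bandwidth_gbs = effective_mts * (MEM_BUS_BITS / 8) / 1000
    return {
        "sms": sms,
        "cuda_cores": cores,
        "tmus": tmus,
        "sp_gflops": sp_gflops,
        "dp_gflops": dp_gflops,
        "bandwidth_gbs": bandwidth_gbs,
    }


if __name__ == "__main__":
    specs = derive_specs()
    print(f"{specs['sms']} SMs, {specs['cuda_cores']} CUDA cores, {specs['tmus']} TMUs")
    print(f"SP: {specs['sp_gflops'] / 1000:.2f} TFLOP/s, DP: {specs['dp_gflops']:.0f} GFLOP/s")
    print(f"Memory bandwidth: {specs['bandwidth_gbs']:.0f} GB/s")
```

Run as-is, it reports 16 SMs, 1536 CUDA cores, 128 TMUs, about 2.92 TFLOP/s SP, 486 GFLOP/s DP and 160 GB/s, so the listed totals are at least internally consistent with the per-unit counts.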
139 Comments on NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery
Listen, if NVIDIA fails with the 700 series, I take full responsibility. It's my fault for going green.
@ thread
What I mentioned to arnoo1 is normal and has happened in pretty much every generation. The "failure" to release fully enabled chips in the GTX 400 line made it look as if it didn't happen, at least performance-wise, since Fermi CUDA cores are not as fast or efficient (clock for clock) as those on previous Nvidia cards. But in the end, if you look at the GTX 580, it's pretty damn close to being 100% faster than the GTX 280/285. And the GTX 560 Ti is close to 50% faster. This is what Nvidia tried with GF100 and GF104, but only ultimately achieved with GF110 and GF114.
Look here; I couldn't find a direct comparison since W1zz stopped benchmarking DX10 cards:
On the left, the GTX 460 is similar to the GTX 285. On the right, the GTX 580 is almost twice as fast as the GTX 460.
I don't know why people (all over the internet) are so reluctant to believe a similar thing could happen this time around. Only this time they won't have to disable parts in the first place. It's not a crazy thought at all. At least IMO.
Has Nvidia done a "If we can't beat 'em, join 'em" thing?
AMD
- Gone with scalar shaders (which Nvidia has been doing for 6+ years)
- Gone modular with CU (which Nvidia has been doing since Fermi, 2 years now)
- GPGPU friendly architecture and caches (Fermi)
Nvidia
- Dropped hot-clocks
And what is Nvidia doing that AMD does? Come on, they dropped hot-clocks and that's it, arguably because slower cores (though smaller, and in twice the number) are more area- and power-efficient at 28 nm, which did not necessarily apply at 40 nm, 55 nm or 65 nm...
The only interesting thing is that both GPU vendors have converged on a very similar architecture now that both pursue the same goals and are constrained by the same physical limits.
EDIT: ^^ And that's why I love tech, BTW, and especially GPUs. It's pure engineering: solving a specific "problem" (rendering) in the best way they can. Watching two different vendors solve it so differently, yet with such similar results, has been very fun; maybe in the coming years it will not be as fun as they converge more and more. Kind of like CPUs, which are mostly equal, so there's a lot less to discuss (Bulldozer was a fresh attempt, though, yet it failed). I love tech anyway.
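The hot-clock trade-off in this comment can be put into rough numbers. The sketch below is an illustration, not a performance model: it compares peak single-precision throughput of the GTX 580 (GF110: 512 CUDA cores hot-clocked at 1544 MHz, twice its 772 MHz core clock) with the rumored GK104 (1536 cores at 950 MHz, no hot-clock), assuming 2 FLOPs per core per clock for both. `ShaderConfig` and `cores_needed_to_match` are hypothetical helpers, and peak FLOP/s ignores scheduling, memory and other architectural differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderConfig:
    name: str
    cuda_cores: int
    shader_clock_mhz: float  # the clock the CUDA cores actually run at

    def peak_sp_gflops(self, flops_per_clock: int = 2) -> float:
        # cores x shader clock (MHz) x FLOPs per core per clock, converted to GFLOP/s.
        # flops_per_clock=2 assumes one fused multiply-add per core per clock.
        return self.cuda_cores * self.shader_clock_mhz * flops_per_clock / 1000


# GF110 runs its CUDA cores at a "hot clock" of twice the 772 MHz core clock.
GTX580 = ShaderConfig("GTX 580 (GF110, hot-clocked)", cuda_cores=512, shader_clock_mhz=2 * 772)
# Rumored GK104: no hot-clock, so the cores run at the 950 MHz core clock.
GK104 = ShaderConfig("GK104 (rumored, no hot-clock)", cuda_cores=1536, shader_clock_mhz=950)


def cores_needed_to_match(fast: ShaderConfig, slow: ShaderConfig) -> float:
    """How many `slow` cores equal one `fast` core in raw FLOP/s (clock ratio only)."""
    return fast.shader_clock_mhz / slow.shader_clock_mhz


if __name__ == "__main__":
    for gpu in (GTX580, GK104):
        print(f"{gpu.name}: {gpu.peak_sp_gflops():.0f} GFLOP/s peak SP")
    print(f"Kepler cores per Fermi core: {cores_needed_to_match(GTX580, GK104):.2f}")
    print(f"GK104 / GTX 580 peak ratio: {GK104.peak_sp_gflops() / GTX580.peak_sp_gflops():.2f}")
```

On these assumptions, one hot-clocked Fermi core matches roughly 1.6 Kepler cores at 950 MHz, and GK104's peak SP rate comes out at a bit under twice the GTX 580's.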
GK110 is the one we want, and since it has just taped out, it will not be released until Q3. Sorry, green fans :)
But since we're making absurd claims with no possible way to back them up: this chip WILL beat Tahiti, and by a good margin too.
Just kidding with that ;) In all seriousness, the specs are not impressive. It may come close to the 580 at a lower cost and with better efficiency, but based on specs it is not a Tahiti killer. Gotta wait for GK110, which just taped out. That's the one I'll wait for; I'll be doing another round of upgrades around the September timeframe anyway.
3.79 TFLOPS Single Precision compute power
947 GFLOPS Double Precision compute power
Twice as much math processing power with only a 25% increase in "core" count and 25 MHz less core speed?
If these are official numbers from the green camp, I feel sorry for their PR department making efficiency statements.
The GTX 550 Ti was the first (and so far only) card to break the evenly populated memory rule: it has 1 GB on a 192-bit bus.
Be honest and say that because it is 256-bit, YOU think it's not going to be faster than the GTX 580 or something. Because based on the specs, all of them, the card has 2x the crunching power of the GTX 580 (2.9 vs 1.5 TFLOPS), twice as much texture power (128 vs 64 TMUs) and 33% more memory, just to name a few.
I wouldn't even pay too much attention to the claim that GK110 just taped out, BTW. "They" say that GK100 was canned, but there's absolutely no proof of that. "They" never knew when GK104 taped out either. Plus, in 2010 at this time of year there was also a chip called GF110 in the works, and based on when it was released (November 2010), its tape-out had to happen around Feb/March too. It's possible that GK100 still exists and will be released soon after GK104, which is what many rumors say: rumors from sources that turned out to be correct about GK104's specs several months ago, if we are to believe these specs.
That's double precision* only, and a huge improvement over GF104; both are capped because they are the mainstream parts. GF104 was capped at 1/12 of the SP rate. GK104 is 1/6, which is a nice improvement for the performance part (for example, in previous generations AMD didn't even support DP on anything but high-end). The high-end chip will feature a 1/2 ratio, and if Tahiti's number is really true (I thought Tahiti could do 1/2 DP :confused:), it will most definitely decimate it in DP performance.
*On SP, 2.9 is definitely not half of 3.79, and like Crap Daddy said, the GTX 580 had around 1.5 TFLOPS. Claimed theoretical GFLOPS mean very little, except for comparing two chips using an identical architecture. Obviously GK104 is not going to be 2x as fast as the GTX 580, as the GFLOPS number or the TMU count suggests, but it will most definitely beat it by a good amount. How much? Look to previous generations and compare the GTX 560 Ti to the GTX 285. There's your most probable answer.
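The SP/DP exchange above can be checked against the figures quoted in the thread. The sketch below uses only those quoted peak numbers (2.9 TFLOP/s and 486 GFLOP/s from the GK104 list; 3.79 TFLOPS and 947 GFLOPS from the comment that the replies attribute to Tahiti) and expresses each chip's DP rate as the nearest simple fraction of its SP rate. The `QUOTED` table and `dp_fraction` helper are hypothetical names.

```python
from fractions import Fraction

# Peak figures as quoted in this thread, in GFLOP/s: (single precision, double precision).
QUOTED = {
    "GK104 (rumored list)": (2900, 486),
    "Tahiti (figures attributed in the thread)": (3790, 947),
}


def dp_fraction(sp_gflops: float, dp_gflops: float, max_denominator: int = 16) -> Fraction:
    """Approximate the DP:SP rate as the nearest simple fraction (e.g. 1/6 or 1/4)."""
    return Fraction(dp_gflops / sp_gflops).limit_denominator(max_denominator)


if __name__ == "__main__":
    for name, (sp, dp) in QUOTED.items():
        print(f"{name}: DP = {dp_fraction(sp, dp)} of SP")
    gk_sp, gk_dp = QUOTED["GK104 (rumored list)"]
    ta_sp, ta_dp = QUOTED["Tahiti (figures attributed in the thread)"]
    print(f"SP ratio GK104/Tahiti: {gk_sp / ta_sp:.2f}")
    print(f"DP ratio Tahiti/GK104: {ta_dp / gk_dp:.2f}")
```

On the quoted numbers, the roughly 2x gap is a DP story (about 1/6 versus 1/4 of the SP rate), while on SP the GK104 figure is about three quarters of the Tahiti figure rather than half.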
The fanboy is strong with this one. If GK104 cures cancer and is 20x faster than the 7970, great! I'll buy one.
Unfortunately, the reality is that the shader architecture of GK104 is vastly different from that of Fermi: it takes 3 times the number of Kepler shader units to equal a Fermi shader unit, because shader clocks will be equal to raster clocks on Kepler. Hot-clocking is gone; that is the fallacy in your argument that you stupidly don't realize because you can't see past your fanboy eyeglasses. Also, just so you know, TFLOPS is not a meaningful measure of performance.
But hey whatever helps you sleep at night!! Hopefully, Jen-Hsun Huang will give you a hug before you go to bed at night.
You can call me a fanboy for stating the facts (as if I cared), but at least come up with an argument that doesn't sound so stupid. At least I didn't make an account just to crap on a forum, as you did with your only 4 posts. Pff, I don't know why I even bothered to respond to you. I guess I didn't pay attention the first time. ^^ Freudian slip, huh? :roll:
Hey, you got me for 3 posts; is that considered a success in Trolland? Congrats anyway.