Friday, February 10th 2012
NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery
NVIDIA is set to answer AMD's Southern Islands Radeon HD 7000 series by kicking off its own competing graphics processor lineup with GeForce Kepler 104 (GK104). We are learning through reliable sources that NVIDIA will implement a radically different design (by NVIDIA's standards anyway) for its CUDA core machinery, while retaining a basic component hierarchy similar to Fermi's. The new design would ensure greater parallelism. The latest version of GK104's specifications looks like this:
Source: 3DCenter.org
SIMD Hierarchy
- 4 Graphics Processing Clusters (GPC)
- 4 Streaming Multiprocessors (SM) per GPC = 16 SM
- 96 Stream Processors (SP) per SM = 1536 CUDA cores
- 8 Texture Units (TMU) per SM = 128 TMUs
- 32 Raster OPeration Units (ROPs)
- 256-bit wide GDDR5 memory interface
- 2048 MB (2 GB) memory amount standard
- 950 MHz core/CUDA core (no hot-clocks)
- 1250 MHz actual (5.00 GHz effective) memory, 160 GB/s memory bandwidth
- 2.9 TFLOP/s single-precision floating point compute power
- 486 GFLOP/s double-precision floating point compute power
- Estimated die-area 340mm²
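The derived figures in the list follow from the base unit counts and clocks, and a minimal sketch can reproduce them. It assumes things the source does not state outright: that each SP retires one fused multiply-add (counted as 2 FLOPs) per clock, that GDDR5 moves 4 transfers per memory clock, and that double precision runs at 1/6 of the single-precision rate (inferred from the ratio of the two quoted numbers). The function and constant names are made up for illustration.

```python
# Base figures taken from the rumored GK104 spec list.
GPCS = 4
SMS_PER_GPC = 4
SPS_PER_SM = 96
TMUS_PER_SM = 8
CORE_CLOCK_MHZ = 950
MEM_CLOCK_MHZ = 1250
BUS_WIDTH_BITS = 256

# Assumptions, not stated in the source:
FLOPS_PER_SP_PER_CLOCK = 2      # one FMA counted as two FLOPs
GDDR5_TRANSFERS_PER_CLOCK = 4   # 1250 MHz actual -> 5000 MHz effective
DP_DIVISOR = 6                  # inferred from 486 / 2918


def derive_gk104_figures():
    """Recompute the derived entries of the spec list from the base ones."""
    sm_count = GPCS * SMS_PER_GPC
    cuda_cores = sm_count * SPS_PER_SM
    tmus = sm_count * TMUS_PER_SM
    sp_gflops = cuda_cores * FLOPS_PER_SP_PER_CLOCK * CORE_CLOCK_MHZ / 1000
    dp_gflops = sp_gflops / DP_DIVISOR
    effective_mem_mhz = MEM_CLOCK_MHZ * GDDR5_TRANSFERS_PER_CLOCK
    bandwidth_gbs = effective_mem_mhz * BUS_WIDTH_BITS / 8 / 1000
    return {
        "sm_count": sm_count,
        "cuda_cores": cuda_cores,
        "tmus": tmus,
        "sp_gflops": sp_gflops,
        "dp_gflops": dp_gflops,
        "effective_mem_mhz": effective_mem_mhz,
        "bandwidth_gbs": bandwidth_gbs,
    }


if __name__ == "__main__":
    for name, value in derive_gk104_figures().items():
        print(f"{name}: {value}")
```

Under these assumptions the numbers line up with the list: 16 SMs, 1536 CUDA cores, 128 TMUs, roughly 2.9 TFLOP/s single precision, roughly 486 GFLOP/s double precision, and 160 GB/s of memory bandwidth.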
139 Comments on NVIDIA GeForce Kepler Packs Radically Different Number Crunching Machinery
Very interested to see how well NV does at ATI's own game.
Considering this is 1536 shaders it would be logical to assume that the full fat model would have 2048 shaders, after all the GTX560TI was - in simplistic terms - roughly 75% of a GTX580.
The shader count itself is very interesting.
The increase in shaders (from 384 to 1536, if we assume a GTX560TI replacement) would suggest that each Kepler shader is less complex than its Fermi counterpart.
If we also assume similar performance to the HD7950 (doesn't seem too unrealistic) then clock for clock GCN and Kepler could be quite evenly matched (HD7950 has more shaders but a lower core clock).
Should be very interesting.
GK110 will probably be the Tahiti killer. At a price...
Ah crap they are too different, impossible to guesstimate the performance based on them (don't know how other people are so sure). I'll try to make my analysis anyway.
At a first glance it looks like they doubled GF104's shader domain (128 TMU, 4 GPCs, etc.) and then doubled the shader amount per SM because abandoning hot clocks allows for that. Performance wise the end result should be similar.
Based on die size this chip must contain twice as many transistors as GF104, while retaining the 256 bit bus, so there's no compelling reason to assume the shaders are any less capable than they were in Fermi. They could have just as easily gone with 768 SPs and hot-clocks within the same die size.
And finally efficiency. That's the key to knowing the performance. We don't know how well they will be able to use all those SPs. I'd assume they are using 6x16 SP wide superscalar shader multiprocessors, but with how many schedulers? GF104 had 2. So now they have 4? Or since shaders run at half the speed the schedulers are just issuing the same amount of ops-per-cycle? (in reality cycles-per-op)
So many questions but I had fun. Based on raw specs this chip has the potential to crush any other card on the market, think 2x GTX560 Ti, at least at 1080/1200p. But efficiency/scaling is the key factor and that's completely unknown to us.
EDIT: As you can see, I changed my mind completely as I was writing this post. I first thought they were very different and came to realize that they are pretty much the same. If you think about Fermi based GF104/114 as a 768 SP chip with no hot-clocks, they just doubled the amount of GPCs.
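The hot-clock reasoning in the post above can be written down as a rough sketch. It assumes Fermi's shader domain runs at exactly 2x the core clock and that per-clock shader throughput is the only thing that matters; it ignores scheduling, efficiency and actual clock speeds, which the post itself calls the unknown factor. The function name is hypothetical.

```python
def no_hot_clock_equivalent(sps, shader_to_core_ratio=2):
    """SP count a chip would need at core clock to match the
    per-core-clock shader throughput of a hot-clocked design.
    Assumes the shader domain runs at shader_to_core_ratio x core clock."""
    return sps * shader_to_core_ratio


GF114_SPS = 384    # GTX 560 Ti shader count, as cited in the comments
GK104_SPS = 1536   # rumored GK104 shader count, no hot-clocks

# GF104/114 viewed as a chip without hot-clocks.
gf114_equivalent = no_hot_clock_equivalent(GF114_SPS)

# Doubling that (twice the GPCs) gives the rumored GK104 count.
doubled = gf114_equivalent * 2

# Raw per-core-clock throughput ratio, GK104 vs GF114.
raw_ratio = GK104_SPS / gf114_equivalent

if __name__ == "__main__":
    print(gf114_equivalent, doubled, raw_ratio)
```

Seen this way, 384 hot-clocked SPs count as 768, doubling gives 1536, and the raw per-clock ratio is 2, which is where the "2x GTX560 Ti" guess comes from.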
The specs look similar to what AMD has now, so given the estimated die size and unit counts, I'd say it would reach 580/7950 level performance. I doubt they'll price it at 300$ if 7950 is at 470$. More likely it's at best 50$ cheaper, that's enough to get the ball rolling. It's not really difficult to undercut the 7900 series in price, so regardless of performance it shouldn't be hard for Nvidia to claim a perf/$ crown simply because 7900 is sold at a premium currently. Of course AMD should respond to that, and I think this is the scenario we all hope for.
The specs look so identical that if I rename these specs as say....
HD7870:
256bit GDDR5 2GB memory
1536 CU, 128TMU, 32ROP, small 340mm^2 die size, no hot clocks.
It looks totally believable! Has Nvidia been hiring lots of ATI engineers? Or have they reverse engineered ATI's Cayman?
Jokes aside, some rational observations:
The specs themselves look like a mid-high end card, which will be very competitive price wise as it uses 256bit memory and a small die. I won't be surprised if it is only faster than Cayman by 10-20%. It will be on par with GTX580 at best.
I believe Nvidia is working on a high end card which has yet to show itself.
"Reports coming in from the far east say that those high up in the priority list started getting Kepler cards in various guises early this week, possibly late last. The number of sightings from sources that SemiAccurate trusts has been going up almost exponentially over the past few days, and will probably keep doing so for a bit."
He concludes:
"If things go as normal, it takes 4-6 weeks from AIB sampling to cards on the shelves. This would mean late March or early April, just like we have been saying for weeks."