Wednesday, October 17th 2012
NVIDIA Kepler Refresh GPU Family Detailed
A 3DCenter.org report shed light on what NVIDIA's GPU lineup for 2013 could look like. According to the report, NVIDIA's next-generation GPUs could follow a path similar to the previous-generation "Fermi Refresh" (GF11x), which swung the performance-per-Watt equation back in NVIDIA's favor, even though the company's current GeForce Kepler lineup already holds an established energy-efficiency lead. The "Kepler Refresh" family of GPUs (GK11x), according to the report, could deliver significant gains in cost-performance with a bit of clever re-shuffling of the GPU lineup.
NVIDIA's GK104 GPU exceeded performance expectations, which allowed it to drive the company's flagship single-GPU graphics card of this generation, the GTX 680, buying NVIDIA time to perfect the largest chip of the generation and giving its foundry partners time to refine the 28 nm manufacturing process. By the time Kepler Refresh comes to market, TSMC will have refined its process enough for mass production of the GK110, a 7.1 billion-transistor chip on which NVIDIA's low-volume Tesla K20 GPU compute accelerator is currently based.
The GK110 will take back the reins of powering NVIDIA's flagship single-GPU product, the GeForce GTX 780. This product could offer a massive 40-55% performance increase over the GeForce GTX 680, with a price ranging anywhere between US $499 and $599. The same chip could even power the second-fastest single-GPU SKU, the GTX 770. The GK110 physically packs 2880 CUDA cores and a 384-bit wide GDDR5 memory interface.
Moving on, the real successor to the GK104, the GK114, could form the foundation of high-performance SKUs such as the GTX 760 Ti and GTX 760. The chip has exactly the same specifications as the GK104, leaving NVIDIA to tinker with clock speeds to increase performance. The GK114 would be relegated from the high-end segment the GK104 currently powers to the performance segment, so even with minimal increases in clock speed, the chip would achieve sizable performance gains over the current GTX 660 Ti and GTX 660.
Lastly, the GK106 could see a refresh to the GK116, too, retaining its specifications while leaving room for clock-speed increases, much in the same way as the GK114. It, too, gets a demotion, to the GTX 750 Ti and GTX 750, and so with minimal R&D the GTX 750 series gains a sizable performance advantage over the previous generation.
Source: 3DCenter.org
127 Comments on NVIDIA Kepler Refresh GPU Family Detailed
:laugh:
This isn't the first generation of Maximus tech, either...
The thing is, for the time being there's no Quadro GK110, just as there's no GeForce GK110. And the reason is not that one is feasible and the other isn't. Such big chips were possible in GeForce in the past and surely are right now (more so since 28 nm is so much better in regards to power consumption). And you'll see them, you can be sure of this, when Nvidia sees fit.
A couple of points - can't be fucked looking for the quotes on this drag race of a thread.
Medical imaging. My GF works in radiology (CAT, MRI etc.) and the setup is Quadro for image output and 3D representation and Tesla for computation (math co-processor). There is no real difference between medical imaging and any HPC task (weather forecasting, economics/physics/warfare simulation, or any other complex number crunching).
Die size (Dave?): Posting pictures means the square root of fuck-all. Show a picture of an Nvidia chip that isn't covered by a heatspreader if you're making a comparison. BTW: a few mm here or there doesn't sound like a lot, but it impacts the number of usable die candidates substantially (Die per wafer calculator).
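To put rough numbers on that die-candidate point, here's a minimal sketch of the common first-order dies-per-wafer approximation (Python, my own illustration rather than anything from the linked calculator). The 294/365/550 mm^2 areas are just the figures being thrown around in this thread, and the formula ignores defect density and scribe lines, so it only counts gross candidates:

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order approximation: wafer area divided by die area,
    minus an edge-loss term for partial dies around the rim."""
    radius = wafer_diameter_mm / 2.0
    gross = (math.pi * radius ** 2) / die_area_mm2
    edge_loss = (math.pi * wafer_diameter_mm) / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)

# Die areas discussed in this thread, on a standard 300 mm wafer:
for area in (294, 365, 550):
    print(f"{area} mm^2 -> ~{dies_per_wafer(300, area)} gross die candidates")
```

Even before yield enters the picture, going from ~294 mm^2 to ~550 mm^2 roughly halves the number of candidates per wafer, which is the point about a few mm mattering.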
GK110 is pretty much on schedule judging by its estimated tape-out. It looks to have had no more than two silicon revisions (and possibly only one) from initial risk wafer lot to commercial shipping. ORNL started receiving GK110 last month.
EDIT: Graph link
That's from before Kepler's launch. Long before. Nvidia has long planned a dual-GPU infrastructure, because really, that's what makes sense. So making GK104 as the GTX without all the cache, and GK110 with the cache for compute, and then doing the same for the next generation too, makes a whole lot of sense.
tpucdn.com/reviews/ASUS/HD_7970_Matrix/images/perfwatt_1920.gif
Don't you see that your logic fails? That's why you are not being coherent. Why does GK110 have gaming/visualization features AT ALL if it was never meant for them since the beginning and Maximus was the final goal? It's as simple as that. You're now trying to legitimize the idea that Maximus is the way to go* and that it's been Nvidia's plan since last generation. Again, why those features on GK110?? Makes no sense, don't you see that? It's either one thing or the other; both cannot coexist. Either GK110 was thought of as a visualization/gaming (DirectX) powerhouse or not. If Maximus is Nvidia's idea for HPC and was their only intention with GK110, then a GK104 plus a GK110 completely stripped of any functionality other than HPC would have ended up with a smaller combined die area. But they didn't go that route, and you really, really have to think why. Why did they follow that route? Why are there rumors about GK110-based GeForces and so on?
*I agree, but it's beside the point, and my comment regarding $$ is also true and you know that :) The context switching is simply also convenient, and Kepler has context switching vastly improved, so at some point 2 cards would not be required.
Like really....that pic to me says it all. :roll:
Is it really 512 mm^2?
What clock speeds are the Tesla cards? 600 MHz, I'm guessing? Because their customers asked for it.
CUDA can use all those features you call "useless". It's not quite like how you put it...there's not really much if any dedicated hardware for the purposes you mention. At least not any that takes up any die space worth mentioning.
See that picture above? Point to me where these "DirectX features" are located...
The released K20 spec says 705 MHz. Standard practice to keep the board power under the 225W limit (this is what happens when you try to keep clocks high to inflate FLOP performance in a compute environment). I'd expect the GeForce card to be bound closer to (if not fudging over) the ATX 300W limit (maybe 900 MHz or so).
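For context on the clocks-versus-FLOPS trade-off, a back-of-envelope sketch of theoretical single-precision peak (cores x 2 ops per FMA x clock). The 2496-core / 705 MHz pairing is the commonly reported K20 configuration and the 2880-core / 900 MHz pairing is the hypothetical GeForce case mentioned above; neither is confirmed by the article.

```python
def peak_sp_gflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak: one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * clock_mhz / 1000.0

print(peak_sp_gflops(2496, 705))   # Tesla K20 as commonly reported: ~3519 GFLOPS
print(peak_sp_gflops(2880, 900))   # hypothetical full-GK110 GeForce: ~5184 GFLOPS
```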
With the shader count, the larger cache structure, and provision for 72-bit (64 + 8 ECC) memory controllers, I think the rumoured 550 mm^2 die size is probably very close - another thing that argues against the GK110 being a gaming card (at least primarily). AFAIK, Nvidia's own whitepaper describes their ongoing strategy as gaming and compute becoming distinct product/architectural lines for the most part (see Maxwell and the imminent threat of Intel's Xeon Phi).
And yes, that pic says it all. If you really think that after the 521 mm^2 GF110 they would put up a 294 mm^2 chip against AMD's 365 mm^2 one, you are deluded, sir.
No it doesn't. Yes it does. Shader processors have to be fatter, include more instructions, or work differently from what would be best for HPC. The ISA has to be much wider, resulting in more complex fetch and decode, which not only widens the front end but makes it significantly slower. And there's tessellation, of course. There's absolutely no sense in adding more functionality than would be required. If functionality is there, it's because it's meant to be used.
Are you serious? Too many beers today or what? You seem to be trolling now... :ohwell:
And yes, Bene..I have no idea...as I said like 5 times earlier....because I review motherboards and memory, not GPUs. GPUs are W1zz's territory.
wait.
K20 is GF110, not GK110.
:p
Still...damn, that's a huge chip. Yes, serious. Show me EXACTLY where DirectX makes the die bigger. Because from what I've been led to believe by nVidia, it's actually the opposite of what you indicate...as does the rest of the info I got from them. :p
Ever wondered why DX11 shader processors require more transistors/die area and are clock-for-clock slower than DX10 SPs? (e.g. HD 4870 vs HD 5770)
Whatever, here's an easier example to show how stupid the question is. Can you point out where the tessellators are? Tessellators actually are a separate entity, unlike functionality included in the shader processors.
BTW K20 is GK110, lol.
EDIT: A more fitting analogy:
Please point me to where exactly they planted potatoes, where wheat and where corn.
That said, you seem to only care about suckling on AMD's teat, and disparaging things you don't like, rather than having a discussion of substance. :shadedshu
Kepler compute cards have orders in the region of 100-150,000 units. At $3K+ apiece (even taking into account the low-end GK104, since a 4x GPU "S" 1U system is bound to materialize, taking the place of the S2090), it isn't hard to see how Nvidia would look at a modular mix-and-match approach to a gaming/workstation GPU and a compute/workstation GPU. In a way, it matches AMD's past strategy, which is ironic considering that AMD have adopted compute at the expense of a larger die. The difference is that AMD wouldn't contemplate a monolithic GPU like GK110 - the risk is too great (process worries), and the return too small (not enough presence in the markets that it would be aimed at).
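Just to put a rough scale on those order figures, a quick sketch using only the 100-150k units and ~$3K price quoted above (illustrative arithmetic, not actual revenue):

```python
# Back-of-envelope revenue range implied by the quoted order volumes.
for units in (100_000, 150_000):
    print(f"{units:,} units x $3,000 ~ ${units * 3_000 / 1e6:,.0f}M")
```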
As to the whole monolithic thing, since the Fermi thing, it made sense to me that they would diverge, since they identified the problem there, and then realized that it could be an issue again in the future...one they could avoid on their higher-numbers-sold-but-less-profit products.
And considering the market, and NVIDIA's plans with ARM, it makes sense they'd want to sell you a Tesla card, and a Quadro card, and a motherboard for it all with an ARM chip. It's the same as buying CPU/GPU/board...
If you managed to get a GPU core clock of 1300 MHz and a boost of 1400+ MHz, you got very lucky, akin to someone who got a 7970 and managed to hit 1300 MHz.
The other alternative is that AMD and Nvidia hack and slash the GPU, which doesn't seem all that likely. Adding compute and beefing up the ROP/TMU count adds substantially to power draw and die size.
Right now, we see that the 7970 and 680 perform about the same when overclocked, and so there's no real performance crown. Whether or not one of the companies manages to take that crown in the next round remains to be seen.
Sounds more like an nVidia problem.
In point of fact, you've just made my point.
EVGA 680 @ 1287 MHz core, 1377 MHz boost, 6500 MHz effective memory = 425 W under OCCT
HD 7970 GE @ 1150 MHz core, 1200 MHz boost, 6400 MHz effective memory = 484 W under OCCT
If 425 watts is "absolutely through the roof", what's 484 watts?
[source] True enough, but since overclocked performance isn't guaranteed, stock-vs-stock is probably a better indicator of current performance. Overclocking ability might be more an indicator of how a refresh might perform.
A lower power-usage envelope now generally means more leeway on clocks for the refresh (all other things being equal). Who gives a fuck about an individual user in an industry context? When individual users buy more cards than OEMs, then AMD and Nvidia will give Dell, HP and every other prebuilder the dismissive wanking gesture and thumb their noses at the ATX specification. Until then, both AMD and Nvidia are pretty much going to adhere to the 300W limit. OEMs don't buy out-of-spec parts.
/Majority of posters talk about the industry situation, crazyeyes talks about crazyeyes' situation.
Everyone here is speculating on details from an article that is itself speculating on the possible makeup of an IHV's card refresh. I put forward a hypothesis based on previous design history (HD 4870 -> HD 4890, GTX 480 -> GTX 580, to take isolated examples) where refinement and design headroom produced performance increases. It is by no means the only argument, as shown by the thread, but I don't see it as being proved false by the graph you added, or the one I added to complement it. And if we were commenting upon a speculative article with known facts only, I think the post count on the thread could be reduced by ~120 posts.