Wednesday, February 17th 2016
NVIDIA GP100 Silicon to Feature 4 TFLOPs DPFP Performance
NVIDIA's upcoming flagship GPU based on its next-generation "Pascal" architecture, codenamed GP100, is shaping up to be a number-crunching monster. According to a leaked slide by an NVIDIA research fellow, the company is designing the chip to serve up double-precision floating-point (DPFP) performance as high as 4 TFLOP/s, a roughly 3-fold increase from the 1.31 TFLOP/s offered by the Tesla K20X, based on the "Kepler" GK110 silicon.
The same slide also reveals single-precision floating-point (SPFP) performance to be as high as 12 TFLOP/s, roughly three times that of the GK110-based Tesla K20X, and nearly double that of the GM200. The slide also appears to settle the speculation on whether GP100 will use stacked HBM2 memory or GDDR5X. Given the 1 TB/s memory bandwidth mentioned on the slide, we're inclined to hand it to stacked HBM2.
Source:
3DCenter.org
32 Comments on NVIDIA GP100 Silicon to Feature 4 TFLOPs DPFP Performance
FirePro W9100: www.amd.com/en-us/products/graphics/workstation/firepro-3d/9100#
- 320 GB/s memory bandwidth
- 5.24 TFLOPS peak single-precision floating-point performance
- 2.62 TFLOPS peak double-precision floating-point performance
SP BYTE/FLOP = 0.061068
DP BYTE/FLOP = 0.122137
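For anyone who wants to check the byte/FLOP figures, here is a minimal sketch of the arithmetic: memory bandwidth divided by peak throughput. The function name `bytes_per_flop` is made up for illustration, and the GP100 lines assume the article's "1 TB/s" means 1000 GB/s, so treat those two results as rough.

```python
def bytes_per_flop(bandwidth_gb_s: float, peak_tflops: float) -> float:
    """Return memory bandwidth (GB/s) per unit of peak throughput (GFLOP/s)."""
    return bandwidth_gb_s / (peak_tflops * 1000.0)


# FirePro W9100 figures as quoted in the comment above.
w9100_sp = bytes_per_flop(320, 5.24)
w9100_dp = bytes_per_flop(320, 2.62)

# GP100 figures from the article; 1 TB/s taken as 1000 GB/s (assumption).
gp100_sp = bytes_per_flop(1000, 12)
gp100_dp = bytes_per_flop(1000, 4)

print(f"W9100 SP byte/FLOP: {w9100_sp:.6f}")
print(f"W9100 DP byte/FLOP: {w9100_dp:.6f}")
print(f"GP100 SP byte/FLOP: {gp100_sp:.6f}")
print(f"GP100 DP byte/FLOP: {gp100_dp:.6f}")
```

If the slide's 1 TB/s is really 1024 GB/s, the GP100 ratios come out slightly higher.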
Wow.... comparing the Tesla cards to a 7970.
The exception of course is Crysis, but that's about it.
Edit: To reiterate, I quote the OP: "NVIDIA's upcoming flagship GPU based on its next-generation 'Pascal' architecture, codenamed GP100." Specifically mentioning "flagship", then comparing it to the K20X and 7970, is at the very least misleading.
I think those looking for huge gains or cost savings are gonna be a little disappointed.
trog
I'd be wary about taking too much Pascal info for granted in the slide if the information is that old.
I think I read that lower-end Pascals would feature GDDR5X, but that's about all of it.
If AMD release a solid card, it will be humbling for Nvidia (which we all agree would be very good). It really depends on what each company's moles know about each other's tech. Perhaps it will be Tahiti versus GK104 all over again? Perhaps it will be 290X versus 780 Ti? I would like to see AMD come out with a better card and one that puts pressure on Nvidia.
But if AMD have no Polaris performance part ready, yeah, Nvidia will do exactly what they always do: milk the mid range as the best part until they need to release their top end. I doubt Nvidia will jump when AMD release the dual Fiji part. It will give AMD hands down the fastest card, but it won't be seen as a 'valid' threat to Nvidia's 980 Ti (dual versus single arguments).
2. The graph speaks of a correlation between memory usage and the 1st derivative, aka bytes per flop. What NVIDIA is basically saying is that, at the point at which information is being stored to the framebuffer for 32- or 64-bit floating-point executions, the usage is actually less compared to other products with a similar relationship. Furthermore, I think it's a typo when the graph shows 0,256 and 0,805 for SP and DP on the new Pascal. It's probably meant to say 0.256 bytes per flop SP and 0.805 bytes per flop DP.
3. From the 7970 onward, 64-bit FPP has actually gone up for AMD graphics cards, probably because AMD saw a small niche in the market where AMD consumers would use their discrete graphics cards to render videos and the like, at a time when NVIDIA was taking it away after the first Titan generation. NVIDIA was thinking that they could remove 64-bit FPP from gaming cards, and that this would probably boost the sales of Quadro cards, but there wasn't really a big difference in sales (speculation). You can see this in the M4000, where a Maxwell Titan and a workstation card provide about the same performance and features for rendering. The only significant difference was probably the driver.
4. Tesla is more of a number cruncher, and its competitor is Intel's Knights Landing or any server CPU. Simply put, it's an accelerator card, but it still acts as a graphics card: offload GPU executions to the GPU for processing and image rendering, use CUDA, blah blah blah. Some would say that Knights Landing is a work in progress and Intel's failure at an Intel graphics card. Intel's Xeon Phi is a future-proof toy that can't be used for practical applications, because a lot of current software doesn't use multi-core coding, and to make it work you need to be someone who knows how to code both for the program and for the Xeon Phi (in theory). From my understanding, you can't just load a PC game and expect the 64 micro CPU cores from Knights Landing to make your bottleneck troubles disappear, giving you an FPS of 3,000 on World of Warcraft on ultra high settings. NO! The PC game is coded to run on the physical cores of your CPU, and other code needs to be implemented for Knights Landing--and that's assuming it works properly when you do that. While Intel has its multicore coding for Knights Landing, NVIDIA's Tesla line uses CUDA. They say it's more efficient, and it provides better performance than Knights Landing. Overall, I think it's just a glorified GPU with some nitro or rocket boosters... Tesla can't act as a substitute CPU through your PCI bus for increased performance, but it can improve rendering times for programs that utilize GPU rendering, and the coding is less complicated??
5. The majority of CPUs have poor 64-bit FPP in general. Take a look at the Sandy Bridge Xeon 2690 in the table. 64-bit FPP is only what, 243.2 GFLOPS versus the AMD 7970 at 1010 GFLOPS in DP alone.
6. 64-bit FPP isn't a major function for everyday, normal use and PC gaming. So in a sense, Intel and AMD can say "big F***en Deal," but renderers and CGI people who use NVIDIA's code to render particle effects will be like OMG, that's going to make my epeen super sexy. Frame times are cut down from 10 minutes to 10 seconds. Woot WOOT! I can hit the clubs a lot sooner.
Come on, let's be real for once.
I'm gaming at full details in ALL existing games at 1080p with my (now) crappy 780 Ti card, and so far there is zero reason to upgrade. If the rumors are true, then those new cards will be at least $700 or more in East Asia/Europe...
Good luck with that.