Monday, November 17th 2014

NVIDIA Unveils Tesla K80 Dual-Chip Compute Accelerator

NVIDIA today unveiled a new addition to the NVIDIA Tesla Accelerated Computing Platform: the Tesla K80 dual-GPU accelerator, the world's highest performance accelerator designed for a wide range of machine learning, data analytics, scientific, and high performance computing (HPC) applications.

The Tesla K80 dual-GPU is the new flagship offering of the Tesla Accelerated Computing Platform, the leading platform for accelerating data analytics and scientific computing. It combines the world's fastest GPU accelerators, the widely used CUDA parallel computing model, and a comprehensive ecosystem of software developers, software vendors, and datacenter system OEMs.
The Tesla K80 dual-GPU accelerator delivers nearly two times higher performance and double the memory bandwidth of its predecessor, the Tesla K40 GPU accelerator. With ten times higher performance than today's fastest CPU, it outperforms CPUs and competing accelerators on hundreds of complex analytics and large, computationally intensive scientific computing applications.

Users can unlock the untapped performance of a broad range of applications with the accelerator's enhanced version of NVIDIA GPU Boost technology, which dynamically converts power headroom into the optimal performance boost for each individual application.

Industry-Leading Performance for Science, Data Analytics, Machine Learning
The Tesla K80 dual-GPU accelerator was designed with the most difficult computational challenges in mind, ranging from astrophysics, genomics and quantum chemistry to data analytics. It is also optimized for advanced deep learning tasks, one of the fastest growing segments of the machine learning field.

"NVIDIA GPUs have become the de facto computing platform for the deep learning community," said Yann LeCun, director of AI Research at Facebook, and Silver Professor of Computer Science & Neural Science at New York University. "Because the accuracy of deep learning systems improves as the models and datasets get larger, we always look for the fastest hardware we can find. The Tesla K80 accelerator, with its dual-GPU architecture and large memory, gives us more teraflops and more GB than ever before from a single server, allowing us to make faster progress in deep learning."

The Tesla K80 delivers up to 8.74 teraflops single-precision and up to 2.91 teraflops double-precision peak floating-point performance, and 10 times higher performance than today's fastest CPUs on leading science and engineering applications, such as AMBER, GROMACS, Quantum Espresso and LSMS.

"The Tesla K80 dual-GPU accelerators are up to 10 times faster than CPUs when enabling scientific breakthroughs in some of our key applications, and provide a low energy footprint," said Wolfgang Nagel, director of the Center for Information Services and HPC at Technische Universität Dresden in Germany. "Our researchers use the available GPU resources on the Taurus supercomputer extensively to enable a more refined cancer therapy, understand cells by watching them live, and study asteroids as part of ESA's Rosetta mission."

Key features of the Tesla K80 dual-GPU accelerator include:
  • Two GPUs per board - Doubles throughput of applications designed to take advantage of multiple GPUs.
  • 24GB of ultra-fast GDDR5 memory - 12GB of memory per GPU, 2x more memory than Tesla K40 GPU, allows users to process 2x larger datasets.
  • 480GB/s memory bandwidth - Increased data throughput allows data scientists to crunch through petabytes of information in half the time compared to the Tesla K10 accelerator. Optimized for energy exploration, video and image processing, and data analytics applications.
  • 4,992 CUDA parallel processing cores - Accelerates applications by up to 10x compared to using a CPU alone.
  • Dynamic NVIDIA GPU Boost Technology - Dynamically scales GPU clocks based on the characteristics of individual applications for maximum performance.
  • Dynamic Parallelism - Enables GPU threads to dynamically spawn new threads, enabling users to quickly and easily crunch through adaptive and dynamic data structures.
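
The board-level figures in this list can be broken down per GPU with simple arithmetic. The sketch below is only that: it assumes bandwidth and cores divide evenly across the two GPUs (the list states the even split explicitly only for memory), and the "sweep time" is an idealized lower bound that ignores everything except peak memory bandwidth. All names are illustrative.

```python
# Per-GPU view of the per-board figures in the feature list.
# ASSUMPTION: bandwidth and cores divide evenly across the two GPUs
# (the release states this explicitly only for memory: 12GB per GPU).
NUM_GPUS = 2
BOARD_MEMORY_GB = 24
BOARD_BANDWIDTH_GB_S = 480
BOARD_CUDA_CORES = 4992

per_gpu_memory_gb = BOARD_MEMORY_GB / NUM_GPUS
per_gpu_bandwidth_gb_s = BOARD_BANDWIDTH_GB_S / NUM_GPUS
per_gpu_cores = BOARD_CUDA_CORES // NUM_GPUS


def sweep_time_ms(memory_gb: float, bandwidth_gb_s: float) -> float:
    """Ideal time (ms) to read `memory_gb` once at peak bandwidth."""
    return memory_gb / bandwidth_gb_s * 1000


print(f"Per GPU: {per_gpu_memory_gb:.0f} GB, "
      f"{per_gpu_bandwidth_gb_s:.0f} GB/s, {per_gpu_cores} cores")
print(f"Ideal full-memory sweep: "
      f"{sweep_time_ms(per_gpu_memory_gb, per_gpu_bandwidth_gb_s):.0f} ms")
```

On these assumptions, each GPU could at best read its full 12GB once in about 50 ms; real workloads will be slower.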
The Tesla K80 accelerates the broadest range of scientific, engineering, commercial and enterprise HPC and data center applications -- more than 280 in all. The complete catalog of GPU-accelerated applications is available as a free download.

More information about the Tesla K80 dual-GPU accelerator is available at NVIDIA booth 1727 at SC14, Nov. 17-20, and on the NVIDIA high performance computing website.

Users can also try the Tesla K80 dual-GPU accelerator for free on remotely hosted clusters. Visit the GPU Test Drive website for more information.

Availability
Shipping today, the NVIDIA Tesla K80 dual-GPU accelerator will be available from a variety of server manufacturers, including ASUS, Bull, Cirrascale, Cray, Dell, Gigabyte, HP, Inspur, Penguin, Quanta, Sugon, Supermicro and Tyan, as well as from NVIDIA reseller partners.

10 Comments on NVIDIA Unveils Tesla K80 Dual-Chip Compute Accelerator

#1
GhostRyder
Wow that is one BEAUTIFUL Tesla card. Man 12gb per GPU (Though 24gb for professional work!!!).

I love the design of these Tesla cards, just fantastic looking crafted and refined work horses!
#2
64K
That really is a nice looking card and the performance is crazy fast.

[Image: GPU vs. CPU performance graphs. Source: videocardz.com]
#3
jboydgolfer
I use one of these, as a booster for My Microsoft Security Essentials Scans....... ;).
#4
blibba
64K wrote:
[Image: GPU vs. CPU performance graphs. Source: videocardz.com]
This graph is really misleading - of course absolute difference grows over time. Absolute difference would grow if you went from [1 CPU vs. 1 GPU] to [2 CPUs vs. 2 GPUs]. Has the relative difference grown, showing that GPUs are genuinely extending their lead? It's hard to tell based on these graphs.
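
To make the distinction concrete, here is a small sketch with invented numbers. Nothing in it is read off the graphs; the values are chosen purely so that both sides double each generation, which shows how the absolute gap can grow while the relative gap stays flat.

```python
# ILLUSTRATIVE DATA ONLY: these values are invented, not taken from the graphs.
cpu = [1.0, 2.0, 4.0]    # hypothetical CPU throughput per generation
gpu = [3.0, 6.0, 12.0]   # hypothetical GPU throughput per generation

# Absolute gap: GPU minus CPU. Relative gap: GPU divided by CPU.
absolute_gap = [g - c for g, c in zip(gpu, cpu)]
relative_gap = [g / c for g, c in zip(gpu, cpu)]

print(absolute_gap)  # [2.0, 4.0, 8.0]
print(relative_gap)  # [3.0, 3.0, 3.0]
```

In this made-up case the absolute gap quadruples while the GPU stays exactly 3x ahead throughout, so only the ratio says whether the lead is actually widening.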
#6
HumanSmoke
Interesting evolution. Tesla used to be a by-product of consumer graphics, then GK 110 launched first as a professional series, now GK 210 is entirely a pro SKU.
#7
The Von Matrices
Would the lack of a GeForce product be the reason for having only 2496 shaders per chip?

I thought that Nvidia designated that chips with defects get sold as GeForce products. Since GK210 doesn't seem like it will ever make it to a GeForce card and since yields will never be great on a ~600mm^2 chip then they would have to use harvested silicon or face very low yields/high costs.
#8
HumanSmoke
The Von Matrices wrote:
Would the lack of a GeForce product be the reason for having only 2496 shaders per chip?
I thought that Nvidia designated that chips with defects get sold as GeForce products. Since GK210 doesn't seem like it will ever make it to a GeForce card and since yields will never be great on a ~600mm^2 chip then they would have to use harvested silicon or face very low yields/high costs.
2496 cores/ 13SMX is actually the same core/SM count as the original Tesla K20. The reduced count could be to fit within the 300w power budget as is the reduced memory speed (5GHz instead of 5.2). 24GB of GDDR5 would suck up a sizeable portion of the board power. The other possibility could be yields, but I'm picking that two of the missing numbers in the sequence (K20, K40, K50, K60, K70, K80) might equate to the two higher bin parts (2688 core/14SMX, 2880 core/15SMX). Nvidia might want to decrease inventory levels of the K40 before it looks at a replacement SKU. The dual card launching first makes some sense because it isn't replacing anything, its performance makes it a convincing competitor to Xeon Phi, and also conveniently showcases NVLink.
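
The core/SMX pairs above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers quoted in this comment plus the 4,992-core board total from the article; the variable names are illustrative.

```python
# Core/SMX pairs quoted in the comment above: SMX count -> CUDA cores.
configs = {13: 2496, 14: 2688, 15: 2880}

# Every pair should divide evenly into the same number of cores per SMX.
cores_per_smx = {}
for smx_count, cores in configs.items():
    quotient, remainder = divmod(cores, smx_count)
    assert remainder == 0, "cores should divide evenly across SMX units"
    cores_per_smx[smx_count] = quotient

# Two 13-SMX chips should add up to the board total quoted in the article.
board_cores = 2 * configs[13]

print(cores_per_smx)  # {13: 192, 14: 192, 15: 192}
print(board_cores)    # 4992
```

All three configurations come out at 192 cores per SMX, and two 13-SMX chips give the 4,992 cores listed for the board.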
#9
Aquinus
Resident Wat-man
Too bad that, like most OpenCL devices, the Tesla card needs to be used for very specific situations. Not all workloads can benefit from a super wide execution engine. So all in all, unless you're working on big data and most of your code isn't serial, this doesn't mean a whole lot.
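
The point about serial code is usually formalized as Amdahl's law, which the comment does not name but which fits the argument. The sketch below is a minimal illustration: the 10x figure is borrowed from the article's "up to 10x" claim and treated as a hypothetical speedup for the parallel portion only, and the parallel fractions are arbitrary examples.

```python
def amdahl_speedup(parallel_fraction: float, accel_speedup: float) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / accel_speedup)


# HYPOTHETICAL: speedup of the parallel portion, borrowed from the "up to 10x" claim.
ACCEL_SPEEDUP = 10.0

# Arbitrary example fractions of runtime that can run in parallel.
for fraction in (0.50, 0.90, 0.99):
    overall = amdahl_speedup(fraction, ACCEL_SPEEDUP)
    print(f"{fraction:.0%} parallel -> {overall:.2f}x overall")
```

With these example inputs, a job that is half serial gains less than 2x overall, while one that is 99% parallel gets close to the full 10x.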
#10
HumanSmoke
Aquinus wrote:
Too bad that, like most OpenCL devices, the Tesla card needs to be used for very specific situations. Not all workloads can benefit from a super wide execution engine. So all in all, unless you're working on big data and most of your code isn't serial, this doesn't mean a whole lot.
Isn't that the point?
This seems aimed squarely at Nvidia's contribution to OpenPOWER (NVLink included). Look at the launch partners (Cray, Dell, HP, Quanta), the timing (both at SC14) WRT Knights Landing/Knights Hill and POWER9, and who owns the non-3D big data space (IBM). SC14 is basically an exercise in muscle flexing for three consortiums.