Monday, November 16th 2009

New NVIDIA Tesla GPUs Reduce Cost Of Supercomputing By A Factor Of 10

NVIDIA Corporation today unveiled the Tesla 20-series of parallel processors for the high performance computing (HPC) market, based on its new generation CUDA processor architecture, codenamed "Fermi".

Designed from the ground up for parallel computing, the NVIDIA Tesla 20-series GPUs slash the cost of computing by delivering the same performance as a traditional CPU-based cluster at one-tenth the cost and one-twentieth the power.
The Tesla 20-series introduces features that enable many new applications to perform dramatically faster using GPU Computing. These include ray tracing, 3D cloud computing, video encoding, database search, data analytics, computer-aided engineering and virus scanning.

"NVIDIA has deployed a highly attractive architecture in Fermi, with a feature set that opens the technology up to the entire computing industry," said Jack Dongarra, director of the Innovative Computing Laboratory at the University of Tennessee and co-author of LINPACK and LAPACK.

The Tesla 20-series GPUs combine parallel computing features that have never been offered on a single device before. These include:
  • Support for the next generation IEEE 754-2008 double precision floating point standard
  • ECC (error correcting codes) for uncompromised reliability and accuracy
  • Multi-level cache hierarchy with L1 and L2 caches
  • Support for the C++ programming language
  • Up to 1 terabyte of memory, concurrent kernel execution, fast context switching, 10x faster atomic instructions, 64-bit virtual address space, system calls and recursive functions
At their core, Tesla GPUs are based on the massively parallel CUDA computing architecture that offers developers a parallel computing model that is easier to understand and program than any of the alternatives developed over the last 50 years.
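
As a rough illustration of the CUDA programming model described above, here is a minimal sketch of a double-precision kernel launched from ordinary C/C++ host code. This is a hypothetical example, not taken from NVIDIA's materials; double precision at the rates quoted here assumes a Fermi-class (compute capability 2.0) device, targeted with "-arch=sm_20".

// Minimal CUDA sketch: a double-precision AXPY kernel (y = a*x + y).
// Hypothetical example for illustration; build with e.g. "nvcc -arch=sm_20 daxpy.cu"
// to target a Fermi-class (compute capability 2.0) GPU.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void daxpy(int n, double a, const double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(double);

    // Host buffers
    double *hx = (double *)malloc(bytes);
    double *hy = (double *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0; hy[i] = 2.0; }

    // Device buffers
    double *dx, *dy;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    daxpy<<<blocks, threads>>>(n, 3.0, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 5.0)\n", hy[0]);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}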

"There can be no doubt that the future of computing is parallel processing, and it is vital that computer science students get a solid grounding in how to program new parallel architectures," said Dr. Wen-mei Hwu, Professor in Electrical and Computer Engineering of the University of Illinois at Urbana-Champaign. "GPUs and the CUDA programming model enable students to quickly understand parallel programming concepts and immediately get transformative speed increases."

The family of Tesla 20-series GPUs includes:
  • Tesla C2050 & C2070 GPU Computing Processors
      • Single-GPU PCI-Express Gen-2 cards for workstation configurations
      • Up to 3 GB and 6 GB (respectively) of on-board GDDR5 memory
      • Double precision performance in the range of 520-630 GFlops
  • Tesla S2050 & S2070 GPU Computing Systems
      • Four Tesla GPUs in a 1U system for cluster and datacenter deployments
      • Up to 12 GB and 24 GB (respectively) of total on-board GDDR5 memory
      • Double precision performance in the range of 2.1-2.5 TFlops
The Tesla C2050 and C2070 products will retail for $2,499 and $3,999 and the Tesla S2050 and S2070 will retail for $12,995 and $18,995. Products will be available in Q2 2010. For more information about the new Tesla 20-series products, visit the Tesla product pages.

As previously announced, the first Fermi-based consumer (GeForce) products are expected to be available first quarter 2010.

53 Comments on New NVIDIA Tesla GPUs Reduce Cost Of Supercomputing By A Factor Of 10

#51
jessicafae
It looks like the single-precision performance of the C2070 (NV100 Fermi) is only 35% better than the previous-generation C1060 Tesla (GT200-based). Granted, for HPC double precision is what matters most for this product. This will be a very interesting HPC/supercomputer part. But gaming mostly uses single precision, so the GeForce Fermi will be interesting....

www.brightsideofnews.com/news/2009/11/17/nvidia-nv100-fermi-is-less-powerful-than-geforce-gtx-285
The table comes from there. Here is a small comparison across three generations of Tesla parts:
3Q 2007: C870  1.5GB - $799  -  518 GFLOPS SP / no DP support
2Q 2008: C1060 4GB   - $1499 -  933 GFLOPS SP /  78 GFLOPS DP
2Q 2010: C2050 3GB   - $2499 - 1040 GFLOPS SP / 520 GFLOPS DP
3Q 2010: C2070 6GB   - $3999 - 1260 GFLOPS SP / 630 GFLOPS DP
Posted on Reply
#52
Benetanegia
jessicafae said:
It looks like the single-precision performance of the C2070 (NV100 Fermi) is only 35% better than the previous-generation C1060 Tesla (GT200-based). Granted, for HPC double precision is what matters most for this product. This will be a very interesting HPC/supercomputer part. But gaming mostly uses single precision, so the GeForce Fermi will be interesting....

www.brightsideofnews.com/news/2009/11/17/nvidia-nv100-fermi-is-less-powerful-than-geforce-gtx-285
Remember that the GT200 numbers are with dual-issue (MADD+MUL = 3 ops per clock) and that Fermi is showing FMA numbers (2 ops per clock). In reality GT200 could rarely, if ever, use the extra MUL, so it was effectively MADD-only (2 ops per clock) most of the time, and in games especially only the MADD was in use. The real-world number was closer to ~650 GFlops for the GT200 cards. Performance has roughly doubled, and if you consider what they say in update #2, the numbers shown for the Teslas already include the performance hit from ECC. According to a document NVIDIA released some time ago, ECC can hurt performance by as much as 20% (5-20%, they said, depending on the application). That's why GeForce cards will have ECC support disabled. They also say that Tesla cards are the lowest-clocked Fermi products in order to meet the stability required for HPC qualification; they have to survive years of 24/7 operation. All things considered, Fermi more than delivers a 2x increase in performance, at least on paper. We will find out in Q1 2010.
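
To make the flops bookkeeping above concrete, here is a back-of-the-envelope sketch. The C1060 figures use its published 240 SPs at 1.296 GHz; the Fermi line assumes 512 cores at roughly 1.23 GHz, back-derived from the 1260 GFlops figure in the table above rather than from any official clock:

// Rough peak single-precision arithmetic behind the numbers discussed above.
// Shader counts and clocks are assumptions, not official Tesla 20-series specs.
#include <cstdio>

int main()
{
    // GT200 marketing counts MADD (2 flops) + MUL (1 flop) per shader per clock.
    double c1060_madd_mul = 240 * 1.296 * 3.0;  // ~933 GFlops (spec-sheet figure)
    double c1060_madd     = 240 * 1.296 * 2.0;  // ~622 GFlops (the "~650" realistic figure)

    // Fermi counts one FMA (2 flops) per core per clock; no extra MUL is claimed.
    double fermi_fma      = 512 * 1.23 * 2.0;   // ~1260 GFlops

    printf("C1060 MADD+MUL:  %.0f GFlops\n", c1060_madd_mul);
    printf("C1060 MADD only: %.0f GFlops\n", c1060_madd);
    printf("Fermi FMA:       %.0f GFlops\n", fermi_fma);
    return 0;
}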
Posted on Reply
#53
jessicafae
Benetanegia said:
Remember that the GT200 numbers are with dual-issue (MADD+MUL = 3 ops per clock) and that Fermi is showing FMA numbers (2 ops per clock). In reality GT200 could rarely, if ever, use the extra MUL, so it was effectively MADD-only (2 ops per clock) most of the time, and in games especially only the MADD was in use. The real-world number was closer to ~650 GFlops for the GT200 cards. Performance has roughly doubled, and if you consider what they say in update #2, the numbers shown for the Teslas already include the performance hit from ECC. According to a document NVIDIA released some time ago, ECC can hurt performance by as much as 20% (5-20%, they said, depending on the application).
This is interesting. I am guessing that Nvidia had to adjust the official GFLOPS numbers for Tesla (no dual-issue SP counting) to bring them closer to reality because of the big HPC contracts they are negotiating.

The latest CPUs are really not that far behind Tesla these days for HPC: Fujitsu's Venus SPARC64 VIIIfx can do 128 GFlops double precision in around 40 watts (compared to the new Tesla C2050/C2070's official 520-630 GFlops DP in 190 watts). And IBM's POWER7 will be around 256 GFlops per CPU when deployed in 2010/2011 for NCSA's "Blue Waters" supercomputer.
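
Working out the implied double-precision efficiency from those figures (a rough sketch using only the numbers quoted in this thread; the power values are the vendors' published figures, not measurements):

// Rough DP GFlops-per-watt comparison from the figures quoted above.
#include <cstdio>

int main()
{
    struct Part { const char *name; double dp_gflops; double watts; };
    const Part parts[] = {
        { "SPARC64 VIIIfx", 128.0,  40.0 },
        { "Tesla C2050",    520.0, 190.0 },
        { "Tesla C2070",    630.0, 190.0 },
    };
    for (int i = 0; i < (int)(sizeof(parts) / sizeof(parts[0])); ++i)
        printf("%-15s %6.1f GFlops / %3.0f W = %.2f GFlops/W\n",
               parts[i].name, parts[i].dp_gflops, parts[i].watts,
               parts[i].dp_gflops / parts[i].watts);
    return 0;
}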

I did find the last statement of update #2 interesting, from the article linked above: "Tesla cGPUs differ from GeForce with activated transistors that significantly increase the sustained performance, rather than burst mode."
Posted on Reply