Thursday, January 26th 2012
NVIDIA Releases CUDA Toolkit 4.1
NVIDIA today released a new version of its CUDA parallel computing platform, which will make it easier for computational biologists, chemists, physicists, geophysicists, other researchers, and engineers to advance their simulations and computational work by using GPUs.
The new NVIDIA CUDA parallel computing platform features three key enhancements that make parallel programming with GPUs easier, more accessible and faster. These include:
- Re-designed Visual Profiler with automated performance analysis, providing an easier path to application acceleration
- New compiler, based on the widely-used LLVM open-source compiler infrastructure, delivering up to 10 percent speed up in application performance
- Hundreds of new imaging and signal processing functions, doubling the size of the NVIDIA Performance Primitives (NPP) library
"The new visual profiler is amazing," said Joshua Anderson, lead developer of the HOOMD-blue open source molecular dynamics project. "With just a few clicks, it performs an automated performance analysis of your application, highlights likely problem areas, and then provides links to best-practice suggestions on improving them. It makes it quick and easy for virtually all developers to accelerate a broad range of applications."
"The LLVM compiler gave me an almost immediate 10 percent performance speed up, just by recompiling my existing real-time financial risk analysis code," said Gilles Civario, senior software architect at the Irish Centre for High-End Computing. "I can only imagine the additional performance gains I can achieve with further tuning using the new CUDA release."
Among the new features of the latest CUDA parallel computing platform release - available free of charge on the NVIDIA developer web site at developer.nvidia.com/getcuda - are:
New Visual Profiler - Easiest path to performance optimization
The new Visual Profiler makes it easy for developers at all experience levels to optimize their code for maximum performance. Featuring automated performance analysis and an expert guidance system that delivers step-by-step optimization suggestions, the Visual Profiler identifies application performance bottlenecks and recommends actions, with links to the optimization guides. With the new Visual Profiler, performance bottlenecks are easy to identify and act on.
LLVM Compiler - Instant 10 percent increase in application performance
LLVM is a widely-used open-source compiler infrastructure featuring a modular design that makes it easy to add support for new programming languages and processor architectures. Using the new LLVM-based CUDA compiler, developers can achieve up to 10 percent additional performance gains on existing GPU-accelerated applications with a simple recompile. In addition, LLVM's modular design allows third-party software tool developers to provide a custom LLVM solution for non-NVIDIA processor architectures, enabling CUDA applications to run across NVIDIA GPUs, as well as those from other vendors.
New Image and Signal Processing Library Functions - "Drop-in" Acceleration with NPP Library
NVIDIA has doubled the size of its NPP library, with the addition of hundreds of new image and signal processing functions. This enables virtually any developer using image or signal processing algorithms to gain the benefit of GPU acceleration simply by adding library calls to their application. The updated NPP library can be used for a wide variety of image and signal processing algorithms, ranging from basic filtering to advanced workflows.
10 Comments on NVIDIA Releases CUDA Toolkit 4.1
CUDA on AMD?!
I hope AMD takes advantage of this, but for some reason I don't find it likely... but who knows?
How's that OpenCL working for you, Nvidia?
Maybe you didn't realise that porting CUDA to OpenCL was a relatively easy task. So CUDA ports to OCL, and Nvidia is a member of Khronos... it seems like they have the bases covered, no?
On one hand we have a completely open standard; on the other, CUDA, an extension of X87 run on GPU cores. And now they have finally released an updated product after how long?
First of all, this open standard belongs to Apple, and they license it to Khronos. From the Khronos webpage: If Khronos loses its license, or Apple sells OpenCL to someone, or Khronos loses funding, or any number of other things happen, we could see OpenCL just die. The "Apple" part is of great concern to me.
It took a full year for Khronos to finally update OpenCL to version 1.2, and the implementation still lacks the serious functionality that larger developers (such as Adobe) would need to make any real use of it. And with such a slow development cycle, there is little interest from developers, because they can't wait years to get the functionality they need.
Second, AMD also has its own Close to Metal/Stream/APP (who knows what other names they'll give the technology), and AMD's OpenCL is built on that tech just as NVIDIA's OpenCL is built on CUDA. In this respect AMD and NVIDIA support OpenCL in the same way, with the exact same model.
There are also other standards, such as BrookGPU, NPP and many others, and in a way they are all competing with each other. You can't expect companies to support just one standard when there are so many, especially when its development cycle is so slow.
You can look at Linux and how much fragmentation there is in that market. At this point "Linux" is just an umbrella term covering hundreds of operating systems. The versions of Linux that were updated frequently and included the features users actually needed survived and grew their user base.
At this time OpenCL is like an infant Linux distro that has a poor update cycle and lacks the features its user base would need to start building applications on top of it.
And so developers will just use the next best thing, and most of the time that is CUDA (and its supporting libraries, which are growing in number and in "openness") rather than Stream/APP.
AMD (not really Intel) needs to step up its game, since Stream is not going anywhere at all.