Monday, February 28th 2011
New CUDA 4.0 Release Makes Parallel Programming Easier
NVIDIA today announced the latest version of the NVIDIA CUDA Toolkit for developing parallel applications using NVIDIA GPUs. The NVIDIA CUDA 4.0 Toolkit was designed to make parallel programming easier and to enable more developers to port their applications to GPUs. The release is built around three main features:
"Having access to GPU computing through the standard template interface greatly increases productivity for a wide range of tasks, from simple cashflow generation to complex computations with Libor market models, variable annuities or CVA adjustments," said Peter Decrem, director of Rates Products at Quantifi. "The Thrust C++ library has lowered the barrier of entry significantly by taking care of low-level functionality like memory access and allocation, allowing the financial engineer to focus on algorithm development in a GPU-enhanced environment."
The CUDA 4.0 architecture release includes a number of other key features and capabilities, including:
For more information on the features and capabilities of the CUDA Toolkit and on GPGPU applications, please visit: http://www.nvidia.com/cuda
- NVIDIA GPUDirect 2.0 Technology -- Offers support for peer-to-peer communication among GPUs within a single server or workstation, enabling easier and faster multi-GPU programming and better application performance.
- Unified Virtual Addressing (UVA) -- Provides a single, merged address space for main system memory and the GPU memories, enabling quicker and easier parallel programming (see the sketch after this list).
- Thrust C++ Template Performance Primitives Libraries -- Provides a collection of powerful open-source C++ parallel algorithms and data structures that ease programming for C++ developers. With Thrust, routines such as parallel sorting are 5X to 100X faster than with the Standard Template Library (STL) or Threading Building Blocks (TBB); an example follows below.
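To make the first two items concrete, here is a minimal sketch of how they surface in the CUDA runtime API, assuming two peer-capable GPUs; the device IDs and buffer size are illustrative, and error checking is omitted.

    #include <cuda_runtime.h>

    int main() {
        const size_t N = 1 << 20;   // illustrative buffer size
        float *d0 = 0, *d1 = 0;

        // Allocate a buffer on each GPU. Under UVA, both pointers live in
        // one unified address space, so the runtime can tell them apart.
        cudaSetDevice(0);
        cudaMalloc(&d0, N * sizeof(float));
        cudaSetDevice(1);
        cudaMalloc(&d1, N * sizeof(float));

        // GPUDirect 2.0 peer-to-peer: allow the current device (GPU 1)
        // to access GPU 0's memory directly.
        cudaDeviceEnablePeerAccess(0, 0);

        // With UVA, cudaMemcpyDefault lets the runtime infer the copy
        // direction from the pointers -- no explicit DeviceToDevice flag.
        cudaMemcpy(d1, d0, N * sizeof(float), cudaMemcpyDefault);

        cudaFree(d1);
        cudaSetDevice(0);
        cudaFree(d0);
        return 0;
    }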
"Having access to GPU computing through the standard template interface greatly increases productivity for a wide range of tasks, from simple cashflow generation to complex computations with Libor market models, variable annuities or CVA adjustments," said Peter Decrem, director of Rates Products at Quantifi. "The Thrust C++ library has lowered the barrier of entry significantly by taking care of low-level functionality like memory access and allocation, allowing the financial engineer to focus on algorithm development in a GPU-enhanced environment."
The CUDA 4.0 release also includes a number of other key features and capabilities:
- MPI Integration with CUDA Applications -- Modified MPI implementations automatically move data to and from GPU memory over InfiniBand when an application makes an MPI send or receive call.
- Multi-thread Sharing of GPUs -- Multiple CPU host threads can share contexts on a single GPU, making it easier for multi-threaded applications to share a single GPU.
- Multi-GPU Sharing by Single CPU Thread -- A single CPU host thread can access all GPUs in a system. Developers can easily coordinate work across multiple GPUs for tasks such as "halo" exchange in applications (see the sketch after this list).
- New NPP Image and Computer Vision Library -- A rich set of image transformation operations that enable rapid development of imaging and computer vision applications.
- New and Improved Capabilities:
o Auto performance analysis in the Visual Profiler
o New features in cuda-gdb and added support for MacOS
o Added support for C++ features like new/delete and virtual functions
o New GPU binary disassembler
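Putting the multi-GPU items together, below is a hedged sketch of the single-host-thread model and the "halo" exchange pattern named above: one CPU thread selects each GPU in turn with cudaSetDevice, runs a step of work, then pushes its boundary region to the next GPU with a peer-to-peer copy. The stepKernel, halo width, and slab layout are hypothetical stand-ins, and error checking is omitted. With a CUDA-aware MPI build, the same device pointers could be passed straight to MPI send/receive calls for the cross-node case.

    #include <cuda_runtime.h>

    #define HALO 256  // illustrative halo width, in elements

    // Hypothetical stencil step; stands in for the real computation.
    __global__ void stepKernel(float *buf, size_t n) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) buf[i] += 1.0f;
    }

    int main() {
        int ngpu = 0;
        cudaGetDeviceCount(&ngpu);
        if (ngpu > 8) ngpu = 8;   // cap to the fixed-size array below

        const size_t N = 1 << 20;
        float *d[8] = {0};

        // One host thread allocates a slab on every GPU in the system.
        for (int g = 0; g < ngpu; ++g) {
            cudaSetDevice(g);
            cudaMalloc(&d[g], N * sizeof(float));
        }

        // Compute a step on every GPU from the same host thread.
        for (int g = 0; g < ngpu; ++g) {
            cudaSetDevice(g);
            stepKernel<<<(N + 255) / 256, 256>>>(d[g], N);
        }

        // "Halo" exchange: copy the last HALO elements of each GPU's slab
        // into the front of its neighbor's slab, GPU to GPU.
        for (int g = 0; g + 1 < ngpu; ++g) {
            cudaMemcpyPeer(d[g + 1], g + 1, d[g] + N - HALO, g,
                           HALO * sizeof(float));
        }

        for (int g = 0; g < ngpu; ++g) {
            cudaSetDevice(g);
            cudaFree(d[g]);
        }
        return 0;
    }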
For more information on the features and capabilities of the CUDA Toolkit and on GPGPU applications, please visit: http://www.nvidia.com/cuda
77 Comments on New CUDA 4.0 Release Makes Parallel Programming Easier
In 2011, I see CUDA solely as nVidia's "evil" commitment to try to keep GPGPU to themselves and closed-source.
Edit: oh, the newest is 3.2.
CUDA is easily portable to OpenCL, but there will be performance issues when compiling for AMD cards. NVIDIA has supported OpenCL just as long as AMD has. In fact, from personal experience, their implementation seems more solid than AMD's current SDK.
And even then, it's well known that OpenGL has been one, even two steps behind, and still is in many ways. It's also known how that has affected the market, and most people would agree that advancement in DX has been a good thing. Well, it works the other way around too, and that's the thing people don't seem to understand: OpenCL may be cross-platform, but its optimizations certainly aren't. Code optimized for Nvidia GPUs would be slow on AMD GPUs, and code optimized for AMD would be slow on Nvidia. Developers still have to code specifically for every platform, so what's so bad about Nvidia offering a much better and more mature solution? Should Nvidia deliberately hold back its development so that the open-to-all platform can catch up? Should the enterprise world (i.e. medical/geological imaging) wait two more years for what it could have now, just because you don't want to feel at a disadvantage in that little meaningless application or that stupid game? Come on...
"To hell the ability to best diagnose cancer or predict hearthquakes/tornados, I want this post process filter run as fast in my card as in that other one. That surely should be way up on their list, and to hell the rest. After all, I spend millions helping in the development of GPGPU and/or paying for the program afterwards... NO. Wait. That's the enterprises :banghead:, I'm actually the little whinny boy that demands that the FREE feature I get with my $200 GPU is "fair".
No matter what you say about CUDA being more developed than OpenCL, the truth is that nVidia works on CUDA in order to differentiate its GPUs, not just to help the computing community. DirectX and OpenGL aren't vendor-specific, either, and at least in their case all participants get a fighting chance through driver optimization.
What is so odd and stupid to you seems pretty simple to me: it only works on their GPUs. It's in all customers' interest to have competitive choices from various brands. No, they should redirect their CUDA efforts, because it is a vendor-specific API, and as such it has no long-term future. LOL, yeah, convince yourself that's the reason nVidia is pushing CUDA, in an era when a dozen GPU makers (nVidia, AMD, VIA, PowerVR, ARM Mali, Vivante, Broadcom, Qualcomm, etc.) are supporting OpenCL in their latest GPUs.
Vendor-specific is meaningless in the enterprise world and always has been; EVERYTHING is vendor-specific in the enterprise world. They compile their code, x86 code, for the specific CPU brand they chose for their servers, using the best compiler available for their needs, and they've been doing it for decades. But now it's bad because it's Nvidia...
SOOOO once again, what's wrong with Nvidia delivering the best API they can to those customers?
What you fail to understand is that Nvidia does not need to drop CUDA in order to support OpenCL. In fact, every single feature and every single optimization they make for CUDA can help develop and evolve OpenCL. And when the most competitive, robust and easy-to-use combo right now is an Nvidia GPU + CUDA, it is in customers' best interest to get that and not wait 2+ years until OpenCL is in the same state on either AMD or Nvidia... really, it's not that hard to understand... :shadedshu
"How about OpenCL then? Isn't it better to support an open source project?" Reply: I don't give a s**t as long as I can finish my work with the least amount of hassle, and CUDA supports that view.
Probably in the future OpenCL will be the leader, but for now CUDA does the job more efficiently than OpenCL.
If anything, AMD/ATI should've taken the offer to utilize CUDA in their GPUs back when NVIDIA was giving them the chance. With that kind of backing, it could've formed a true basis for OpenCL, especially since even Apple was thinking about using it as its foundation in the beginning, before the Khronos Group adopted it.
When OpenCL catches up, then we can talk about how CUDA might be a hindrance to the market.
You also fail to understand that this has been nVidia's strategy for quite some time.
As Jen-Hsun Huang said, "we're a software company". Sure, it's been there for longer.
And so was Glide, and look how that came down. lol, wrong: costs go way down if you adopt open-source software. x86 code can run on chips from all x86 CPU makers, hence why sometimes we see design wins for AMD and other times for Intel.
Well, there was this instruction-set-specific tryout from Intel in the server market. Look how well that went, lol. That's because there are non-vendor-exclusive alternatives, open source or not. And what you fail to understand is that nVidia could have done that same optimization in OpenCL to start with. 2 years?!? LOL. I just listed eight GPU vendors pushing OpenCL 1.1 compatibility in their latest GPUs right now.
Yes, CUDA has been around longer, receives more support, and is a better product in almost all ways than OpenCL. That alone should be enough reason why people choose CUDA: not everybody is bothered about "open source" and things like that, they just want to complete their work.
Initial costs for open source are low, but once you factor in support they go right back up. Also, I don't really see the difference between CUDA and OpenCL: both are "free", not in the traditional sense, but in the relative sense.
Intel tried to break away from x86, its own standard, and it failed hard. Not applicable here.
Yes, Nvidia could have done the same optimizations in OpenCL from the start, but on the other hand, OpenCL was still in its infancy when Nvidia started pushing CUDA. I think it's because Nvidia doesn't want to be bothered with external standards and prefers to have its own list of requirements.
You are not so vocal when it comes to OpenGL and DirectX, maybe because one GPU vendor runs OpenGL better than the other. It's right that CUDA is not open source, but as I understand it, it's royalty-free, and the only reason programs written for it can't run on AMD cards is that AMD did not want to come off its high horse and develop a CUDA driver for its cards; or maybe their software engineers could not do it, who knows? By the way, CUDA is portable to anywhere: if you like, you could tomorrow make a toaster that utilizes CUDA for its work. And the good point is that it's royalty-free, not like something such as DirectX that relies on bloatware to run.
Also, where is this info about DirectX being bloatware? The only bloat about it is that it requires Windows...
Users of Linux or Mac are not that welcome in the world of DirectX.
By the way, did ATI really need an offer? Isn't everyone free to develop their own CUDA hardware and software implementation?
No. CUDA does not allow that.
At most, you could compare it to Mac OS X, since it only supports whatever hardware Apple chooses to include in its computers at a given time. Regardless of how well regarded it is from a developer's point of view, it's just one more way for nVidia to try to sell more hardware with an exclusive computing API.
You have to pay, and get Nvidia's approval, to use CUDA in a commercial product. Hell, look what a tightarse they've been with hardware-accelerated PhysX, which runs on CUDA.
The only thing people are complaining about is the fact that CUDA is "locked" to NVIDIA cards only, which I wholeheartedly agree with. Personally, it's the only reason I have a GTX 460 768MB alongside my CrossFire setup.
What everyone is failing to understand is that optimization already exists in NVIDIA's implementation of OpenCL (they have 100% OpenCL 1.1 compatibility, just as AMD does); it's just that CUDA sees more use because of its wide array of functions and support (e.g. optimizations, direct video memory usage, static code analysis, etc.). Again, usage of CUDA is free, just like using *nix. A lot of open-source (and commercial) developers would not be using it if it wasn't.