Thursday, October 1st 2009

NVIDIA Unveils Next Generation CUDA GPU Architecture – Codenamed "Fermi"

NVIDIA Corp. today introduced its next generation CUDA GPU architecture, codenamed "Fermi". An entirely new ground-up design, the "Fermi" architecture is the foundation for the world's first computational graphics processing units (GPUs), delivering breakthroughs in both graphics and GPU computing.

"NVIDIA and the Fermi team have taken a giant step towards making GPUs attractive for a broader class of programs," said Dave Patterson, director Parallel Computing Research Laboratory, U.C. Berkeley and co-author of Computer Architecture: A Quantitative Approach. "I believe history will record Fermi as a significant milestone."

Presented at the company's inaugural GPU Technology Conference in San Jose, California, "Fermi" delivers a feature set that accelerates performance on a wider array of computational applications than ever before. Joining NVIDIA's press conference was Oak Ridge National Laboratory, which announced plans for a new supercomputer that will use NVIDIA GPUs based on the "Fermi" architecture. "Fermi" also garnered the support of leading organizations including Bloomberg, Cray, Dell, HP, IBM and Microsoft.

"It is completely clear that GPUs are now general purpose parallel computing processors with amazing graphics, and not just graphics chips anymore," said Jen-Hsun Huang, co-founder and CEO of NVIDIA. "The Fermi architecture, the integrated tools, libraries and engines are the direct results of the insights we have gained from working with thousands of CUDA developers around the world. We will look back in the coming years and see that Fermi started the new GPU industry."

As the foundation for NVIDIA's family of next generation GPUs (GeForce, Quadro and Tesla), "Fermi" features a host of new technologies that are "must-have" features for the computing space, including:
  • C++, complementing existing support for C, Fortran, Java, Python, OpenCL and DirectCompute.
  • ECC, a critical requirement for datacenters and supercomputing centers deploying GPUs on a large scale
  • 512 CUDA Cores featuring the new IEEE 754-2008 floating-point standard, surpassing even the most advanced CPUs
  • 8x the peak double precision arithmetic performance over NVIDIA's last generation GPU. Double precision is critical for high-performance computing (HPC) applications such as linear algebra, numerical simulation, and quantum chemistry
  • NVIDIA Parallel DataCache - the world's first true cache hierarchy in a GPU that speeds up algorithms such as physics solvers, raytracing, and sparse matrix multiplication where data addresses are not known beforehand
  • NVIDIA GigaThread Engine with support for concurrent kernel execution, where different kernels of the same application context can execute on the GPU at the same time (e.g. PhysX fluid and rigid body solvers); see the sketch after this list
  • Nexus - the world's first fully integrated heterogeneous computing application development environment within Microsoft Visual Studio
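
To make the concurrent-kernel bullet above concrete, here is a minimal C for CUDA sketch (illustrative only: the kernel names, workloads and sizes are made up, and this is not NVIDIA sample code) that launches two independent kernels into separate streams so that a Fermi-class GPU can overlap them when resources allow:

    #include <cuda_runtime.h>
    #include <cstdio>

    // Two small, independent kernels standing in for, say, a fluid solver and a
    // rigid-body solver. Their workloads are purely illustrative.
    __global__ void scaleKernel(float *data, int n, float s)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= s;
    }

    __global__ void offsetKernel(float *data, int n, float o)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += o;
    }

    int main()
    {
        const int n = 1 << 20;
        float *a = 0, *b = 0;
        cudaMalloc((void **)&a, n * sizeof(float));
        cudaMalloc((void **)&b, n * sizeof(float));
        cudaMemset(a, 0, n * sizeof(float));
        cudaMemset(b, 0, n * sizeof(float));

        // Launching into different streams lets the scheduler run the kernels
        // concurrently on hardware that supports it; earlier GPUs serialize them.
        cudaStream_t s1, s2;
        cudaStreamCreate(&s1);
        cudaStreamCreate(&s2);

        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads, 0, s1>>>(a, n, 2.0f);
        offsetKernel<<<blocks, threads, 0, s2>>>(b, n, 1.0f);

        cudaDeviceSynchronize();
        printf("both kernels finished\n");

        cudaStreamDestroy(s1);
        cudaStreamDestroy(s2);
        cudaFree(a);
        cudaFree(b);
        return 0;
    }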

49 Comments on NVIDIA Unveils Next Generation CUDA GPU Architecture – Codenamed "Fermi"

#26
soldier242
Now it's official, this is the 5870 killer. Damn, I had hoped that ATi would gain some more market share... but with an opponent like this, oh boy.

Now we need some benchies... I believe Wizz knows more about the new card than we do.
#27
gumpty
Soldier,

Market share does not [directly] come from having the performance crown. It comes from having the best price-performance ratio. In the GT200 vs HD4000 era Nvidia had the fastest GPU, but ATI gained market share because of its better value.

Of course the halo-effect can have a small influence, but it is generally not great.
#29
Zubasa
FordGT90Concept: Since it included DirectCompute (which is an API inside of the DirectX 11 package).

DX11 offers unified sound support, unified input support, unified networking support, and more that CUDA does not. But that's not what you were asking. CUDA and DirectCompute are virtually the same with one caveat: Microsoft will flex their industry muscles to get developers to use it and developers will want to use it because the same code will work on NVIDIA, AMD, and Intel GPUs.

DirectCompute has everything to do with parallel computing. That is the reason why it was authored.
DX11 includes DirectCompute 5.0.
DC 4.0 and 4.1 are available to DX10 and 10.1 hardware respectively.
#30
newtekie1
Semi-Retired Folder
FordGT90Concept: Since it included DirectCompute (which is an API inside of the DirectX 11 package).

DX11 offers unified sound support, unified input support, unified networking support, and more that CUDA does not. But that's not what you were asking. CUDA and DirectCompute are virtually the same with one caveat: Microsoft will flex their industry muscles to get developers to use it and developers will want to use it because the same code will work on NVIDIA, AMD, and Intel GPUs.

DirectCompute has everything to do with parallel computing. That is the reason why it was authored.
Zubasa: DX11 includes DirectCompute 5.0.
DC 4.0 and 4.1 are available to DX10 and 10.1 hardware respectively.
Yes, and it has gone largely unused, and will continue to go unused due to its inflexibility compared to CUDA (and Stream).

DX11 (and DX10) was not focused on parallel computing, and doesn't compete with CUDA/Stream/OpenCL. You can't use DX11/DirectCompute to do the things that are possible with CUDA/Stream/OpenCL.
#31
RejZoR
I just want damn OpenCL to become a widely used standard so we can move forward. This whole thing with CUDA is not getting us anywhere.
#32
Agility
newtekie1: Yes, and it has gone largely unused, and will continue to go unused due to its inflexibility compared to CUDA (and Stream).

DX11 (and DX10) was not focused on parallel computing, and doesn't compete with CUDA/Stream/OpenCL. You can't use DX11/DirectCompute to do the things that are possible with CUDA/Stream/OpenCL.
Wait, I don't really get you. In your previous post you said DX11 had nothing to do with parallel computing, and in your current post you're saying DX11 has it but just isn't focused on it? I'm confused.
#33
erocker
*
newtekie1: Yes, and it has gone largely unused, and will continue to go unused due to its inflexibility compared to CUDA (and Stream).

DX11 (and DX10) was not focused on parallel computing, and doesn't compete with CUDA/Stream/OpenCL. You can't use DX11/DirectCompute to do the things that are possible with CUDA/Stream/OpenCL.
Don't worry, Nvidia will still call it CUDA. ;) Like it matters...
#34
FordGT90Concept
"I go fast!1!11!1!"
newtekie1: Yes, and it has gone largely unused, and will continue to go unused due to its inflexibility compared to CUDA (and Stream).
It's not officially out until Windows 7 (DX11) launches. DirectCompute is backwards compatible with Windows Vista (DX10/10.1). The only real information we have about it is what is presented in this slide show (including its "focus on parallel computing").


And exactly, erocker. Microsoft gets to do the evil laughing and thumb twiddling instead of NVIDIA. :roll:
#35
Benetanegia
erocker: Don't worry, Nvidia will still call it CUDA. ;) Like it matters...
They call it CUDA because CUDA is the general computing architecture behind their chip design. CUDA has never been the programming language; C for CUDA is. It's as if Intel/AMD said "C for x86": x86 is the architecture, C is the programming language.

So Nvidia will run DX11 compute through CUDA, OpenCL through CUDA, etc.

CUDA= Compute Unified Device Architecture <- It says it all.

The problem is that the media has mixed things up badly, giving the name CUDA to the software, when that's not what it is.
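
As a rough illustration of that distinction (a minimal sketch with made-up names, not taken from NVIDIA's documentation): C for CUDA is ordinary C with a few extensions, and the CUDA architecture underneath is what actually executes it, much like x86 executes compiled C.

    #include <cuda_runtime.h>

    // The __global__ qualifier and the <<< >>> launch syntax are the
    // "C for CUDA" language extensions; the CUDA architecture is the
    // hardware/ISA this gets compiled down to.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        float *x = 0, *y = 0;
        cudaMalloc((void **)&x, n * sizeof(float));
        cudaMalloc((void **)&y, n * sizeof(float));
        // (Data initialization and host/device copies omitted for brevity.)

        // Map the call onto the hardware's grid/block thread hierarchy.
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();

        cudaFree(x);
        cudaFree(y);
        return 0;
    }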
#36
FordGT90Concept
"I go fast!1!11!1!"
Neither Stream nor CUDA gets much general consumer use; Microsoft is in a position to change that. This is getting off topic. *shame on me*
#37
Benetanegia
All this discussion is pointless. Nvidia will have DX11, OpenCL and C for CUDA, all three, and one will not interfere with the others. CUDA is going to be used in industrial and scientific areas, because it really is much, much better for that kind of thing, mainly because it's a high-level programming language, while the other two are low/medium level. Also, in those places the ability to run on every GPU is not an issue, as they want the best optimization possible for the computer they have just built. They already make different optimizations depending on whether they use Opterons or Xeons.

CUDA is going to be used in a supercomputer, so that alone means a lot of cards sold. I don't know the number, but it could mean 20,000 Tesla cards. At $4,000 each, do the math: that's on the order of $80 million.

In the meantime the consumer market will not be affected at all. G80 and GT200 were already focused on general computing and they did well in gaming. People have nothing to worry about.
#38
FordGT90Concept
"I go fast!1!11!1!"
DirectCompute, OpenCL, and the C extension for CUDA are all APIs. The playing field is level.


All supercomputers still run racks of CPUs (almost 300,000 of them in one :eek:) and very, very few GPUs (no more than needed to drive a display for management). That could change with Larrabee, but I doubt it will change before then. Why? Neither CUDA nor Stream is fully programmable; Larrabee is. Make a protein-folding driver and you've got a protein-folding Larrabee card. Make a DX11 driver and you've got a gaming Larrabee card. Every card, so long as it has an appropriate driver, is made to order for the task at hand. Throw that on a Nehalem-based Itanium mainframe and you've got a giant bucket of kickass computing. :rockout:
#39
Benetanegia
FordGT90Concept: DirectCompute, OpenCL, and the C extension for CUDA are all APIs. The playing field is level.


All supercomputers still run racks of CPUs (almost 300,000 of them in one :eek:) and very, very few GPUs (no more than needed to drive a display for management). That could change with Larrabee, but I doubt it will change before then. Why? Neither CUDA nor Stream is fully programmable; Larrabee is. Make a protein-folding driver and you've got a protein-folding Larrabee card. Make a DX11 driver and you've got a gaming Larrabee card. Every card, so long as it has an appropriate driver, is made to order for the task at hand. Throw that on a Nehalem-based Itanium mainframe and you've got a giant bucket of kickass computing. :rockout:
You are heavily outdated, man. Tesla cards are going to be used by ORNL to build the fastest supercomputer, 10x faster than RoadRunner.

www.dvhardware.net/article38174.html

Or here's another example: instead of creating an 8,000-CPU (1,000 8P servers) supercomputer, they used only 48 servers with Tesla.

wallstreetandtech.com/it-infrastructure/showArticle.jhtml?articleID=220200055&cid=nl_wallstreettech_daily

Or this: www.embedded-computing.com/news/Technology+Partnerships/14323 - a Cray supercomputer on a desk using Tesla.

You are outdated when it comes to CUDA programmability too: www.techpowerup.com/105013/NVIDIA_Introduces_Nexus_The_Industry%E2%80%99s_First_IDE_for_Developers_Working_with_MS_VS.html

Fermi can run C/C++ and Fortran natively; Larrabee has lost every bit of advantage it had there, and it's nowhere to be found and will not be until late 2010 or even 2011.
#41
FordGT90Concept
"I go fast!1!11!1!"
Benetanegia: You are heavily outdated, man. Tesla cards are going to be used by ORNL to build the fastest supercomputer, 10x faster than RoadRunner.

...
It ain't official until it runs Linpack.

Larrabee is slated for Q2 2010.
#42
leonard_222003
I'm getting sick of all the names and all the things they can supposedly do but really can't.
Bottom line, what can they actually do now, after all this time of screaming CUDA and STREAM:
1. They can encode video. For professionals it's not an option: not too many options on the encoder, and if you start giving it some deinterlacing or filter stuff to do and a huge bitrate, the video card will be very limited in how much it can help. I tried it and it's crap; it's good for YouTube and PS3/Xbox 360, but not for freaks who want the best quality.
2. Folding@home is not my thing.
3. It can speed up some applications like Adobe and Sony Vegas, but again very, very limited; the CPU is still the best upgrade if you do this kind of thing.
4. Some new decoders are helped by video cards, but for people who have at least a dual core it's like "so now it runs partially on my video card? Wow, I would've never known."
5. Physics in games, which is probably the only thing we actually see as real progress from all the bullshit they serve us.
6. Something I forgot :)
Bottom line is they talked about what the video card can do and how great things will be, but time passed and nothing. The encoding was supposed to be top notch and fast on a GPU, and look, a year has passed and it's still basic and limited in what you can do; it's so basic that those of us who want better quality never have a real option with Badaboom/CyberLink Espresso, they are crap.
I will not stand for more bullshit about "supercomputing" on the GPU. SHOW ME WHAT YOU CAN DO OTHER THAN GRAPHICS THAT COULD CHANGE MY PRIORITIES IN BUYING HARDWARE!!!
#44
FordGT90Concept
"I go fast!1!11!1!"
Designed for CUDA means it doesn't benefit from Stream too. The reason it really hasn't caught on except in a few niches is that support is still spotty at best. Unification is required and, at least on Windows, DirectCompute is in a position to provide that. The gaming industry wouldn't be where it is today without DirectPlay, Direct3D, and DirectSound from the mid 1990s. History repeats.
#45
Benetanegia
FordGT90Concept: It ain't official until it runs Linpack.

Larrabee is slated for Q2 2010.
If it is 10x faster at what it has been designed for, but can't run Linpack, it is still 10x faster.

@leonard_222003

The GPU architecture has changed, and you are talking about past implementations. And it's only been 2 years since it started. How long do you think it took x86 CPUs to gain traction and replace other implementations? Longer than that. New things take time.
#46
leonard_222003
Benetanegia: If it is 10x faster at what it has been designed for, but can't run Linpack, it is still 10x faster.

@leonard_222003

The GPU architecture has changed, and you are talking about past implementations. And it's only been 2 years since it started. How long do you think it took x86 CPUs to gain traction and replace other implementations? Longer than that. New things take time.
I hope it does, man, because the hype surrounding this keeps growing, but we see nothing really spectacular.
Also, what the man with vReveal shows me: is that all the GPU can do? That program is so useless you can't even imagine it before you use it; just try it and then talk.
#47
Sihastru
FordGT90Concept: Designed for CUDA means it doesn't benefit from Stream too. The reason it really hasn't caught on except in a few niches is that support is still spotty at best. Unification is required and, at least on Windows, DirectCompute is in a position to provide that. The gaming industry wouldn't be where it is today without DirectPlay, Direct3D, and DirectSound from the mid 1990s. History repeats.
It's interesting that you say that, since these days I see nVidia doing a lot of work in this direction while ATI stands by waiting to see what happens. Intel, with its new "is it a plane? is it a bird? is it a GPU?" Larrabee project that we know very little about, is closer to that while not having an actual product. OpenCL was proposed by Apple (of all things...) to Intel, ATI and nVidia. It's nothing more than a programming language. The same can be said about the DirectCompute included in DX10/10.1/11...

"Unification"? On Windows? Think bigger, better, more... OpenX. You are now imposing limitations on the idea, not just the actual product. If Windows is what you are thinking about, then your idea is to "help" ATI, not the developers and the users. There is life outside Windows, you know. DirectCompute is not the answer... it belongs to Microsoft.

I don't think people realize that CUDA and Stream will be here forever. Why is that? Because all these so-called "open" or "unified" standards run on top of CUDA/Stream. There is the GPU, then there is CUDA/Stream, then there's everything else. These things are not really APIs, they are just wrappers around the CUDA/Stream APIs.

This is why coding something for CUDA/Stream is more efficient than using OpenCL, DirectCompute or whatever.

So I must point out the obvious, because people also think it's nVidia's fault that CUDA is used and not OpenCL or whatever. The coders choose to use CUDA; nVidia supports OpenCL just as well.

Another obvious point would be that there are far more things using CUDA than there are using Stream. This is not because nVidia is the big bad wolf; it is because it has a proactive mentality, the opposite of ATI's wait-and-see approach.

ATI is pushing games. DX11. That's it. nVidia is struggling to change the market mentality by pushing GPGPU, and they are paying for it. Everyone has something to say about them, then points out that the competition (namely ATI) is doing things much better... the truth is they are not.

For example, PhysX/Havok. ATI doesn't have Havok anymore; they "employed" a shady third-rate company to "create" an "open" physics standard. I don't think they intend to complete the project; they just need something "in the works" to compete with PhysX. So it seems that PhysX does matter.
#48
Sihastru
leonard_222003: I hope it does, man, because the hype surrounding this keeps growing, but we see nothing really spectacular.
Also, what the man with vReveal shows me: is that all the GPU can do? That program is so useless you can't even imagine it before you use it; just try it and then talk.
Of course it's useless... you have an ATI card... Jokes aside, no, that is not "all" the GPU can do. It's what the developers of the application intended it to do... take your poor quality, shaky family movies, or old, dusty, low-res movies and try to make them better. Why is that useless?

Only 5x faster than you'd normally manage using an expensive high-end CPU. A performance increase is always useless...

It's the intended purpose of the application; it's not a showcase of GPU functions.

Why do people post before thinking?
#49
Benetanegia
leonard_222003: I hope it does, man, because the hype surrounding this keeps growing, but we see nothing really spectacular.
Also, what the man with vReveal shows me: is that all the GPU can do? That program is so useless you can't even imagine it before you use it; just try it and then talk.
I have tried it with movies made with my cellphone and it did wonders. It's not something that I would pay for because I don't usually record with my phone, but it's useful, undoubtedly. Some people do use their phones to record videos, so it can be very useful for them.

Why does everybody only care about what is good for them? All this GPU computing is free* right now; you just have to choose the right program. So if it's free, where's the problem? Nvidia right now sells its cards at prices competitive on graphics performance, but they offer more. Those extra things are not for everybody? So what? Let those people have what they DO want, what they do find useful. TBH, it's as if it bothered you that other people have something that you do not want anyway.

*The programs are not free, but they don't cost more because of that feature; it's a free added feature.

@Sihastru

Well said.

Regarding CUDA/Stream, I think the best way of explaining it is that DX11 and OpenCL are more like BASIC (a programming language) and CUDA/Stream are more like assembly language, in the sense that that's what the GPUs run natively. But C for CUDA (the programming API) is a high-level language with a direct relation to the low-level CUDA (architecture and ISA). CUDA is so good because it's a high-level and low-level language at the same time, and that is very useful in the HPC market.