Monday, February 3rd 2020
NVIDIA's Next-Generation "Ampere" GPUs Could Have 18 TeraFLOPs of Compute Performance
NVIDIA will soon launch its next-generation lineup of graphics cards based on a new and improved "Ampere" architecture. With the first Tesla server cards of the Ampere lineup going inside Indiana University's Big Red 200 supercomputer, we now have some potential specifications and information about its compute performance. Thanks to the Twitter user @dylan522p, who did some math about the potential compute performance of the Ampere GPUs based on The Next Platform's report, we discovered that Ampere is potentially going to feature up to 18 TeraFLOPs of FP64 compute performance.
Big Red 200 is based on Cray's Shasta supercomputer building block and is being deployed in two phases. The first phase is the deployment of 672 dual-socket nodes powered by AMD's EPYC 7742 "Rome" processors. These CPUs provide 3.15 PetaFLOPs of combined FP64 performance. With a total of 8 PetaFLOPs planned for Big Red 200, that leaves just a bit under 5 PetaFLOPs (about 4.85) to come from the GPU+CPU nodes of the second phase. Each of those nodes is expected to contain one next-generation AMD "Milan" 64-core CPU and four of NVIDIA's "Ampere" GPUs. If we assume that Milan boosts FP64 performance by 25% compared to Rome, then the math shows that each of the 256 GPUs delivered in the second phase of the Big Red 200 deployment will feature up to 18 TeraFLOPs of FP64 compute performance. Even if "Milan" doubles the FP64 compute power of "Rome", there will be around 17.6 TeraFLOPs of FP64 performance per GPU.
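The estimate above can be reproduced with a few lines of arithmetic. The sketch below is only an approximation of that reasoning: it assumes the phase-one FP64 figure is split evenly across all Rome sockets and that every phase-two node holds exactly one Milan CPU and four GPUs. The function and constant names are illustrative, not taken from the original calculation. Under these assumptions it yields roughly 18.2 TeraFLOPs per GPU for a 25% Milan uplift and roughly 17.8 TeraFLOPs if Milan doubles Rome's FP64 rate, in the same neighborhood as the figures quoted above. The original math may have used slightly different inputs.

```python
# Back-of-the-envelope estimate of per-GPU FP64 throughput for Big Red 200.
# Inputs are the figures quoted in the article; the even split of phase-one
# FP64 across CPU sockets is an assumption made for this sketch.

TOTAL_PFLOPS = 8.0        # planned total FP64 for the full system
PHASE1_PFLOPS = 3.15      # combined FP64 of the phase-one Rome CPUs
PHASE1_NODES = 672        # dual-socket CPU-only nodes
SOCKETS_PER_NODE = 2
PHASE2_GPUS = 256         # GPUs delivered in phase two
GPUS_PER_NODE = 4         # one Milan CPU plus four GPUs per node


def per_gpu_tflops(milan_uplift: float) -> float:
    """Estimate FP64 TFLOPS per GPU given Milan's FP64 uplift over Rome."""
    # FP64 per Rome socket, in TFLOPS.
    rome_tflops = PHASE1_PFLOPS * 1000 / (PHASE1_NODES * SOCKETS_PER_NODE)
    milan_tflops = rome_tflops * milan_uplift

    # One Milan CPU per phase-two node.
    phase2_nodes = PHASE2_GPUS // GPUS_PER_NODE
    phase2_tflops = (TOTAL_PFLOPS - PHASE1_PFLOPS) * 1000

    # Whatever the Milan CPUs do not supply must come from the GPUs.
    gpu_total_tflops = phase2_tflops - phase2_nodes * milan_tflops
    return gpu_total_tflops / PHASE2_GPUS


if __name__ == "__main__":
    for label, uplift in (("Milan +25%", 1.25), ("Milan 2x", 2.0)):
        print(f"{label}: {per_gpu_tflops(uplift):.2f} TFLOPS per GPU")
```

Because the CPU share of phase two is small next to the GPU share, even a large change in the assumed Milan uplift moves the per-GPU figure by less than half a TeraFLOP.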
Sources:
@dylan522p(Twitter), The Next Platform
172 Comments on NVIDIA's Next-Generation "Ampere" GPUs Could Have 18 TeraFLOPs of Compute Performance
Volta is really quite different than the Turing cores.
Vega V20 fights directly with Volta and provides a lot of competition.
Navi is the compute crippled GeForce/Quadro fighter.
This is the big reason that the APUs use Vega based GFX because for office stuff OpenCL performance is king. V20 also has the same degenerated rendering hardware as Navi. So it can do GFX like Navi but crushes it in compute tasks.
A simple comparison...
Xbox One/S and the PS4 variants use Polaris GPUs and they are in the Fury/Navi line.
The Xbox One X uses a GPU based off of the 290X/Vega branch.
Radeon VII isn't much faster in gaming than a 5700XT, but it's 5.5 times faster in compute. The 5700XT has basically the same compute performance as a 2080Ti.
Edit: I really hope Arcturus brings it, because the sooner CUDA dies out the better. Locking so much important research to a closed system isn't good. OpenCL ftw!
makes sense
at least there's someone else who understands this.
Radeon 5700XT: FP64 (double) performance 609.6 GFLOPS (1:16)
Radeon VII: FP64 (double) performance 3.360 TFLOPS (1:4)
GeForce 2080Ti: FP64 (double) performance 420.2 GFLOPS (1:32)
Quadro RTX 5000 PC Jesus Edition: FP64 (double) performance 348.5 GFLOPS (1:32)
Radeon VII FP64 advantage:
8x 2080ti
5.5x 5700XT
9.6x RTX 5000
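The multipliers above follow from dividing the Radeon VII's FP64 figure by each of the other cards' figures. Here is a minimal sketch of that division, using only the numbers listed in this comment; the dictionary and function names are made up for illustration.

```python
# FP64 throughput in GFLOPS, as quoted in the comment above.
FP64_GFLOPS = {
    "Radeon VII": 3360.0,
    "Radeon 5700XT": 609.6,
    "GeForce 2080Ti": 420.2,
    "Quadro RTX 5000": 348.5,
}


def fp64_advantage(card: str, other: str) -> float:
    """Return how many times faster `card` is than `other` in FP64."""
    return FP64_GFLOPS[card] / FP64_GFLOPS[other]


if __name__ == "__main__":
    for other in ("GeForce 2080Ti", "Radeon 5700XT", "Quadro RTX 5000"):
        ratio = fp64_advantage("Radeon VII", other)
        print(f"Radeon VII vs {other}: {ratio:.1f}x")
```

These are ratios of quoted peak FP64 rates only, so they say nothing about gaming or FP32 performance.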
The big boy cards that Ampere is actually fighting: Volta and V20
Instinct MI60: FP64 (double) performance 7.373 TFLOPS (1:2)
Quadro GV100: FP64 (double) performance 7.066 TFLOPS (1:2)
Quadro GV100S: FP64 (double) performance 8.177 TFLOPS (1:2)
Corrected
I haven't seen any AMD cards "flood the market" in the mid-range or high-end since the 200/300 series. Even "big hits" like RX 480/580 were outsold 8-10x by GTX 1060, etc.
Also a side note, in "Double-Precision Workloads" the Radeon VII is the fastest GPU on the planet.
5.5x faster than the 5700XT
8x faster than the 2080ti
Turing isn't in the same game as Volta. Ampere will continue the tradition of being beastly compute oriented.
Outstanding is what I get from the article and what's been said. If you think it will be better than 2080Ti it's fine with me. I'm not going to make this assumption.
I will leave it as outstanding. When somebody compares the Ryzen launch to what the new RDNA2 will bring, outstanding is what comes to my mind. If this is a marketing scheme we will have to wait and see.
Soon after AMD launched the ZEN CPUs they were developing RDNA1. Soon after strong Ryzen sales and profits they injected much needed R&D into the RTG hence RDNA2.
On top of next gen gaming consoles coming out Christmas 2020 powered by RDNA2. Etc
This is old news and already well known. People will underestimate AMD's RDNA2 potential because of how their Radeons performed in comparison to Nvidia GPUs, without realizing that AMD split the server GPU and the gaming GPU and no longer releases ONE design to appease every market segment.
If the gaming-oriented RX 5700XT is not a positive indication that AMD has caught up to Nvidia, then we will have to wait and see how this plays out in 2020. My post is based on FACTS. Is yours? Nope
I said the odds are that it will be released because it has been mentioned by Lisa Su, other executives from AMD, and lead Radeon managers. Although you didn't say whether you mean RDNA or RDNA2. If AMD says it is coming, then it is. AMD's 2020 GPU roadmap says New Navi (RDNA2) in 2020, but if you say it's not, then maybe tell AMD this, because they don't know that.
Dual GPU single cards. The V20 based pro duo is king.
Under $2500 USD... Radeon VII is the fastest.
So yeah this upcoming Tesla card will fight against upcoming AMD MI100 and Intel XE Ponte Vecchio server gpus.
NV already had Tesla cores in the 8000/9000 series...
I'm referring to the core architecture. Your Tesla V100 is running what architecture again...
Architecture: Volta
So 64 cores per SM would be 5376
128 cores per SM would be 10752 cores
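Both totals above correspond to the same SM count, 84, since 5376 / 64 = 10752 / 128 = 84. That count is only what the numbers in this comment imply, not a confirmed specification. A small sketch of the multiplication, with names chosen purely for illustration:

```python
# SM count implied by the figures in the comment (5376 cores at 64 per SM).
# This is an inference from the quoted totals, not a confirmed spec.
IMPLIED_SM_COUNT = 5376 // 64


def total_cores(cores_per_sm: int, sm_count: int = IMPLIED_SM_COUNT) -> int:
    """Total shader cores for a given cores-per-SM layout."""
    return sm_count * cores_per_sm


if __name__ == "__main__":
    for cores_per_sm in (64, 128):
        print(f"{cores_per_sm} cores per SM -> {total_cores(cores_per_sm)} cores")
```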
Dr. Lisa Su is a smart CEO, give credit where credit is due. Claiming she's lying about a high-end graphics card, i.e. Big Navi, is basically insulting the woman. Lmao Double precision workloads.
R7 Double Precision Work Loads
128 CUs
lists.freedesktop.org/archives/amd-gfx/2019-July/036848.html
Despite Steve being so far up NVs ass all he knows how to bench for workstation performance is CUDA based. LoL
I think they are related like brothers. Same parents different outcomes using mostly the same building blocks.
Also, it's easier to make them seem different because I see so many people posting that this is a direct replacement for Turing, like this is what will be driving the 3080Ti... Parts of it, but this is the nerdy core; Turing and its successor are more the jock core. Pretty but kinda shit at math.
LoL