Tuesday, November 6th 2018
AMD Unveils World's First 7 nm GPUs - Radeon Instinct MI60, Instinct MI50
AMD today announced the AMD Radeon Instinct MI60 and MI50 accelerators, the world's first 7nm datacenter GPUs, designed to deliver the compute performance required for next-generation deep learning, HPC, cloud computing and rendering applications. Researchers, scientists and developers will use AMD Radeon Instinct accelerators to solve tough and interesting challenges, including large-scale simulations, climate change, computational biology, disease prevention and more.
Sources:
Radeon Instinct MI60, Radeon Instinct MI50
"Legacy GPU architectures limit IT managers from effectively addressing the constantly evolving demands of processing and analyzing huge datasets for modern cloud datacenter workloads," said David Wang, senior vice president of engineering, Radeon Technologies Group at AMD. "Combining world-class performance and a flexible architecture with a robust software platform and the industry's leading-edge ROCm open software ecosystem, the new AMD Radeon Instinct accelerators provide the critical components needed to solve the most difficult cloud computing challenges today and into the future."
The AMD Radeon Instinct MI60 and MI50 accelerators feature flexible mixed-precision capabilities, powered by high-performance compute units that expand the types of workloads these accelerators can address, including a range of HPC and deep learning applications. The new AMD Radeon Instinct MI60 and MI50 accelerators were designed to efficiently process workloads such as rapidly training complex neural networks, delivering higher levels of floating-point performance, greater efficiencies and new features for datacenter and departmental deployments.
The AMD Radeon Instinct MI60 and MI50 accelerators provide ultra-fast floating-point performance and hyper-fast HBM2 (second-generation High-Bandwidth Memory) with up to 1 TB/s memory bandwidth speeds. They are also the first GPUs capable of supporting next-generation PCIe 4.0 interconnect, which is up to 2X faster than other x86 CPU-to-GPU interconnect technologies, and feature AMD Infinity Fabric Link GPU interconnect technology that enables GPU-to-GPU communications that are up to 6X faster than PCIe Gen 3 interconnect speeds.
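The "2X" and "6X" interconnect claims can be sanity-checked with a little arithmetic. The sketch below is not from AMD; it assumes x16 links, the standard PCIe 3.0/4.0 signaling rates (8 and 16 GT/s per lane with 128b/130b encoding), and that the 200 GB/s Infinity Fabric figure quoted in the feature list is an aggregate bidirectional number. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope interconnect bandwidth check.
# Assumptions (not stated in the announcement): x16 links, 128b/130b
# encoding, 8 GT/s per lane for PCIe 3.0 and 16 GT/s for PCIe 4.0.

def pcie_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0/4.0 link."""
    encoding_efficiency = 128 / 130          # 128b/130b line coding
    return gt_per_s * lanes * encoding_efficiency / 8  # bits -> bytes

pcie3 = pcie_gb_per_s(8.0)    # about 15.75 GB/s per direction
pcie4 = pcie_gb_per_s(16.0)   # about 31.5 GB/s per direction

# Peer-to-peer figure quoted for two Infinity Fabric Links per GPU.
INFINITY_FABRIC_GB_PER_S = 200.0

gen4_vs_gen3 = pcie4 / pcie3
# Assumption: 200 GB/s counts both directions, so compare against 2 * pcie3.
if_vs_gen3 = INFINITY_FABRIC_GB_PER_S / (2 * pcie3)

print(f"PCIe 4.0 vs 3.0: {gen4_vs_gen3:.1f}x")
print(f"Infinity Fabric vs PCIe 3.0 x16: {if_vs_gen3:.1f}x")
```

Under those assumptions the ratios come out at 2x and a little over 6x, which is consistent with the "up to" wording in the announcement.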
AMD also announced a new version of the ROCm open software platform for accelerated computing that supports the architectural features of the new accelerators, including optimized deep learning operations (DLOPS) and the AMD Infinity Fabric Link GPU interconnect technology. Designed for scale, ROCm allows customers to deploy high-performance, energy-efficient heterogeneous computing systems in an open environment.
"Google believes that open source is good for everyone," said Rajat Monga, engineering director, TensorFlow, Google. "We've seen how helpful it can be to open source machine learning technology, and we're glad to see AMD embracing it. With the ROCm open software platform, TensorFlow users will benefit from GPU acceleration and a more robust open source machine learning ecosystem."
Key features of the AMD Radeon Instinct MI60 and MI50 accelerators include:
- Optimized Deep Learning Operations: Provides flexible mixed-precision FP16, FP32 and INT4/INT8 capabilities to meet growing demand for dynamic and ever-changing workloads, from training complex neural networks to running inference against those trained networks.
- World's Fastest Double Precision PCIe Accelerator: The AMD Radeon Instinct MI60 is the world's fastest double precision PCIe 4.0 capable accelerator, delivering up to 7.4 TFLOPS peak FP64 performance, allowing scientists and researchers to more efficiently process HPC applications across a range of industries including life sciences, energy, finance, automotive, aerospace, academics, government, defense and more. The AMD Radeon Instinct MI50 delivers up to 6.7 TFLOPS FP64 peak performance, while providing an efficient, cost-effective solution for a variety of deep learning workloads, as well as enabling high reuse in Virtual Desktop Infrastructure (VDI), Desktop-as-a-Service (DaaS) and cloud environments.
- Up to 6X Faster Data Transfer: Two Infinity Fabric Links per GPU deliver up to 200 GB/s of peer-to-peer bandwidth - up to 6X faster than PCIe 3.0 alone - and enable the connection of up to 4 GPUs in a hive ring configuration (2 hives in 8 GPU servers).
- Ultra-Fast HBM2 Memory: The AMD Radeon Instinct MI60 provides 32GB of HBM2 Error-correcting code (ECC) memory, and the Radeon Instinct MI50 provides 16GB of HBM2 ECC memory. Both GPUs provide full-chip ECC and Reliability, Accessibility and Serviceability (RAS) technologies, which are critical to deliver more accurate compute results for large-scale HPC deployments.
- Secure Virtualized Workload Support: AMD MxGPU Technology, the industry's only hardware-based GPU virtualization solution, which is based on the industry-standard SR-IOV (Single Root I/O Virtualization) technology, makes it difficult for hackers to attack at the hardware level, helping provide security for virtualized cloud deployments.
AMD today also announced a new version of its ROCm open software platform designed to speed development of high-performance, energy-efficient heterogeneous computing systems. In addition to support for the new Radeon Instinct accelerators, ROCm software version 2.0 provides updated math libraries for the new DLOPS; support for 64-bit Linux operating systems including CentOS, RHEL and Ubuntu; optimizations of existing components; and support for the latest versions of the most popular deep learning frameworks, including TensorFlow 1.11, PyTorch (Caffe2) and others.
Availability
The AMD Radeon Instinct MI60 accelerator is expected to ship to datacenter customers by the end of 2018. The AMD Radeon Instinct MI50 accelerator is expected to begin shipping to datacenter customers by the end of Q1 2019. The ROCm 2.0 open software platform is expected to be available by the end of 2018.
45 Comments on AMD Unveils World's First 7 nm GPUs - Radeon Instinct MI60, Instinct MI50
This is pretty good, and pretty similar to 16.3 produced by Quadro RTX and slightly better than 2080Ti which has 13.4 TFlops.
BTW, FP64 of the Quadro according to the TPU DB is 0.5 TFlops, so this thing will compete in 32-bit calculations but run circles around the green camp in 64-bit.
Nvidia T4 = 242Gflop FP64
So this is equal to a V100 or 32X faster in FP64 than a T4.
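For anyone who wants to check the ratios being thrown around, here is a quick sketch using only the FP64 numbers quoted in the article and in the posts above. The dictionary keys and the `speedup` helper are just illustrative names, and these are theoretical peak figures, not benchmark results.

```python
# Peak FP64 throughput in TFLOPS, as quoted in this thread.
fp64_tflops = {
    "Radeon Instinct MI60": 7.4,   # from the announcement
    "Radeon Instinct MI50": 6.7,   # from the announcement
    "Quadro RTX": 0.5,             # TPU DB figure cited above
    "Nvidia T4": 0.242,            # 242 GFLOPS, cited above
}

def speedup(card: str, baseline: str) -> float:
    """Ratio of theoretical peak FP64 throughput between two cards."""
    return fp64_tflops[card] / fp64_tflops[baseline]

for baseline in ("Quadro RTX", "Nvidia T4"):
    ratio = speedup("Radeon Instinct MI60", baseline)
    print(f"MI60 vs {baseline}: {ratio:.1f}x")
```

With these inputs the MI60 lands at roughly 15x the Quadro figure and a bit under 31x the T4 figure, so "32X" is in the right ballpark.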
My three 7950's had around 3 TFLOPS put together. :laugh:
Other than FP64, this seems to be a straightforward die shrink of Vega 10. The main difference for the GPU itself is a 20% higher peak clock - 1800MHz on MI60 instead of 1500MHz on MI25. AMD's quoted performance difference is also 20%, which matches the specs exactly. Twice the memory on a bus twice as wide is the other difference.
FP64 being 1:2 FP32 is new, Vega10 did not have that.
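The clock and FP64-rate points above can be turned into a rough peak-throughput estimate. This is only a sketch: the 4096 stream processor count (64 CUs x 64 lanes) for both MI25 and MI60, the 2 FLOPs per clock per lane (one fused multiply-add), and the 4096-bit bus at 2.0 Gbps per pin are assumptions not stated in the article, and the helper names are made up.

```python
def peak_tflops(stream_processors: int, clock_mhz: float,
                flops_per_clock: float = 2.0) -> float:
    """Theoretical peak TFLOPS: lanes * FLOPs per clock * clock rate."""
    return stream_processors * flops_per_clock * clock_mhz * 1e6 / 1e12

def hbm2_gb_per_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Theoretical memory bandwidth in GB/s for a given bus width and pin rate."""
    return bus_bits * gbps_per_pin / 8

# Assumption: same 4096 stream processors on MI25 and MI60.
STREAM_PROCESSORS = 4096

mi25_fp32 = peak_tflops(STREAM_PROCESSORS, 1500)  # MI25 peak clock from the post
mi60_fp32 = peak_tflops(STREAM_PROCESSORS, 1800)  # MI60 peak clock from the post
mi60_fp64 = mi60_fp32 / 2                         # FP64 at 1:2 of FP32

clock_uplift = mi60_fp32 / mi25_fp32

# Assumption: 4096-bit HBM2 bus at 2.0 Gbps per pin on MI60.
mi60_bandwidth = hbm2_gb_per_s(4096, 2.0)

print(f"MI60 FP32: {mi60_fp32:.2f} TFLOPS, FP64: {mi60_fp64:.2f} TFLOPS")
print(f"Uplift over MI25: {(clock_uplift - 1) * 100:.0f}%")
print(f"MI60 memory bandwidth: {mi60_bandwidth:.0f} GB/s")
```

If those assumptions hold, the estimate reproduces the 7.4 TFLOPS FP64 and 1 TB/s figures from the announcement, and the 20% uplift falls straight out of the clock difference.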
Btw, 2080Ti's 13.4 TFLOPs is at specced boost clock 1545 MHz... These usually boost more than that. Vegas so far tend to boost less than peak clock. We will have to wait and see how Vega20 behaves.
Quadro RTX 16.3 is Quadro RTX 6000 number at 1770 MHz which is probably a more realistic clock speed.
So same power draw with a large clock speed bump going by FP32 and FP16 results in about a 17% uplift in theoretical performance.
Judging how similar the Vega 64 / Frontier edition is to the MI25 you can in theory apply 17% to those GPUs and that would be the perfect 100% scaling best case scenario for a 7nm Vega consumer GPU. In that best case scenario a 7nm VEGA consumer card would likely result in performance on par with a stock 1080 Ti / 2070 maybe a 2080 in AMD centric games.
That performance is not really good enough. So I would expect AMD to skip Vega 7nm for NAVI on the consumer front.
AMD is trying really hard in the server market, but I don't think 2018 will be their year to take any crown, and neither will 2019. Maybe 2020 if they keep up with Zen2 and Navi is impressive. But that will also require thousands of hours to write the tools to make their supposed cards faster, or to make the same speed cards as fast and easy to use, which if Lisa is in the know she will already have people working on, but if not we will know the reason why they fail. Given AMD's vaporware issues, where they build hardware for software that isn't ready, or software that has great implementation of either ease of use, speed, or functionality, and you can only choose one......
I think AMD is playing their cards right for the midsize guys where a few IT guys run the show and want to save thousands to put into software development for long-life peak performance. They will survive, and their prosumer, gaming and server business will work out in the end, but they will never be as big as Nvidia or Intel. It's the same reason the Ford F-150 sells so many shitty trucks: it's the king, it's the classic standard of being equal to the neighbors. AMD is the full-featured but still slightly odd Holden, the loud and hot Corvette versus the supercars. It's second dog in the CPU and GPU business mostly due to mismanagement. I'm just glad they are here to keep us from paying the thousands more that Intel and Nvidia would charge if they could.
Because the 2070 is around 1080 Non-Ti level.
But one thing that AMD does deserve credit for is that for the past couple of years they have been doing an excellent job executing their moves, and seem to be headed in the right direction.
Conveniently, their comparisons are marketing material worthy. For example, RESNET-50 training: V100's 357 vs MI60's 334 (images per second), where MI60 has "comparable performance". I wonder what a GPU could do if it spent some die space on dedicated hardware units for something like that. Let's call these hardware units, say, Tensor Cores? Nvidia's RESNET-50 training numbers for V100 are in the same range on CUDA cores and 1000-ish on Tensor Cores :D
But back to the point, it's 337mm² 7nm chip vs 815mm² 12nm chip, both have similar TDP, there is nothing to be sad about. Yeah. For something. But apparently, that's not much die space, GV100 has 1.4 times more CUDA cores, with 33% bigger die (and only a tiny bit improved process):
Yeah, brought to you by "1060 is muh faster than 480". Actual tests show something like this: