Tuesday, May 7th 2019
AMD Collaborates with US DOE to Deliver the Frontier Supercomputer
The U.S. Department of Energy today announced a contract with Cray Inc. to build the Frontier supercomputer at Oak Ridge National Laboratory, which is anticipated to debut in 2021 as the world's most powerful computer with a performance of greater than 1.5 exaflops.
Scheduled for delivery in 2021, Frontier will accelerate innovation in science and technology and maintain U.S. leadership in high-performance computing and artificial intelligence. The total contract award is valued at more than $600 million for the system and technology development. The system will be based on Cray's new Shasta architecture and Slingshot interconnect and will feature high-performance AMD EPYC CPU and AMD Radeon Instinct GPU technology.

By solving calculations up to 50 times faster than today's top supercomputers, exceeding a quintillion (10^18) calculations per second, Frontier will enable researchers to deliver breakthroughs in scientific discovery, energy assurance, economic competitiveness, and national security. As a second-generation AI system, following the world-leading Summit system deployed at ORNL in 2018, Frontier will provide new capabilities for deep learning, machine learning and data analytics for applications ranging from manufacturing to human health.
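The headline figures are easy to sanity-check against the article's own numbers: 1.5 exaflops versus Summit's 200 petaflops works out to a 7.5x generational jump, and anything past 10^18 FLOPS crosses the exascale line. A quick back-of-the-envelope check (all figures taken from the article):

```python
# Back-of-the-envelope check of the performance figures quoted above.
frontier_flops = 1.5e18   # "greater than 1.5 exaflops" (article figure)
summit_flops = 200e15     # Summit, ORNL's 200-petaflop system

# One quintillion = 10^18 calculations per second = one exaflop
quintillion = 1e18
print(frontier_flops >= quintillion)   # True: Frontier is an exascale system
print(frontier_flops / summit_flops)   # 7.5: speedup over Summit
```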
"Frontier's record-breaking performance will ensure our country's ability to lead the world in science that improves the lives and economic prosperity of all Americans and the entire world," said U.S. Secretary of Energy Rick Perry. "Frontier will accelerate innovation in AI by giving American researchers world-class data and computing resources to ensure the next great inventions are made in the United States."
Since 2005, Oak Ridge National Laboratory has deployed Jaguar, Titan, and Summit, each the world's fastest computer in its time. The combination of traditional processors with graphics processing units to accelerate the performance of leadership-class scientific supercomputers is an approach pioneered by ORNL and its partners and successfully demonstrated through ORNL's No.1 ranked Titan and Summit supercomputers.
"ORNL's vision is to sustain the nation's preeminence in science and technology by developing and deploying leadership computing for research and innovation at an unprecedented scale," said ORNL Director Thomas Zacharia. "Frontier follows the well-established computing path charted by ORNL and its partners that will provide the research community with an exascale system ready for science on day one."
Researchers with DOE's Exascale Computing Project are developing exascale scientific applications today on ORNL's 200-petaflop Summit system and will seamlessly transition their scientific applications to Frontier in 2021. In addition, the lab's Center for Accelerated Application Readiness is now accepting proposals from scientists to prepare their codes to run on Frontier.

Researchers will harness Frontier's powerful architecture to advance science in such applications as systems biology, materials science, energy production, additive manufacturing and health data science. Visit the Frontier website to learn more about what researchers plan to accomplish in these and other scientific fields.
Frontier will offer best-in-class traditional scientific modeling and simulation capabilities while also leading the world in artificial intelligence and data analytics. Closely integrating artificial intelligence with data analytics and modeling and simulation will drastically reduce the time to discovery by automatically recognizing patterns in data and guiding simulations beyond the limits of traditional approaches.
"We are honored to be part of this historic moment as we embark on supporting extreme-scale scientific endeavors to deliver the next U.S. exascale supercomputer to the Department of Energy and ORNL," said Peter Ungaro, president and CEO of Cray. "Frontier will incorporate foundational new technologies from Cray and AMD that will enable the new exascale era, characterized by data-intensive workloads and the convergence of modeling, simulation, analytics, and AI for scientific discovery, engineering and digital transformation."
Frontier will incorporate several novel technologies co-designed specifically to deliver a balanced scientific capability for the user community. The system will be composed of more than 100 Cray Shasta cabinets with high-density compute blades powered by HPC- and AI-optimized AMD EPYC processors and Radeon Instinct GPU accelerators purpose-built for the needs of exascale computing. The new accelerator-centric compute blades will support a 4:1 GPU-to-CPU ratio with high-speed AMD Infinity Fabric links and coherent memory between them within the node. Each node will have one Cray Slingshot interconnect network port for every GPU, with streamlined communication between the GPUs and network to enable optimal performance for high-performance computing and AI workloads at exascale.
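The node layout described above can be summarized as a small data sketch. Only the 4:1 GPU-to-CPU ratio and the one-network-port-per-GPU figures come from the announcement; the assumption of exactly one CPU per node follows from that ratio but is otherwise an illustration, not a published spec:

```python
from dataclasses import dataclass

@dataclass
class FrontierNode:
    """One Frontier compute node as described in the announcement.

    cpus=1 is an assumption implied by the stated 4:1 GPU:CPU ratio.
    """
    cpus: int = 1   # one EPYC CPU per node (assumed)
    gpus: int = 4   # four Radeon Instinct GPUs per CPU (4:1 ratio, stated)
    nics: int = 4   # one Slingshot network port per GPU (stated)

node = FrontierNode()
assert node.gpus == 4 * node.cpus   # the stated 4:1 GPU-to-CPU ratio
assert node.nics == node.gpus       # one interconnect port per GPU
```

Giving every GPU its own network port avoids funneling all off-node traffic through the CPU, which is the "streamlined communication" the announcement refers to.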
To make this performance seamless to consume by developers, Cray and AMD are co-designing and developing enhanced GPU programming tools optimized for performance, productivity and portability. This will include new capabilities in the Cray Programming Environment and AMD's ROCm open compute platform that will be integrated together into the Cray Shasta software stack for Frontier.
"AMD is proud to be working with Cray, Oak Ridge National Laboratory and the Department of Energy to push the boundaries of high performance computing with Frontier," said Lisa Su, AMD president and CEO. "Today's announcement represents the power of collaboration between private industry and public research institutions to deliver groundbreaking innovations that scientists can use to solve some of the world's biggest problems."
Frontier leverages a decade of exascale technology investments by DOE. The contract award includes technology development funding, a center of excellence, several early-delivery systems, the main Frontier system, and multi-year systems support. The Frontier system is expected to be delivered in 2021, and acceptance is anticipated in 2022.
Frontier will be part of the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility. ORNL is managed by UT-Battelle for DOE's Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE's Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit DOE's website.
47 Comments on AMD Collaborates with US DOE to Deliver the Frontier Supercomputer
Also, we're talking about an HPC system. Cray delivers the whole package: configured and ready to run.
It's way more cost effective as well.
ROCm has been lagging 2-3 versions behind in support, but they have been catching up quite well to CUDA's feature set.
If you want to do workloads that rely heavily on tensor ops, then you go with V100s; if you need plain double, single or half precision, then AMD is a solid option.
If you just need inferencing... shiiiit, options are wide open.
Edit: Shit, they have caught up on version support... rocm.github.io/dl.html
As I said, I can't be certain. But I think their parts choices point down that road... And you're naive if you think calling me naive constitutes an argument. And the Wright brothers invented the airplane. Who cares who follows who in relation to market share? Furthermore, call me when their system is even in the TOP100 charts.
This will be #1: roughly 7x faster than the current #1 and 50% faster than Intel's system, which may or may not get finished first.
I think this is the first time I have seen an AMD CPU+GPU combo in the top 10; everything before has been AMD CPU/Nvidia GPU.
Seriously, every single one of those wind turbines has to be paired up with a quick-reacting gas turbine, ready to pick up the load. Spinning reserve. Those are expensive to buy and maintain, and they are not fuel efficient compared to slow-reacting coal or nuclear plants (which cannot be used as spinning reserve because they cannot be turned on and off so fast).
So by using the "free" wind, you just increased the price of electricity... What happened, did they finally fire you? :D
Just joking, sorry, could not help, was so easy...
But I am also curious about the future upgradability of those server farms. Can the CPUs/GPUs be easily upgraded in the future on the fly? And I mean without changing motherboards and such?
In every microsecond of the day, 24/7, 365 days/year, the amount of electricity produced has to be equal to the electricity consumed. If a power generator drops quickly, somewhere in the system, preferably close by, another one has to pick up as quickly that exact deficit of power.
Steam turbines (coal fired) have a spare capacity of a few minutes of steam, and by that time, the gas turbines have to start-up already to pick up the slack.
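The balancing argument above can be sketched numerically: the instant a generator's output drops, the fast-ramping reserve has to pick up exactly that deficit to keep supply equal to demand. A toy model, with all the megawatt figures invented purely for illustration:

```python
# Toy grid-balance model: supply must equal demand at every instant.
# All numbers are invented for illustration only.
demand_mw = 1000.0
coal_mw = 800.0        # slow baseload; cannot ramp quickly
wind_mw = 200.0        # intermittent source
gas_reserve_mw = 0.0   # fast-ramping spinning reserve, idling

def pick_up_slack(deficit_mw):
    """The spinning reserve must ramp to cover exactly the lost output."""
    return deficit_mw

# The wind suddenly drops to a quarter of its output:
wind_mw = 50.0
deficit = demand_mw - (coal_mw + wind_mw)
gas_reserve_mw = pick_up_slack(deficit)
print(gas_reserve_mw)   # 150.0 MW now carried by the gas turbines
```

The point of the comment is exactly this last line: the gas capacity has to exist, spinning and fueled, whether or not the wind is blowing.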
You can't ramp a nuclear power plant up and down that way. Or... you can, but you might end up with Chernobyl.
Wrong and so off topic, I'm leaving it there.
After watching this about Milan and the possibility that it will include 15 chiplets and maybe SMT4, I think [my imagination] I have an idea where AMD is going with its future (maybe custom design?) HPC EPYC design on 7nm+:
1) Each CPU chiplet will be 6C/24T to save space/power while giving similar or better than 8C/16T performance.
2) Adding 4 custom Instinct GPU chiplets.
3) Adding 2 custom AI accelerator [ASIC] chiplets.
4) 1 I/O chiplet with HBM memory stack.
So the final EPYC Milan(?) can be an HPC beast with:
EDIT: I see that there was already a great article on such an HPC APU design:
www.overclock.net/forum/225-...lops-200w.html
www.computermachines.org/joe/publications/pdfs/hpca2017_exascale_apu.pdf
You can see the EPYC PCB design in "Figure 2. Exascale Heterogeneous Processor (EHP) ".
So after reading some of it I changed my illustration:
No CPU chiplet on top of I/O: [took a Vega Pro 20CU image, placed the HBM on top, and shrank it to 7nm+ level]
IMO the GPUs could take ~150W, plus 75W~100W for the rest of the CPU+I/O, for around 250W TDP.
Or CPU chiplets on top of I/O: the I/O die can still be 14nm or 7nm, but it's gonna stay a large chiplet anyway to place CPU chiplets on top,
And 8 Milans could be installed in Cray’s Shasta 1U with Direct Liquid Cooling:
www.anandtech.com/show/13616...liquid-cooling
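Taking the speculative numbers from this comment thread at face value, the arithmetic does hang together. These are the commenter's guesses, not AMD specs:

```python
# Power-budget arithmetic for the speculated HPC APU.
# All figures are the commenter's guesses, not AMD specifications.
gpu_w = 150       # ~150 W for the GPU chiplets
cpu_io_w = 100    # upper end of the 75-100 W estimate for CPU + I/O
print(gpu_w + cpu_io_w)   # 250 W, matching the ~250 W TDP guess

# Thread math for the proposed 6C/24T chiplets (SMT4 = 4 threads/core):
cores, smt = 6, 4
print(cores * smt)        # 24 threads per chiplet
```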