News Posts matching #Exascale

AMD Powers El Capitan: The World's Fastest Supercomputer

Today, AMD showcased its ongoing high performance computing (HPC) leadership at Supercomputing 2024 by powering the world's fastest supercomputer for the sixth straight Top 500 list.

The El Capitan supercomputer, housed at Lawrence Livermore National Laboratory (LLNL), powered by AMD Instinct MI300A APUs and built by Hewlett Packard Enterprise (HPE), is now the fastest supercomputer in the world with a High-Performance Linpack (HPL) score of 1.742 exaflops based on the latest Top 500 list. Both El Capitan and the Frontier system at Oak Ridge National Lab claimed numbers 18 and 22, respectively, on the Green 500 list, showcasing the impressive capabilities of the AMD EPYC processors and AMD Instinct GPUs to drive leadership performance and energy efficiency for HPC workloads.

TOP500: Frontier Keeps Top Spot, Aurora Officially Becomes the Second Exascale Machine

The 63rd edition of the TOP500 reveals that Frontier has once again claimed the top spot, despite no longer being the only exascale machine on the list. Additionally, a new system has found its way into the Top 10.

The Frontier system at Oak Ridge National Laboratory in Tennessee, USA remains the most powerful system on the list with an HPL score of 1.206 EFlop/s. The system has a total of 8,699,904 combined CPU and GPU cores, an HPE Cray EX architecture that combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators, and it relies on Cray's Slingshot 11 network for data transfer. On top of that, this machine has an impressive power efficiency rating of 52.93 GFlops/Watt - putting Frontier at the No. 13 spot on the GREEN500.
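The HPL score and the efficiency rating quoted above together imply a power draw for the benchmark run. The following is a minimal sketch of that arithmetic, assuming the GFlops/Watt figure was measured on the same run as the HPL score; the function name is illustrative and not taken from the TOP500 or GREEN500 material.

```python
# Back-of-the-envelope power estimate from the two figures quoted above.
HPL_EFLOPS = 1.206               # Frontier HPL score, EFlop/s
EFFICIENCY_GFLOPS_PER_W = 52.93  # Frontier efficiency, GFlops/Watt


def implied_power_mw(hpl_eflops: float, gflops_per_watt: float) -> float:
    """Return the power draw in megawatts implied by score / efficiency."""
    flops = hpl_eflops * 1e18             # EFlop/s -> Flop/s
    flops_per_watt = gflops_per_watt * 1e9  # GFlops/W -> Flop/s per W
    watts = flops / flops_per_watt
    return watts / 1e6                    # W -> MW


frontier_mw = implied_power_mw(HPL_EFLOPS, EFFICIENCY_GFLOPS_PER_W)
print(f"Implied power during the HPL run: {frontier_mw:.1f} MW")
```

Under that assumption the result lands a little under 23 MW.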

The SEA Projects Prepare Europe for Exascale Supercomputing

The HPC research projects DEEP-SEA, IO-SEA and RED-SEA are wrapping up this month after a three-year project term. The three projects worked together to develop key technologies for European Exascale supercomputers, based on the Modular Supercomputing Architecture (MSA), a blueprint architecture for highly efficient and scalable heterogeneous Exascale HPC systems. To achieve this, the three projects collaborated on system software and programming environments, data management and storage, as well as interconnects adapted to this architecture. The results of their joint work will be presented at a co-design workshop and poster session at the EuroHPC Summit (Antwerp, 18-21 March, www.eurohpcsummit.eu).

AMD Hires Thomas Zacharia to Expand Strategic AI Relationships

AMD announced that Thomas Zacharia has joined AMD as senior vice president of strategic technology partnerships and public policy. Zacharia will lead the global expansion of AMD public/private relationships with governments, non-governmental organizations (NGOs) and other organizations to help fast-track the deployment of customized AMD-powered AI solutions to meet the rapidly growing number of global projects and applications targeting the deployment of AI for the public good.

"Thomas is a distinguished leader with decades of experience successfully creating public/private partnerships that have resulted in consistently deploying the world's most powerful and advanced computing solutions, including the world's fastest supercomputer Frontier," said AMD Chair and CEO Lisa Su. "As the former Director of the U.S.'s largest multi-program science and energy research lab, Thomas is uniquely positioned to leverage his extensive experience advancing the frontiers of science and technology to help countries around the world deploy AMD-powered AI solutions for the public good."

Lenovo HPC Infrastructure Powers Pre-Exascale Supercomputer Marenostrum 5 to Enable New Scientific Advances and Solve Global Challenges

Lenovo (HKSE: 992) (ADR: LNVGY) has today announced that the General Purpose Partition of the MareNostrum 5, a new pre-exascale supercomputer running on Lenovo's HPC infrastructure, has been classified as the top x86 general-purpose cluster on the recently published TOP500 list of the most powerful supercomputers globally.

Officially inaugurated at Barcelona Supercomputing Center on December 21st, MareNostrum 5 has been built for the European High Performance Computing Joint Undertaking (EuroHPC JU). The pre-exascale supercomputer will bolster the EU's mission to provide Europe with the most advanced supercomputing technology and accelerate the capacity for artificial intelligence (AI) research, enabling new scientific advances that will help solve global challenges. It aims to empower a wide range of complex HPC-specific applications, from climate research and engineering to material science and earth sciences, adeptly handling tasks that extend beyond the capabilities of cloud computing.

Chinese Researchers Want to Make Wafer-Scale RISC-V Processors with up to 1,600 Cores

According to the report from a journal called Fundamental Research, researchers from the Institute of Computing Technology at the Chinese Academy of Sciences have developed a 256-core multi-chiplet processor called Zhejiang Big Chip, with plans to scale up to 1,600 cores by utilizing an entire wafer. As transistor density gains slow, alternatives like multi-chiplet architectures become crucial for continued performance growth. The Zhejiang chip combines 16 chiplets, each holding 16 RISC-V cores, interconnected via network-on-chip. This design can theoretically expand to 100 chiplets and 1,600 cores on an advanced 2.5D packaging interposer. While multi-chiplet is common today, using the whole wafer for one system would match Cerebras' breakthrough approach. Built on 22 nm process technology, the researchers cite exascale supercomputing as an ideal application for massively parallel multi-chiplet architectures.

Careful software optimization is required to balance workloads across the system hierarchy. Integrating near-memory processing and 3D stacking could further optimize efficiency. The paper explores lithography and packaging limits, proposing hierarchical chiplet systems as a flexible path to future computing scale. While yield and cooling challenges need further work, the 256-core foundation demonstrates the potential of modular designs as an alternative to monolithic integration. China's focus mirrors multiple initiatives from American giants like AMD and Intel for data center CPUs. But national semiconductor ambitions add urgency to prove domestically designed solutions can rival foreign innovation. Although performance details are unclear, the rapid progress shows promise in mastering modular chip integration. Combined with improving domestic nodes like the 7 nm one from SMIC, China could easily create a viable Exascale system in-house.

TOP500 Update: Frontier Remains No.1 With Aurora Coming in at No. 2

The 62nd edition of the TOP500 reveals that the Frontier system retains its top spot and is still the only exascale machine on the list. However, five new or upgraded systems have shaken up the Top 10.

Housed at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, Frontier leads the pack with an HPL score of 1.194 EFlop/s - unchanged from the June 2023 list. Frontier utilizes AMD EPYC 64C 2GHz processors and is based on the latest HPE Cray EX235a architecture. The system has a total of 8,699,904 combined CPU and GPU cores. Additionally, Frontier has an impressive power efficiency rating of 52.59 GFlops/watt and relies on HPE's Slingshot 11 network for data transfer.

Chinese Exascale Sunway Supercomputer has Over 40 Million Cores, 5 ExaFLOPS Mixed-Precision Performance

The Exascale supercomputer arms race is making everyone invest their resources into trying to achieve the number one spot. Some countries, like China, actively participate in the race with little proof of their work, leaving the high-performance computing (HPC) community wondering about Chinese efforts on exascale systems. Today, we have some information regarding the next-generation Sunway system, which is supposed to be China's first exascale supercomputer. Replacing the Sunway TaihuLight, the next-generation Sunway will reportedly boast over 40 million cores in its system. The information comes from an upcoming presentation for the Supercomputing 2023 show in Denver, happening from November 12 to November 17.

The presentation talks about 5 ExaFLOPS in the HPL-MxP benchmark with linear scalability on the 40-million-core Sunway supercomputer. The HPL-MxP benchmark is a mixed precision HPC benchmark made to test the system's capability in regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision. Supposedly, the next-generation Sunway system can output 5 ExaFLOPS with linear scaling on its 40-million-core system. What are those cores? We are not sure. The last-generation Sunway TaihuLight used SW26010 manycore 64-bit RISC processors based on the Sunway architecture, each with 260 cores. There were 40,960 SW26010 CPUs in the system for a total of 10,649,600 cores, which means that the next-generation Sunway system is nearly four times larger from a core-count perspective. We expect some uArch and semiconductor node improvements as well.
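The core-count comparison can be checked with a few lines of arithmetic. This is a rough sketch only: the exact core count of the new system is not given in the text, so 40 million is used here as an assumed lower bound for "over 40 million".

```python
# TaihuLight figures as quoted in the text.
SW26010_CPUS = 40_960
CORES_PER_SW26010 = 260

# Assumption: "over 40 million cores" is treated as a lower bound of exactly 40 million.
NEXT_GEN_CORES_LOWER_BOUND = 40_000_000

taihulight_cores = SW26010_CPUS * CORES_PER_SW26010
ratio = NEXT_GEN_CORES_LOWER_BOUND / taihulight_cores

print(f"TaihuLight cores: {taihulight_cores:,}")
print(f"Next-gen / TaihuLight core ratio: at least {ratio:.2f}x")
```

The ratio grows with however far the real count exceeds 40 million, so the printed value is a floor rather than the final figure.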

RIKEN and Intel Collaborate on "Road to Exascale"

RIKEN and Intel Corporation (hereafter referred to as Intel) have signed a memorandum of understanding on collaboration and cooperation to accelerate joint research in next-generation computing fields such as AI (artificial intelligence), high-performance computing, and quantum computers. The signing ceremony was concluded on May 18, 2023. As part of this MOU, RIKEN will work with Intel Foundry Services (IFS) to prototype these new solutions.

Frontier Remains As Sole Exaflop Machine on TOP500 List

Increasing its HPL score from 1.102 Eflop/s in November 2022 to an impressive 1.194 Eflop/s on this list, Frontier was able to improve upon its score after a stagnation between June 2022 and November 2022. Considering exascale was only a goal to aspire to just a few years ago, a roughly 8% increase here is an enormous success. Additionally, Frontier earned a score of 9.95 Eflop/s on the HPL-MxP benchmark, which measures performance for mixed-precision calculation. This is also an increase over the 7.94 EFlop/s that the system achieved on the previous list and more than eight times the machine's HPL score. Frontier is based on the HPE Cray EX235a architecture and utilizes AMD EPYC 64C 2 GHz processors. It also has 8,699,904 cores and an incredible energy efficiency rating of 52.59 Gflops/watt. It also relies on HPE's Slingshot 11 network for data transfer.

India Homegrown HPC Processor Arrives to Power Nation's Exascale Supercomputer

With more countries creating initiatives to develop homegrown processors capable of powering powerful supercomputing facilities, India has just presented its development milestone with Aum HPC. Thanks to information from the report by The Next Platform, we learn that India has developed a processor for powering its exascale high-performance computing (HPC) system. Called Aum HPC, the CPU was developed by the National Supercomputing Mission of the Indian government, which funded the Indian Institute of Science, the Department of Science and Technology, the Ministry of Electronics and Information Technology, and C-DAC to design and manufacture the Aum HPC processors and create strong technology independence.

The Aum HPC is based on the Armv8.4 CPU ISA and is a chiplet-based processor. Each compute chiplet features 48 Arm Zeus cores based on Neoverse V1 IP, so with two chiplets, the processor has 96 cores in total. Each core gets 1 MB of level two cache and 1 MB of system cache, for 96 MB L2 cache and 96 MB system cache in total. For memory, the processor uses 16-channel 32-bit DDR5-5200 with a bandwidth of 332.8 GB/s. To expand on that, HBM memory is present, and there is 64 GB of HBM3 with four controllers capable of achieving a bandwidth of 2.87 TB/s. As far as connectivity, the Aum HPC processor has 64 PCIe Gen 5 lanes with CXL enabled. It is manufactured on a 5 nm node from TSMC. With a 3.0 GHz typical and 3.5+ GHz turbo frequency, the Aum HPC processor is rated for a TDP of 300 Watts. It is capable of producing 4.6+ TeraFLOPS per socket. Below are illustrations and tables comparing Aum HPC to Fujitsu A64FX, another Arm HPC-focused design.
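The core, cache, and DDR5 bandwidth totals above follow directly from the per-chiplet and per-channel figures. A small sketch of that arithmetic, assuming peak bandwidth is simply channels times channel width times transfer rate (variable names are illustrative):

```python
# Figures as quoted in the text.
CHIPLETS = 2
CORES_PER_CHIPLET = 48
L2_MB_PER_CORE = 1
DDR5_CHANNELS = 16
CHANNEL_WIDTH_BITS = 32
TRANSFERS_PER_SEC = 5200e6  # DDR5-5200: 5200 mega-transfers per second

total_cores = CHIPLETS * CORES_PER_CHIPLET
total_l2_mb = total_cores * L2_MB_PER_CORE

# Peak bandwidth = channels * bytes per transfer * transfers per second.
bytes_per_transfer = CHANNEL_WIDTH_BITS / 8
ddr5_gb_per_s = DDR5_CHANNELS * bytes_per_transfer * TRANSFERS_PER_SEC / 1e9

print(f"Cores: {total_cores}, L2 cache: {total_l2_mb} MB")
print(f"Peak DDR5 bandwidth: {ddr5_gb_per_s:.1f} GB/s")
```

This reproduces the 96-core, 96 MB, and 332.8 GB/s figures; it is a theoretical peak, not a measured value.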

UK Government Seeks to Invest £900 Million in Supercomputer, Native Research into Advanced AI Deemed Essential

The UK Treasury has set aside a budget of £900 million to invest in the development of a supercomputer that would be powerful enough to chew through more than one billion billion simple calculations a second. A new exascale computer would fit the bill, for utilization by newly established advanced AI research bodies. It is speculated that one key goal is to establish a "BritGPT" system. The British government has been keeping tabs on recent breakthroughs in large language models, the most notable example being OpenAI's ChatGPT. Ambitions to match such efforts were revealed in a statement, with the emphasis: "to advance UK sovereign capability in foundation models, including large language models."

The current roster of United Kingdom-based supercomputers looks to be unfit for the task of training complex AI models. In light of being outpaced by drives in other countries to ramp up supercomputer budgets, the UK Government outlined its own future investments: "Because AI needs computing horsepower, I today commit around £900 million of funding, for an exascale supercomputer," said the chancellor, Jeremy Hunt. The government has declared that quantum technologies will receive an investment of £2.5 billion over the next decade. Proponents of the technology have declared that it will supercharge machine learning.

AMD Shows Instinct MI300 Exascale APU with 146 Billion Transistors

During its CES 2023 keynote, AMD announced its latest Instinct MI300 APU, a first of its kind in the data center world. Combining the CPU, GPU, and memory elements into a single package eliminates latency imposed by long travel distances of data from CPU to memory and from CPU to GPU through the PCIe connector. In addition to solving some latency issues, less power is needed to move the data, which provides greater efficiency. The Instinct MI300 features 24 Zen4 cores with simultaneous multi-threading enabled, CDNA3 GPU IP, and 128 GB of HBM3 memory on a single package. The memory bus is 8192-bit wide, providing unified memory access for CPU and GPU cores. CXL 3.0 is also supported, making cache-coherent interconnecting a reality.

The Instinct MI300 APU package is an engineering marvel of its own, with advanced chiplet techniques used. AMD managed to do 3D stacking and has nine 5 nm logic chiplets that are 3D stacked on top of four 6 nm chiplets with HBM surrounding it. All of this makes the transistor count go up to 146 billion, representing the sheer complexity of such a design. For performance figures, AMD provided a comparison to the Instinct MI250X GPU. In raw AI performance, the MI300 features an 8x improvement over MI250X, while the performance-per-watt is "reduced" to a 5x increase. While we do not know what benchmark applications were used, there is a probability that some standard benchmarks like MLPerf were used. For availability, AMD targets the end of 2023, when the "El Capitan" exascale supercomputer will arrive using these Instinct MI300 APU accelerators. Pricing is unknown and will be unveiled to enterprise customers first around launch.

ORNL's Exaflop Machine Frontier Keeps Top Spot, New Competitor Leonardo Breaks the Top10 List

The 60th edition of the TOP500 reveals that the Frontier system is still the only true exascale machine on the list.

With an HPL score of 1.102 EFlop/s, the Frontier machine at Oak Ridge National Laboratory (ORNL) did not improve upon the score it reached on the June 2022 list. That said, Frontier's near-tripling of the HPL score received by the second-place winner is still a major victory for computer science. On top of that, Frontier demonstrated a score of 7.94 EFlop/s on the HPL-MxP benchmark, which measures performance for mixed-precision calculation. Frontier is based on the HPE Cray EX235a architecture and relies on AMD EPYC 64C 2 GHz processors. The system has 8,730,112 cores and a power efficiency rating of 52.23 gigaflops/watt. It also relies on HPE's Slingshot 11 network for data transfer.

SiPearl and AMD Join Forces to Develop European Exascale Systems

SiPearl, the company designing the high-performance, low-power microprocessor for European supercomputers, has entered into a business collaboration agreement with AMD to provide a joint offering for exascale supercomputing systems, combining SiPearl's HPC microprocessor, Rhea, with AMD Instinct accelerators.

Initially, AMD and SiPearl will jointly assess the interoperability of the AMD ROCm open software with the SiPearl Rhea microprocessor and build an optimized software solution that would strengthen the capabilities of a SiPearl microprocessor combined with an AMD Instinct accelerator. This joint work, which targets porting and optimization activities for the AMD HIP backend, OpenMP compilers and libraries, will enable scientific applications to benefit from both technologies.

AMD Instinct MI300 APU to Power El Capitan Exascale Supercomputer

The Exascale supercomputing race is now well underway, as the US-based Frontier supercomputer got delivered, and now we wait to see the remaining systems join the race. Today, during the 79th HPC User Forum at Oak Ridge National Laboratory (ORNL), Terri Quinn of Lawrence Livermore National Laboratory (LLNL) delivered a few insights into what the El Capitan exascale machine will look like. And it seems like the new powerhouse will be based on AMD's Instinct MI300 APU. LLNL targets peak performance of over two exaFLOPs and a sustained performance of more than one exaFLOP, under 40 megawatts of power. This should require a very dense and efficient computing solution, just like the MI300 APU is.

As a reminder, the AMD Instinct MI300 is an APU that combines Zen 4 x86-64 CPU cores, CDNA3 compute-oriented graphics, large cache structures, and HBM memory used as DRAM on a single package. This is achieved using a multi-chip module design with 2.5D and 3D chiplet integration using Infinity architecture. The system will essentially utilize thousands of these APUs to become one large Linux cluster. It is slated for installation in 2023, with an operating lifespan from 2024 to 2030.

EuroHPC Joint Undertaking Announces Five Sites to Host new World-Class Supercomputers

JUPITER, the first European exascale supercomputer, will be hosted by the Jülich Supercomputing Centre in Germany. Exascale supercomputers are systems capable of performing more than a billion billion calculations per second and represent a significant milestone for Europe. By supporting the development of high-precision models of complex systems, they will have a major impact on European scientific excellence.

Intel Announces "Rialto Bridge" Accelerated AI and HPC Processor

During the International Supercomputing Conference on May 31, 2022, in Hamburg, Germany, Jeff McVeigh, vice president and general manager of the Super Compute Group at Intel Corporation, announced Rialto Bridge, Intel's data center graphics processing unit (GPU). Using the same architecture as the Intel data center GPU Ponte Vecchio and combining enhanced tiles with Intel's next process node, Rialto Bridge will offer up to 160 Xe cores, more FLOPs, more I/O bandwidth and higher TDP limits for significantly increased density, performance and efficiency.

"As we embark on the exascale era and sprint towards zettascale, the technology industry's contribution to global carbon emissions is also growing. It has been estimated that by 2030, between 3% and 7% of global energy production will be consumed by data centers, with computing infrastructure being a top driver of new electricity use," said Jeff McVeigh, vice president and general manager of the Super Compute Group at Intel Corporation.

Alleged AMD Instinct MI300 Exascale APU Features Zen4 CPU and CDNA3 GPU

Today we got information that AMD's upcoming Instinct MI300 will allegedly be available as an Accelerated Processing Unit (APU). AMD APUs are processors that combine CPU and GPU into a single package. AdoredTV managed to get ahold of a slide that indicates that the AMD Instinct MI300 accelerator will also come as an APU option that combines Zen4 CPU cores and a CDNA3 GPU accelerator in a single, large package. With technologies like 3D stacking, MCM design, and HBM memory, these Instinct APUs are positioned to be a high-density compute product. At least six HBM dies are going to be placed in a package, with the APU itself being a socketed design.

The leaked slide from AdoredTV indicates that the first tapeout will be complete by the end of the month (presumably this month), with the first silicon hitting AMD's labs in Q3 of 2022. If the silicon turns out functional, we could see these APUs available sometime in the first half of 2023. Below, you can see an illustration of the AMD Instinct MI300 GPU. The APU version will potentially be of the same size with Zen4 and CDNA3 cores spread around the package. As the Instinct MI300 accelerator is supposed to use eight compute tiles, we could see different combinations of CPU/GPU tiles offered. As we await the launch of the next-generation accelerators, we are yet to see what SKUs AMD will bring.

AMD Introduces Instinct MI210 Data Center Accelerator for Exascale-class HPC and AI in a PCIe Form-Factor

AMD today announced a new addition to the Instinct MI200 family of accelerators. Officially titled Instinct MI210 accelerator, AMD tries to bring exascale-class technologies to mainstream HPC and AI customers with this model. Based on CDNA2 compute architecture built for heavy HPC and AI workloads, the card features 104 compute units (CUs), totaling 6656 Streaming Processors (SPs). With a peak engine clock of 1700 MHz, the card can output 181 TeraFLOPs of FP16 half-precision peak compute, 22.6 TeraFLOPs peak FP32 single-precision, and 22.6 TFLOPs peak FP64 double-precision compute. For single-precision matrix (FP32) compute, the card can deliver a peak of 45.3 TFLOPs. The INT4/INT8 precision settings provide 181 TOPs, while MI210 can compute the bfloat16 precision format with 181 TeraFLOPs at peak.
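The 22.6 TFLOPs vector figure can be reproduced from the unit counts and clock quoted above. The sketch below assumes 64 Streaming Processors per compute unit (which matches 6656 / 104) and 2 floating-point operations per SP per clock, that is, one fused multiply-add; the second assumption is not stated in the text and is chosen because it matches the quoted number.

```python
# Figures as quoted in the text.
COMPUTE_UNITS = 104
STREAM_PROCESSORS = 6656
PEAK_CLOCK_GHZ = 1.7

# Assumption: each SP retires one fused multiply-add (2 FLOPs) per clock.
FLOPS_PER_SP_PER_CLOCK = 2

sps_per_cu = STREAM_PROCESSORS // COMPUTE_UNITS
peak_vector_tflops = STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK * PEAK_CLOCK_GHZ / 1000

# The quoted FP32 matrix peak is twice the vector rate.
peak_fp32_matrix_tflops = 2 * peak_vector_tflops

print(f"SPs per CU: {sps_per_cu}")
print(f"Peak vector throughput: {peak_vector_tflops:.1f} TFLOPs")
print(f"Peak FP32 matrix throughput: {peak_fp32_matrix_tflops:.1f} TFLOPs")
```

These are theoretical peaks at the maximum engine clock; sustained application throughput will be lower.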

The card uses a 4096-bit memory interface connecting 64 GB of HBM2e to the compute silicon. The total memory bandwidth is 1638.4 GB/s, while memory modules run at a 1.6 GHz frequency. It is important to note that ECC is supported on the entire chip. AMD provides the Instinct MI210 accelerator as a PCIe solution, based on the PCIe 4.0 standard. The card is rated for a TDP of 300 Watts and is cooled passively. There are three Infinity Fabric links enabled, and the maximum bandwidth of the Infinity Fabric link is 100 GB/s. Pricing is unknown; however, availability is March 22nd, which is the immediate launch date.
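The quoted memory bandwidth follows from the bus width and memory clock. A minimal sketch, assuming the HBM2e transfers data twice per clock (double data rate), which is what makes the numbers line up:

```python
# Figures as quoted in the text.
BUS_WIDTH_BITS = 4096
MEMORY_CLOCK_GHZ = 1.6

# Assumption: double data rate, i.e. two transfers per memory clock.
TRANSFERS_PER_CLOCK = 2

bytes_per_transfer = BUS_WIDTH_BITS // 8
giga_transfers_per_s = MEMORY_CLOCK_GHZ * TRANSFERS_PER_CLOCK
bandwidth_gb_per_s = bytes_per_transfer * giga_transfers_per_s

print(f"Peak memory bandwidth: {bandwidth_gb_per_s:.1f} GB/s")
```

The result matches the 1638.4 GB/s figure as a theoretical peak.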

AMD aims this card directly at the NVIDIA A100 80 GB accelerator in terms of target segment, with emphasis on half-precision and INT4/INT8 heavy applications.

EuroHPC Joint Undertaking Launches Three New Research and Innovation Projects

The European High Performance Computing Joint Undertaking (EuroHPC JU) has launched 3 new research and innovation projects. The projects aim to bring the EU and its partners in the EuroHPC JU closer to developing independent microprocessor and HPC technology and advance a sovereign European HPC ecosystem. The European Processor Initiative (EPI SGA2), The European PILOT and the European Pilot for Exascale (EUPEX) are interlinked projects and an important milestone towards a more autonomous European supply chain for digital technologies and specifically HPC.

With joint investments of €140 million from the European Union (EU) and the EuroHPC JU Participating States, the three projects will carry out research and innovation activities to contribute to the overarching goal of securing European autonomy and sovereignty in HPC components and technologies, especially in anticipation of the European exascale supercomputers.

TOP500 Update Shows No Exascale Yet, Japanese Fugaku Supercomputer Still at the Top

The 58th edition of the TOP500 saw little change in the Top10. The Microsoft Azure system called Voyager-EUS2 was the only machine to shake up the top spots, claiming No. 10. Based on an AMD EPYC processor with 48 cores and 2.45 GHz working together with an NVIDIA A100 GPU and 80 GB of memory, Voyager-EUS2 also utilizes a Mellanox HDR Infiniband for data transfer.

While there were no other changes to the positions of the systems in the Top10, Perlmutter at NERSC improved its performance to 70.9 Pflop/s. Housed at the Lawrence Berkeley National Laboratory, Perlmutter's increased performance couldn't move it from its previously held No. 5 spot.

AMD Details Instinct MI200 Series Compute Accelerator Lineup

AMD today announced the new AMD Instinct MI200 series accelerators, the first exascale-class GPU accelerators. The AMD Instinct MI200 series includes the world's fastest high performance computing (HPC) and artificial intelligence (AI) accelerator, the AMD Instinct MI250X.

Built on AMD CDNA 2 architecture, AMD Instinct MI200 series accelerators deliver leading application performance for a broad set of HPC workloads. The AMD Instinct MI250X accelerator provides up to 4.9X better performance than competitive accelerators for double precision (FP64) HPC applications and surpasses 380 teraflops of peak theoretical half-precision (FP16) for AI workloads to enable disruptive approaches in further accelerating data-driven research.

SiPearl Partners With Intel to Deliver Exascale Supercomputer in Europe

SiPearl, the designer of the high computing power and low consumption microprocessor that will be the heart of European supercomputers, has entered into a partnership with Intel to provide a joint offering dedicated to the first exascale supercomputers in Europe. This partnership will offer their European customers the possibility of combining Rhea, the high computing power and low consumption microprocessor developed by SiPearl, with Intel's Ponte Vecchio accelerator, thus creating a high performance computing node that will promote the deployment of exascale supercomputing in Europe.

To enable this powerful combination, SiPearl plans to use and optimize for its Rhea microprocessor the open and unified programming interface, oneAPI, created by Intel. Using this single solution across the entire heterogeneous compute node, consisting of Rhea and Ponte Vecchio, will increase developer productivity and application performance.

AMD EPYC Processors Picked by Argonne National Laboratory to Prepare for Exascale Future

AMD announced that the U.S. Department of Energy's (DOE) Argonne National Laboratory (Argonne) has chosen AMD EPYC processors to power a new supercomputer, called Polaris, which will prepare researchers for the forthcoming exascale supercomputer at Argonne called Aurora. Polaris is built by Hewlett Packard Enterprise (HPE), will use 2nd Gen EPYC processors and then upgrade to 3rd Gen AMD EPYC processors, and will allow scientists and developers to test and optimize software codes and applications to tackle a range of AI, engineering, and scientific projects.

"AMD EPYC server processors continue to be the leading choice for modern HPC research, delivering the performance and capabilities needed to help solve the complex problems that pre-exascale and exascale computing will address," said Forrest Norrod, senior vice president and general manager, Datacenter and Embedded Solutions Business Group, AMD. "We are extremely proud to support Argonne National Laboratory and their critical research into areas including low carbon technologies, medical research, astronomy, solar power and more as we draw closer to the exascale era."