Tuesday, November 19th 2024

TOP500: El Capitan Achieves Top Spot, Frontier and Aurora Follow Behind

The 64th edition of the TOP500 reveals that El Capitan has achieved the top spot and is officially the third system to reach exascale computing, after Frontier and Aurora. Those two systems have since moved down to the No. 2 and No. 3 spots, respectively. Additionally, new systems have found their way into the Top 10.

The new El Capitan system at the Lawrence Livermore National Laboratory in California, U.S.A., has debuted as the most powerful system on the list with an HPL score of 1.742 EFlop/s. It has 11,039,616 combined CPU and GPU cores and is based on AMD 4th generation EPYC processors with 24 cores at 1.8 GHz and AMD Instinct MI300A accelerators. El Capitan relies on a Cray Slingshot 11 network for data transfer and achieves an energy efficiency of 58.89 GigaFLOPS/watt. This power efficiency rating helped El Capitan achieve No. 18 on the GREEN500 list as well.
The Frontier system at Oak Ridge National Laboratory in Tennessee, U.S.A., has moved down to the No. 2 spot. It has increased its HPL score from 1.206 EFlop/s on the last list to 1.353 EFlop/s on this list. Frontier has also increased its total core count, from 8,699,904 cores on the last list to 9,066,176 cores on this list. It relies on Cray's Slingshot-11 network for data transfer.

The Aurora system at the Argonne Leadership Computing Facility in Illinois, U.S.A., has claimed the No. 3 spot on this TOP500 list. The machine kept its HPL benchmark score from the last list, achieving 1.012 Exaflop/s. Aurora was built by Intel based on the HPE Cray EX - Intel Exascale Compute Blade, which uses Intel Xeon CPU Max Series processors and Intel Data Center GPU Max Series accelerators that communicate through Cray's Slingshot-11 interconnect.

The Eagle system installed on the Microsoft Azure Cloud in the U.S.A. claimed the No. 4 spot and remains the highest-ranked cloud-based system on the TOP500. It has an HPL score of 561.2 PFlop/s.

The only other new system in the Top 5 is the HPC6 system at No. 5. This machine is installed at the Eni S.p.A. center in Ferrera Erbognone, Italy, and has the same architecture as the No. 2 system, Frontier. The HPC6 system at Eni achieved an HPL benchmark of 477.90 PFlop/s and is now the fastest system in Europe.

Here is a summary of the systems in the Top 10:
  • The El Capitan system at the Lawrence Livermore National Laboratory, California, USA is the new No. 1 system on the TOP500. The HPE Cray EX255a system was measured with 1.742 Exaflop/s on the HPL benchmark. El Capitan has 11,039,616 cores and is based on AMD 4th generation EPYC processors with 24 cores at 1.8 GHz and AMD Instinct MI300A accelerators. It uses the Cray Slingshot 11 network for data transfer and achieves an energy efficiency of 58.89 Gigaflops/watt.
  • Frontier is now the No. 2 system in the TOP500. This HPE Cray EX system was the first US system with a performance exceeding one Exaflop/s. It is installed at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, where it is operated for the Department of Energy (DOE). It has currently achieved 1.353 Exaflop/s using 9,066,176 cores. The HPE Cray EX architecture combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators and a Slingshot-11 interconnect.
  • Aurora is currently the No. 3 with a preliminary HPL score of 1.012 Exaflop/s. It is installed at the Argonne Leadership Computing Facility, Illinois, USA, where it is also operated for the Department of Energy (DOE). This new Intel system is based on HPE Cray EX - Intel Exascale Compute Blades. It uses Intel Xeon CPU Max Series processors, Intel Data Center GPU Max Series accelerators, and a Slingshot-11 interconnect.
  • Eagle, the No. 4 system, is installed by Microsoft in its Azure cloud. This Microsoft NDv5 system is based on Xeon Platinum 8480C processors and NVIDIA H100 accelerators and achieved an HPL score of 561 Petaflop/s.
  • The new No. 5 system is called HPC6 and is installed at the Eni S.p.A. center in Ferrera Erbognone in Italy. It is another HPE Cray EX235a system with 3rd Gen AMD EPYC CPUs optimized for HPC and AI, AMD Instinct MI250X accelerators, and a Slingshot-11 interconnect. It achieved 477.9 Petaflop/s.
  • Fugaku, the No. 6 system, is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan. It has 7,630,848 cores which allowed it to achieve an HPL benchmark score of 442 Petaflop/s. It remains the fastest system on the HPCG benchmark with 16 Petaflop/s.
  • After a recent upgrade, the Alps system installed at the Swiss National Supercomputing Centre (CSCS) in Switzerland is now at No. 7. It is an HPE Cray EX254n system with the NVIDIA GH200 Grace Hopper Superchip (Grace 72C) and a Slingshot-11 interconnect. After its upgrade it achieved 434.9 Petaflop/s.
  • The LUMI system, another HPE Cray EX system, installed at the EuroHPC center at CSC in Finland, is at No. 8 with a performance of 380 Petaflop/s. The European High-Performance Computing Joint Undertaking (EuroHPC JU) is pooling European resources to develop top-of-the-range Exascale supercomputers for processing big data. One of the pan-European pre-Exascale supercomputers, LUMI, is located in CSC's data center in Kajaani, Finland.
  • The No. 9 system, Leonardo, is installed at another EuroHPC site, CINECA in Italy. It is an Atos BullSequana XH2000 system with Xeon Platinum 8358 32C 2.6 GHz as main processors, NVIDIA A100 SXM4 40 GB as accelerators, and Quad-rail NVIDIA HDR100 InfiniBand as interconnect. It achieved an HPL performance of 241.2 Petaflop/s.
  • Rounding out the Top 10 is the new Tuolumne system, which is also installed at the Lawrence Livermore National Laboratory, California, USA. It is a sister system to the new No. 1 system El Capitan with identical architecture. It achieved 208.1 Petaflop/s on its own.
Other TOP500 Highlights
The 64th edition of the TOP500 found AMD and Intel processors to be the preferred options for systems in the Top 10. Five systems use AMD processors (El Capitan, Frontier, HPC6, LUMI, and Tuolumne) while three systems use Intel (Aurora, Eagle, Leonardo). Alps relies on an NVIDIA processor, while Fugaku uses a proprietary Arm-based Fujitsu A64FX 48C 2.2 GHz.

Seven of the computers on the Top 10 use the Slingshot-11 interconnect (El Capitan, Frontier, Aurora, HPC6, Alps, LUMI, and Tuolumne) while two others use Infiniband (Eagle and Leonardo). Fugaku has its own proprietary Tofu interconnect.

While China and the United States were once again the countries with the most entries on the entire TOP500 list, it would appear that China is not participating to the extent that it once did. The United States added two systems to the list, bringing its total to 173. China once again reduced its number of machines on the list, from 80 to 63 systems. As with the last list, China did not introduce any new machines to the TOP500. Germany is quickly catching up to China, with 41 machines on the list.

In terms of continents, the upset on the previous list that saw Europe overtake Asia holds here as well. North America had 181 machines, Europe had 161 machines, and Asia had 143 machines on the list.

GREEN500 Results
This edition of the GREEN500 saw big changes in the Top 3 from new systems, although the No. 1 spot remains unchanged.

The No. 1 spot was once again claimed by JEDI - JUPITER Exascale Development Instrument, a system from EuroHPC/FZJ in Germany. Taking the No. 224 spot on the TOP500, JEDI repeated its energy efficiency rating from the last list of 72.73 GFlops/Watt while producing an HPL score of 4.5 PFlop/s. JEDI is a BullSequana XH3000 machine built on the NVIDIA GH200 Grace Hopper Superchip (72C, 2 GHz) with Quad-Rail NVIDIA InfiniBand NDR200, and has 19,584 total cores.

The No. 2 spot on this edition's GREEN500 was claimed by the new ROMEO-2025 system at the ROMEO HPC Center in Champagne-Ardenne, France. This system premiered with an energy efficiency rating of 70.91 GFlops/Watt and an HPL benchmark of 9.863 PFlop/s. Although this is a new system, its architecture is identical to JEDI's, but it is twice as large; hence its energy efficiency is slightly lower.

The No. 3 spot was claimed by the new Adastra 2 system at the Grand Equipement National de Calcul Intensif - Centre Informatique National de l'Enseignement Supérieur (GENCI-CINES) in France. Adastra 2's first appearance on the TOP500 list showed an energy efficiency score of 69.10 GFlops/Watt and an HPL score of 2.529 PFlop/s. This machine is an HPE Cray EX255a system with AMD 4th Gen EPYC 24-core 1.8 GHz processors and AMD Instinct MI300A accelerators; it has 16,128 total cores and a Slingshot-11 interconnect, and it runs RHEL.

The new El Capitan system and the Frontier system both deserve honorable mentions. Considering El Capitan's top-scoring HPL benchmark of 1.742 EFlop/s, it is quite impressive that the machine was also able to snag the No. 18 spot on the GREEN500 with an energy efficiency score of 58.89 GigaFLOPS/watt. Frontier - the winner on the previous TOP500 list and No. 2 on this one - produced an impressive energy efficiency score of 54.98 GigaFLOPS/watt for this GREEN500 list. Both of these systems demonstrate that it is possible to achieve immense computational power while also prioritizing energy efficiency.
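As a rough sanity check, the GREEN500 rating is simply the HPL score divided by the average power drawn during the run, so the figures above imply a power envelope of roughly 30 MW for El Capitan and roughly 25 MW for Frontier. A minimal sketch of that arithmetic, using only the numbers quoted in this article:

```python
# Rough sanity check: GREEN500 efficiency = HPL Rmax / average power draw,
# so the implied average power is Rmax / efficiency.
def implied_power_mw(rmax_gflops: float, gflops_per_watt: float) -> float:
    """Return the implied average system power in megawatts."""
    watts = rmax_gflops / gflops_per_watt
    return watts / 1e6

# Figures quoted above (1 EFlop/s = 1e9 GFlop/s).
print(f"El Capitan: ~{implied_power_mw(1.742e9, 58.89):.1f} MW")  # ~29.6 MW
print(f"Frontier:   ~{implied_power_mw(1.353e9, 54.98):.1f} MW")  # ~24.6 MW
```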

HPCG Results
The TOP500 list has incorporated the High-Performance Conjugate Gradient (HPCG) benchmark results, which provide an alternative metric for assessing supercomputer performance. This score is meant to complement the HPL measurement to give a fuller understanding of the machine; a minimal sketch of the conjugate-gradient kernel it is built around appears after the results below.
  • Supercomputer Fugaku remains the leader on the HPCG benchmark with 16 HPCG-PFlop/s. It has held the top position since June 2020.
  • The DOE system Frontier at ORNL remains in the second position with 14.05 HPCG-PFlop/s.
  • The third position was again captured by the Aurora system with 5.6 HPCG-PFlop/s.
  • There are no HPCG submissions for El Capitan yet.
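For readers less familiar with the benchmark, HPCG times a preconditioned conjugate-gradient solve on a large sparse system, which stresses memory bandwidth and the interconnect far more than HPL's dense matrix factorization. The sketch below is a minimal, unpreconditioned CG iteration for illustration only; the real benchmark adds a multigrid preconditioner and runs distributed across the whole machine.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Minimal unpreconditioned CG for a symmetric positive-definite A.
    Illustrative of the kernel HPCG times; not the benchmark code."""
    x = np.zeros_like(b)
    r = b - A @ x              # residual
    p = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p             # the matrix-vector product dominates runtime
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Small self-contained example on a random symmetric positive-definite matrix.
n = 200
M = np.random.rand(n, n)
A = M @ M.T + n * np.eye(n)
b = np.random.rand(n)
x = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```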
HPL-MxP Results (Formerly HPL-AI)
The HPL-MxP benchmark seeks to highlight the use of mixed-precision computations. Traditional HPC uses 64-bit floating point computations. Today, we see hardware with various levels of floating-point precision - 32-bit, 16-bit, and even 8-bit. The HPL-MxP benchmark demonstrates that, by using mixed precision during the computation, much higher performance is possible. Using mathematical refinement techniques, the same accuracy can be achieved with a mixed-precision approach as with straight 64-bit precision.
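The core idea can be illustrated with classical iterative refinement: factor and solve the system in low precision, then polish the answer with residuals computed in 64-bit. The sketch below uses a float32 solve with float64 refinement; it is only meant to convey the idea, not to reproduce the benchmark, which works with much lower precisions and GPU-accelerated solvers.

```python
import numpy as np

def mixed_precision_solve(A, b, refinements=3):
    """Solve Ax = b by solving in float32, then correcting the result with
    residuals computed in float64 (illustrative of the HPL-MxP idea)."""
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    x = np.linalg.solve(A32, b32).astype(np.float64)    # cheap low-precision solve
    for _ in range(refinements):
        r = b - A @ x                                    # residual in full precision
        d = np.linalg.solve(A32, r.astype(np.float32))   # correction in low precision
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```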

This year's winner of the HPL-MxP category is the Aurora system with 11.6 EFlop/s. The second spot goes to Frontier with a score of 11.4 EFlop/s, and the No. 3 spot goes to LUMI with 2.35 EFlop/s.

8 Comments on TOP500: El Capitan Achieves Top Spot, Frontier and Aurora Follow Behind

#1
Daven
The trajectory of AMD and Intel leading up to the current list is quite astounding.

June 1993 - first Top 500 list, #1 system 0.000006 PFLOP/s Thinking Machines Corporation SuperSparc
Intel 68 systems
AMD 0 systems

November 2006 - Height of K8 era, #1 system 0.28 PFLOP/s IBM PowerPC
Intel 263 systems
AMD 113 systems

June 2019 - Intel monopoly at its highest, #1 system 148 PFLOP/s IBM PowerPC
Intel 478 systems!!!
AMD 2 systems

November 2024 - GPUs everywhere, #1 system 1742 PFLOP/s AMD Zen/Instinct
Intel 309 systems
AMD 162 systems

By this time next year, Intel will have fallen below the 50% mark that it reached and exceeded in June 2004.
#2
igormp
Interesting to see how Nvidia's Grace offerings have managed to quickly go up the ladder and snatch many spots, especially on the GREEN500.
#3
ScaLibBDP
I've been tracking the topic of Exascale Supercomputers for a long time. Just checked my records and here are the results:

US Department of Energy ( DoE ) Exascale Supercomputers:

El Capitan
Lawrence Livermore NL ( LLNL )
Expected Peak Processing Power 2.000 EFLOPs
Achieved Max Processing Power 1.742 EFLOPs ( ~12.9% lower )

Frontier
Oak Ridge NL ( ORNL )
Expected Peak Processing Power 1.500 EFLOPs
Achieved Max Processing Power 1.353 EFLOPs ( ~9.8% lower )

Aurora
Argonne NL ( ANL )
Expected Peak Processing Power 1.100 EFLOPs
Achieved Max Processing Power 1.012 EFLOPs ( ~8.0% lower )

All achieved maximum processing powers are below expectations. All expected peak processing powers are from my initial records, created more than 5 years ago.
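For reference, the shortfall percentages above are just ( expected - achieved ) / expected:

```python
# Shortfall = (expected - achieved) / expected, using the figures above (EFLOPs).
systems = {
    "El Capitan": (2.000, 1.742),
    "Frontier":   (1.500, 1.353),
    "Aurora":     (1.100, 1.012),
}
for name, (expected, achieved) in systems.items():
    shortfall = (expected - achieved) / expected * 100
    print(f"{name}: ~{shortfall:.1f}% lower")   # 12.9%, 9.8%, 8.0%
```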
#4
csendesmark
ScaLibBDP said:
I've been tracking the topic of Exascale Supercomputers for a long time. Just checked my records and here are the results:

US Department of Energy ( DoE ) Exascale Supercomputers:

El Capitan
Lawrence Livermore NL ( LLNL )
Expected Peak Processing Power 2.000 EFLOPs
Achieved Max Processing Power 1.742 EFLOPs ( ~12.9% lower )

Frontier
Oak Ridge NL ( ORNL )
Expected Peak Processing Power 1.500 EFLOPs
Achieved Max Processing Power 1.353 EFLOPs ( ~9.8% lower )

Aurora
Argonne NL ( ANL )
Expected Peak Processing Power 1.100 EFLOPs
Achieved Max Processing Power 1.012 EFLOPs ( ~8.0% lower )

All achieved maximum processing powers are below expectations. All expected peak processing powers are from my initial records, created more than 5 years ago.
Looks like the overhead is getting bigger and bigger. Is this because of the scaling, or for a different reason?
#5
ScaLibBDP
csendesmark said:
Looks like the overhead is getting bigger and bigger. Is this because of the scaling, or for a different reason?
>>...Is this because of the scaling, or for a different reason?

Yes, scaling is the biggest problem. There are so many processing elements in exascale supercomputers that it becomes much harder to manage all of them and keep them running.

The mean time between failures for exascale supercomputers is just several hours! This creates a lot of problems, since checkpoints need to be created during the execution of HPC / scientific software ( which can run for days! ).

About 1.5 years ago I talked to a Frontier software engineer, and one of my questions was "What component is the most unreliable?" The answer was "...Power Supply Units...".
#6
csendesmark
ScaLibBDP said:
>>...Is this because of the scaling, or for a different reason?

Yes, scaling is the biggest problem. There are so many processing elements in exascale supercomputers that it becomes much harder to manage all of them and keep them running.

The mean time between failures for exascale supercomputers is just several hours! This creates a lot of problems, since checkpoints need to be created during the execution of HPC / scientific software ( which can run for days! ).

About 1.5 years ago I talked to a Frontier software engineer, and one of my questions was "What component is the most unreliable?" The answer was "...Power Supply Units...".
I see why that is - many, many individual components bring down the MTBF - but why should it be a big issue if one or two nodes fail?
It is doing parallel work with distribution.
#7
3valatzy
Are there new quantum computers secretly in operation? If so, why aren't they already included in the list?
#8
ScaLibBDP
3valatzy said:
Are there new quantum computers secretly in operation? If so, why aren't they already included in the list?
There are several Quantum Computers ( QCs ) from D-Wave Systems, IBM, Rigetti, IonQ, Honeywell... Let me stop here.
Take a look at a generic overview at en.wikipedia.org/wiki/Quantum_computing if interested. The "See also" section has a lot of additional weblinks for review.

>>...quantum computers secretly in operation...

Definitely yes. You know that intelligence agencies, like the NSA ( USA ), CSIS ( Canada ), SVR ( Russia ), etc., will never disclose what exactly they are doing with QCs. As a matter of fact, when Peter Shor released a paper in 1994 on how to factor a number using a QC, he was approached by a person from the NSA. I'm absolutely confident that the NSA has QCs from D-Wave Systems and IBM for R&D related to cryptography.

>>...why aren't they already included in the list?

Because all QCs do computations probabilistically and cannot execute the Linpack and High Performance Conjugate Gradients ( HPCG ) benchmarks used to get performance numbers for www.top500.org.

In other words,

- Classical computers do computations in a deterministic way; they use bits with two possible states, 0 and 1. Example of deterministic computing: 2+2=4, and it is absolutely deterministic ( 100%! ).

- Quantum computers do computations in a probabilistic way; they use qubits, which can be 0, 1, or a combination of 0 and 1 at the same time ( also known as superposition ). QCs cannot do deterministic computing, and the result of a computation is probabilistic; when qubits are in a superposition state, the result of NumberA+NumberB has some probability, for example 1+1=2 ( 75% ) and 0+0=0 ( 25% ).

Quantum computers are very, very different compared to classical computers.
csendesmark said:
I see why that is - many, many individual components bring down the MTBF - but why should it be a big issue if one or two nodes fail?
It is doing parallel work with distribution.
>>...why should it be a big issue if one or two nodes fail?...

In HPC we always partition a data set.

For example, on a Quad-CPU system

25% of data processed by CPU #1,
25% data on CPU #2,
25% on CPU #3 and
25% on CPU #4.

When CPU #3 fails ( let's assume it does ), computations need to be restarted from the checkpoint available for CPU #3, or from the beginning ( the worst case ).
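A toy sketch of that checkpoint/restart pattern ( the file name, partition contents, and checkpoint interval are made up, just to illustrate the idea ):

```python
import json
import os

CHECKPOINT = "partition3.ckpt"   # hypothetical checkpoint file for one partition

def process_partition(data, checkpoint_every=1000):
    """Process one partition of the data set, periodically saving progress so
    a restarted node can resume from the last checkpoint instead of from the
    beginning ( the worst case )."""
    start, partial = 0, 0.0
    if os.path.exists(CHECKPOINT):                      # resume after a failure
        with open(CHECKPOINT) as f:
            state = json.load(f)
        start, partial = state["index"], state["partial"]
    for i in range(start, len(data)):
        partial += data[i] ** 2                         # stand-in for the real work
        if (i + 1) % checkpoint_every == 0:
            with open(CHECKPOINT, "w") as f:
                json.dump({"index": i + 1, "partial": partial}, f)
    return partial

# Each CPU/node runs this on its own 25% slice of the partitioned data set.
print(process_partition(list(range(10_000))))
```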