Friday, February 16th 2024
NVIDIA Unveils "Eos" to Public - a Top Ten Supercomputer
Providing a peek at the architecture powering advanced AI factories, NVIDIA released a video that offers the first public look at Eos, its latest data-center-scale supercomputer. An extremely large-scale NVIDIA DGX SuperPOD, Eos is where NVIDIA developers create their AI breakthroughs using accelerated computing infrastructure and fully optimized software. Eos is built with 576 NVIDIA DGX H100 systems, NVIDIA Quantum-2 InfiniBand networking and software, providing a total of 18.4 exaflops of FP8 AI performance. Revealed in November at the Supercomputing 2023 trade show, Eos—named for the Greek goddess said to open the gates of dawn each day—reflects NVIDIA's commitment to advancing AI technology.
Eos Supercomputer Fuels Innovation
Each DGX H100 system is equipped with eight NVIDIA H100 Tensor Core GPUs, for a total of 4,608 H100 GPUs across Eos. As a result, Eos can handle the largest AI workloads, from training large language models to recommender systems, quantum simulations and more. It's a showcase of what NVIDIA's technologies can do when working at scale.

Eos is arriving at the perfect time. People are changing the world with generative AI, from drug discovery to chatbots to autonomous machines and beyond. To achieve these breakthroughs, they need more than AI expertise and development skills. They need an AI factory: a purpose-built AI engine that's always available and can help them ramp their capacity to build AI models at scale. Eos delivers.

Ranked No. 9 on the TOP500 list of the world's fastest supercomputers, Eos pushes the boundaries of AI technology and infrastructure. It combines NVIDIA's advanced accelerated computing and networking with sophisticated software offerings such as NVIDIA Base Command and NVIDIA AI Enterprise.
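As a quick sanity check on those figures, here is a back-of-the-envelope sketch; the assumption that the 18.4 exaflops number includes sparsity (as NVIDIA's published FP8 peak figures for the H100 usually do) is ours, not stated in the article:

```python
# Sanity check of the published Eos figures.
systems = 576          # DGX H100 systems in Eos
gpus_per_system = 8    # H100 GPUs per DGX H100 system
total_gpus = systems * gpus_per_system
print(total_gpus)      # 4608, matching the quoted GPU count

total_fp8_eflops = 18.4                          # quoted aggregate FP8 performance
per_gpu_pflops = total_fp8_eflops * 1000 / total_gpus
print(round(per_gpu_pflops, 2))                  # ~3.99 PFLOPS per GPU
# ~3.99 PFLOPS per GPU lines up with the H100's ~3.96 PFLOPS FP8 peak
# with sparsity, which suggests the 18.4 EF figure is a with-sparsity number.
```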
Eos's architecture is optimized for AI workloads demanding ultra-low-latency and high-throughput interconnectivity across a large cluster of accelerated computing nodes, making it an ideal solution for enterprises looking to scale their AI capabilities. Based on NVIDIA Quantum-2 InfiniBand with In-Network Computing technology, its network architecture supports data transfer speeds of up to 400 Gb/s, facilitating the rapid movement of large datasets essential for training complex AI models.
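For a rough sense of what 400 Gb/s means in practice, the sketch below computes the transfer time for a dataset over a single saturated link. The dataset size and the single-link assumption are illustrative choices of ours, not figures from the article:

```python
# Illustrative transfer-time calculation at the quoted per-link rate.
link_gbps = 400                       # NVIDIA Quantum-2 InfiniBand link rate
dataset_tb = 10                       # hypothetical training dataset, in terabytes
dataset_bits = dataset_tb * 1e12 * 8  # terabytes -> bits
seconds = dataset_bits / (link_gbps * 1e9)
print(f"{seconds:.0f} s")             # 200 s to move 10 TB over one 400 Gb/s link
```

In a real cluster, many links run in parallel across the fabric, so aggregate movement of training data is far faster than this single-link figure.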
At the heart of Eos lies the groundbreaking DGX SuperPOD architecture powered by NVIDIA's DGX H100 systems. The architecture is built to provide the AI and computing fields with tightly integrated full-stack systems capable of computing at an enormous scale. As enterprises and developers worldwide seek to harness the power of AI, Eos stands as a pivotal resource, promising to accelerate the journey towards AI-infused applications that fuel every organization.
Sources:
NVIDIA Blog, ServeTheHome
20 Comments on NVIDIA Unveils "Eos" to Public - a Top Ten Supercomputer
iirc DGX SuperPOD used AMD Rome EPYC CPUs, do they still?
interesting top 10 nonetheless
www.nvidia.com/en-us/data-center/grace-cpu/
The A100 DGX was AMD-based; rumor is AMD wouldn't give them a discount this time around, so they went with Intel, who would.
My guess is they need to try some mid-core-count Genoa parts with higher clocks, but it might be architectural and scheduler issues. And divide by 60 to get the FP64 rating.
how long does it take to build a supercomputer - from tech spec to fully commissioned and operational?
But if Nvidia decides to make Eos a sellable physical product, equal to this first Eos for the most part, then it shouldn't take more than a few months. Large companies might be interested, they would get a field-tested system with a predictable performance and a relatively short delivery time.
4,000 GPUs (MI300X) × 5.2 PFLOPS = 20.8 Exaflops FP8
4,608 MI300X = 23.96 Exaflops FP8
:cool:
The Top500 run was in May of 2022. The first design talks for an exascale supercomputer started at the beginning of the 2010s, and the primary concern at the time was whether or not an exascale computer could be built while consuming 25 MW of electricity or less. This was a constraint imposed by the US Department of Energy due to the government not wanting to spend a buttload of money on energy costs. Cost for Frontier was around 500 to 600 million USD. The cost of the actual Exascale Computing Project (updating many large software and application products to use CPU/GPUs at these large scales) is 1.8 billion USD.
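For context on that 25 MW constraint, a one-line calculation shows the machine-level efficiency it implies (our arithmetic, not from the presentation):

```python
# Sustaining 1 exaflop (1e18 FLOPS) within the DOE's 25 MW power cap
# implies a minimum energy efficiency for the whole machine.
flops = 1e18                  # one exaflop, sustained
watts = 25e6                  # 25 MW power budget
print(flops / watts / 1e9)    # 40.0 -> at least 40 GFLOPS per watt
```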
Source: Al Geist's (corporate fellow, ORNL) presentation talk at the Exascale Computing Project's 2023 Independent Project Review
Source2: I work in the project office for the ECP