Intel Advances Scientific Research and Performance for New Wave of Supercomputers

GFreeman · Nov 13, 2023

At SC23, Intel showcased AI-accelerated high performance computing (HPC) with leadership performance for HPC and AI workloads across Intel Data Center GPU Max Series, Intel Gaudi 2 AI accelerators and Intel Xeon processors. In partnership with Argonne National Laboratory, Intel shared progress on the Aurora generative AI (genAI) project, including an update on the 1 trillion parameter GPT-3 LLM on the Aurora supercomputer that is made possible by the unique architecture of the Max Series GPU and the system capabilities of the Aurora supercomputer. Intel and Argonne demonstrated the acceleration of science with applications from the Aurora Early Science Program (ESP) and the Exascale Computing Project. The company also showed the path to Intel Gaudi 3 AI accelerators and Falcon Shores.

"Intel has always been committed to delivering innovative technology solutions to meet the needs of the HPC and AI community. The great performance of our Xeon CPUs along with our Max GPUs and CPUs help propel research and science. That coupled with our Gaudi accelerators demonstrate our full breadth of technology to provide our customers with compelling choices to suit their diverse workloads," said Deepak Patil, Intel corporate vice president and general manager of Data Center AI Solutions.

Why It Matters
Generative AI for science, along with the latest performance and benchmark results, underscores Intel's ability to deliver tailored solutions to meet the specific needs of HPC and AI customers. Intel's software-defined approach, with oneAPI and HPC- and AI-enhanced toolkits, helps developers seamlessly port their code across architectural frameworks to accelerate scientific research. Additionally, Max Series GPUs and CPUs will be deployed in multiple supercomputers that are coming online.

About Generative AI for Science
Argonne National Laboratory shared progress on its genAI for science initiatives with the Aurora supercomputer. The Aurora genAI project is a collaboration with Argonne, Intel and partners to create state-of-the-art foundational AI models for science. The models will be trained on scientific texts, code and science datasets at scales of more than 1 trillion parameters from diverse scientific domains. Using the foundational technologies of Megatron with DeepSpeed, the genAI project will service multiple scientific disciplines, including biology, cancer research, climate science, cosmology and materials science.

The distinctive Intel Max Series GPU architecture and the Aurora supercomputer system capabilities can efficiently handle 1 trillion-parameter models with just 64 nodes, far fewer than would typically be required. Argonne National Laboratory ran four instances on 256 nodes, demonstrating the ability to run multiple instances in parallel on Aurora and paving the way to train trillion-parameter models more quickly, on trillions of tokens, across more than 10,000 nodes.
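As a rough sanity check on the 64-node figure, the sketch below compares the memory needed for model states against the aggregate GPU memory of 64 nodes. Only the parameter count, the 64-node instance size and the 256-node run come from the text. The GPUs per node, HBM per GPU and bytes per parameter are assumptions chosen for illustration, and the estimate ignores activations, communication buffers and whatever partitioning Megatron with DeepSpeed actually applies on Aurora.

```python
# Back-of-the-envelope memory budget for a 1-trillion-parameter training run.
# Values marked "assumption" are NOT stated in the press release.

PARAMETERS = 1e12          # 1 trillion parameters (from the text)
NODES_PER_INSTANCE = 64    # nodes per model instance (from the text)
TOTAL_NODES = 256          # nodes used in the Argonne run (from the text)

GPUS_PER_NODE = 6          # assumption: GPUs in one node
HBM_GB_PER_GPU = 128       # assumption: HBM capacity per GPU, in GB
BYTES_PER_PARAM = 16       # assumption: mixed-precision Adam model states
                           # (2 fp16 weights + 2 fp16 grads + 12 fp32 optimizer state)


def model_state_tb(parameters, bytes_per_param):
    """Memory for weights, gradients and optimizer state, in TB (1 TB = 1e12 bytes)."""
    return parameters * bytes_per_param / 1e12


def aggregate_hbm_tb(nodes, gpus_per_node, hbm_gb_per_gpu):
    """Total GPU memory across the given nodes, in TB."""
    return nodes * gpus_per_node * hbm_gb_per_gpu * 1e9 / 1e12


needed_tb = model_state_tb(PARAMETERS, BYTES_PER_PARAM)
available_tb = aggregate_hbm_tb(NODES_PER_INSTANCE, GPUS_PER_NODE, HBM_GB_PER_GPU)
parallel_instances = TOTAL_NODES // NODES_PER_INSTANCE

print(f"Model states:         {needed_tb:.1f} TB")
print(f"HBM on 64 nodes:      {available_tb:.1f} TB")
print(f"Instances on 256 nodes: {parallel_instances}")
```

Under these assumptions the model states take well under half of the aggregate HBM, and 256 nodes divide into the four parallel instances mentioned above. Different assumptions about precision or optimizer would shift the numbers.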

About Intel and Argonne National Laboratory
Intel and Argonne National Laboratory demonstrated the acceleration of science at scale enabled by the system capabilities and software stack on Aurora. Workload examples include:

Brain connectome reconstruction is enabled at scale with Connectomics ML, showing competitive inference throughput on more than 500 Aurora nodes.
General Atomic and Molecular Electronic Structure System (GAMESS) showed more than 2x the performance on the Intel Max GPU compared with the Nvidia A100. This enables the modeling of complicated chemical processes in drug and catalyst design to unlock the secrets of molecular science with the Aurora supercomputer.
Hardware/Hybrid Accelerated Cosmology Code (HACC) has demonstrated runs on more than 1,500 Aurora nodes, enabling the visualization and understanding of the physics and evolution of the universe.
The drug-screening AI inference application, part of the Aurora Drug Discovery Early Science Program (ESP) project, enables efficient screening of vast chemical datasets, covering more than 20 billion of the most synthesized compounds on just 256 nodes.

Intel also showed new HPC and AI performance, as well as software optimizations across hardware and applications:

Intel and Dell published results for STAC-A2, an independent benchmark suite based on real-world market risk analysis workloads, showing great performance for the financial industry. Compared to eight Nvidia H100 PCIe GPUs, four Intel Data Center GPU Max 1550s had 26% higher warm Greeks 10-100k-1260 performance and 4.3x higher space efficiency.
The Intel Data Center GPU Max Series 1550 outperforms Nvidia H100 PCIe card by an average of 36% (1.36x) on diverse HPC workloads.
Intel Data Center GPU Max Series delivers improved support for AI models, including multiple large language models (LLMs) such as GPT-J and LLAMA2.
Intel Xeon CPU Max Series, the only x86 processor with high bandwidth memory (HBM), delivered an average 19% more performance compared to the AMD Epyc Genoa processor.
Last week, MLCommons published results of the industry standard MLPerf training v3.1 benchmark for training AI models. Intel Gaudi2 demonstrated a significant 2x performance leap with the implementation of the FP8 data type on the v3.1 training GPT-3 benchmark.
Intel will usher in Intel Gaudi3 AI accelerators in 2024. The Gaudi3 AI accelerator will be based on the same high-performance architecture as Gaudi2 and is expected to deliver 4x the compute (BF16), double the networking bandwidth for greater scale-out performance, and 1.5x the on-board HBM memory to readily handle the growing demand for high-performance, high-efficiency compute of LLMs without performance degradation.
5th Gen Intel Xeon processors will deliver up to 1.4x higher performance gen-over-gen on HPC applications as demonstrated by LAMMPS-Copper.
Granite Rapids, a future Intel Xeon processor, will deliver increased core count and built-in acceleration with Intel Advanced Matrix Extensions and support for multiplexer combined ranks (MCR) DIMMs. Granite Rapids will have 2.9x better DeepMD+LAMMPS AI inference. MCR achieves speeds of 8,800 megatransfers per second based on DDR5 and greater than 1.5 terabytes per second of memory bandwidth capability in a two-socket system, which is critical for feeding the fast-growing core counts of modern CPUs and enabling efficiency and flexibility.
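The MCR bandwidth claim in the last item can be checked with simple arithmetic. The sketch below uses the 8,800 MT/s rate and the two-socket configuration from the text. The 64-bit data path per channel and the 12 channels per socket are assumptions for illustration, since the press release states neither. The result is a theoretical peak, not a measured figure.

```python
# Peak theoretical memory bandwidth for a two-socket system with MCR DIMMs.
# Values marked "assumption" are NOT stated in the press release.

TRANSFERS_PER_SECOND = 8_800e6  # 8,800 megatransfers per second (from the text)
SOCKETS = 2                     # two-socket system (from the text)
BYTES_PER_TRANSFER = 8          # assumption: 64-bit data path per memory channel
CHANNELS_PER_SOCKET = 12        # assumption: memory channels per socket


def peak_bandwidth_tb_per_s(transfers_per_second, bytes_per_transfer, channels, sockets):
    """Return peak theoretical bandwidth in TB/s (1 TB = 1e12 bytes)."""
    bytes_per_second = transfers_per_second * bytes_per_transfer * channels * sockets
    return bytes_per_second / 1e12


per_channel_gb_s = TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER / 1e9
system_tb_s = peak_bandwidth_tb_per_s(
    TRANSFERS_PER_SECOND, BYTES_PER_TRANSFER, CHANNELS_PER_SOCKET, SOCKETS
)

print(f"Per channel: {per_channel_gb_s:.1f} GB/s")
print(f"Two sockets: {system_tb_s:.2f} TB/s")
```

With these assumed values the peak comes out above the 1.5 terabytes per second quoted above, which suggests the quoted figure is consistent with a 12-channel, two-socket configuration.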

About New Progress on oneAPI
Intel announced features for its 2024 software development tools that advance open software development powered by oneAPI multiarchitecture programming. New tools help developers extend new AI and HPC capabilities on Intel CPUs and GPUs with broader coverage, including faster performance and deployments using standard Python for numeric workloads, and compiler enhancements delivering a near-complete SYCL 2020 implementation to improve productivity and code offload.

Additionally, Texas Advanced Computing Center (TACC) announced its oneAPI Center of Excellence will focus on projects that develop and optimize seismic imaging benchmark codes. Intel fosters an environment where software and hardware innovation and research advance the industry, with 32 oneAPI Centers of Excellence worldwide.

What's Next
Intel emphasized its commitment to AI and HPC and highlighted market momentum. New supercomputer deployments with Intel Max Series GPU and CPU technologies include systems like Aurora, Dawn Phase 1, SuperMUC-NG Phase 2, Clementina XXI and more. New systems featuring Intel Gaudi2 accelerators include a large AI supercomputer with Stability AI as the anchor customer.

This momentum will be foundational for Falcon Shores, Intel's next-generation GPU for AI and HPC. Falcon Shores will leverage the Intel Gaudi and Intel Xe intellectual property (IP) with a single GPU programming interface built on oneAPI. Applications built on Intel Gaudi AI accelerators, as well as Intel Max Series GPUs today will be able to migrate with ease to Falcon Shores in the future.

Daven · Nov 13, 2023

So is Aurora finally ready? I couldn’t tell for sure by reading the press release.

AnarchoPrimitiv · Nov 13, 2023

"Intel Xeon CPU Max Series, the only x86 processor with high bandwidth memory (HBM), delivered an average 19% more performance compared to the AMD Epyc Genoa processor."

I'd really like to know what Intel is specifically referring to here, because I recently looked at a review of those HBM processors by Phoronix and only in a handful of very specific use cases did they beat Genoa.

Worse yet for Intel, once Phoronix tested the Genoa-X high-cache variants, they wiped the floor with Sapphire Rapids, with HBM and without.

It doesn't get much more brutal than that....now I realize why Intel released these HBM variants a day before Genoa-X was released. When I see numbers like this, it makes me VERY curious as to why any entity would use Intel hardware for HPC or a supercomputer....the only thing I can think of is that Intel must be practically giving the chips away, because you're not going to choose Intel for efficiency or performance, and on paper AMD is cheaper, so I have to believe Intel is just making backdoor deals at rock-bottom prices.

Daven · Nov 13, 2023

AnarchoPrimitiv said:
"Intel Xeon CPU Max Series, the only x86 processor with high bandwidth memory (HBM), delivered an average 19% more performance compared to the AMD Epyc Genoa processor."

I'd really like to know what Intel is specifically referring to here, because I recently looked at a review of those HBM processors by Phoronix and only in a handful of very specific use cases did they beat Genoa.

Worse yet for Intel, once Phoronix tested the Genoa-X high-cache variants, they wiped the floor with Sapphire Rapids, with HBM and without.

It doesn't get much more brutal than that....now I realize why Intel released these HBM variants a day before Genoa-X was released. When I see numbers like this, it makes me VERY curious as to why any entity would use Intel hardware for HPC or a supercomputer....the only thing I can think of is that Intel must be practically giving the chips away, because you're not going to choose Intel for efficiency or performance, and on paper AMD is cheaper, so I have to believe Intel is just making backdoor deals at rock-bottom prices.

The market size might be bigger than what AMD can produce so Intel might be a second/backup source if you need something nowish. Of course ‘second best’ or ‘when the better choice isn’t available’ doesn’t sound great on a bumper sticker.

Intel has a new nickname, ‘half scale’

Frontier remains No. 1 in the TOP500 but Aurora with Intel’s Sapphire Rapids chips enters with a half-scale system at No. 2 | TOP500

www.top500.org

This is just sad.

Jism · Nov 14, 2023

AnarchoPrimitiv said:
"Intel Xeon CPU Max Series, the only x86 processor with high bandwidth memory (HBM), delivered an average 19% more performance compared to the AMD Epyc Genoa processor."

I'd really like to know what Intel is specifically referring to here, because I recently looked at a review of those HBM processors by Phoronix and only in a handful of very specific use cases did they beat Genoa.

Worse yet for Intel, once Phoronix tested the Genoa-X high-cache variants, they wiped the floor with Sapphire Rapids, with HBM and without.

It doesn't get much more brutal than that....now I realize why Intel released these HBM variants a day before Genoa-X was released. When I see numbers like this, it makes me VERY curious as to why any entity would use Intel hardware for HPC or a supercomputer....the only thing I can think of is that Intel must be practically giving the chips away, because you're not going to choose Intel for efficiency or performance, and on paper AMD is cheaper, so I have to believe Intel is just making backdoor deals at rock-bottom prices.

I was surprised when I asked for an AMD CPU at a local store a while back. They still live in the era of thinking Intel is better. "Our customers only want Intel" ...

Even at the higher end, Intel CPUs are terrible in terms of efficiency compared to Ryzens. The same goes for their datacenter lineup. They are clocked too slow (base 1.8GHz / Turbo 3.x).
