Intel & HPE Declare Aurora Supercomputer Blade Installation Complete

T0@st · Jun 22, 2023

What's New: The Aurora supercomputer at Argonne National Laboratory is now fully equipped with all 10,624 compute blades, boasting 63,744 Intel Data Center GPU Max Series and 21,248 Intel Xeon CPU Max Series processors. "Aurora is the first deployment of Intel's Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world. We're proud to be part of this historic system and excited for the groundbreaking AI, science and engineering Aurora will enable."—Jeff McVeigh, Intel corporate vice president and general manager of the Super Compute Group

What Aurora Is: A collaboration of Intel, Hewlett Packard Enterprise (HPE) and the Department of Energy (DOE), the Aurora supercomputer is designed to unlock the potential of the three pillars of high performance computing (HPC): simulations, data analytics and artificial intelligence (AI) on an extremely large scale. The system incorporates more than 1,024 storage nodes (using DAOS, Intel's distributed asynchronous object storage), providing 220 terabytes (TB) of capacity at 31TBs of total bandwidth, and leverages the HPE Slingshot high-performance fabric. Later this year, Aurora is expected to be the world's first supercomputer to achieve a theoretical peak performance of more than 2 exaflops (an exaflop is 10^18 or a billion billion operations per second) when it enters the TOP 500 list.

Aurora will harness the full power of the Intel Max Series GPU and CPU product family. Designed to meet the demands of dynamic and emerging HPC and AI workloads, early results with the Max Series GPUs demonstrate leading performance on real-world science and engineering workloads, showcasing up to 2 times the performance of AMD MI250X GPUs on OpenMC, and near linear scaling up to hundreds of nodes. The Intel Xeon Max Series CPU drives a 40% performance advantage over the competition in many real-world HPC workloads, such as earth systems modeling, energy and manufacturing.

Why It Matters: From tackling climate change to finding cures for deadly diseases, researchers face monumental challenges that demand advanced computing technologies at scale. Aurora is poised to address the needs of the HPC and AI communities, providing the necessary tools to push the boundaries of scientific exploration. "While we work toward acceptance testing, we're going to be using Aurora to train some large-scale open source generative AI models for science," said Rick Stevens, Argonne National Laboratory associate laboratory director. "Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all-solid-state mass storage system, is the perfect environment to train these models."

How It Works: At the heart of this state-of-the-art system are Aurora's sleek rectangular blades, housing processors, memory, networking and cooling technologies. Each blade consists of two Intel Xeon Max Series CPUs and six Intel Max Series GPUs. The Xeon Max Series product family is already demonstrating great early performance on Sunspot (watch the video below), the test bed and development system with the same architecture as Aurora. Developers are utilizing oneAPI and AI tools to accelerate HPC and AI workloads and enhance code portability across multiple architectures.

The installation of these blades has been a delicate operation, with each 70-pound blade requiring specialized machinery to be vertically integrated into Aurora's refrigerator-sized racks. The system's 166 racks accommodate 64 blades each and span eight rows, occupying a space equivalent to two professional basketball courts in the Argonne Leadership Computing Facility (ALCF) data center.
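The rack, blade and processor counts quoted in the article can be cross-checked with simple arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative and not taken from any Intel or ALCF tooling.

```python
# Figures quoted in the article; names are illustrative only.
RACKS = 166
BLADES_PER_RACK = 64
GPUS_PER_BLADE = 6
CPUS_PER_BLADE = 2

blades = RACKS * BLADES_PER_RACK   # total compute blades
gpus = blades * GPUS_PER_BLADE     # total Max Series GPUs
cpus = blades * CPUS_PER_BLADE     # total Xeon Max CPUs

print(f"blades: {blades:,}")
print(f"GPUs:   {gpus:,}")
print(f"CPUs:   {cpus:,}")
```

The totals come out to 10,624 blades, 63,744 GPUs and 21,248 CPUs, which matches the numbers given at the top of the article.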

Researchers from the ALCF's Aurora Early Science Program (ESP) and DOE's Exascale Computing Project will migrate their work from the Sunspot test bed to the fully installed Aurora. This transition will allow them to scale their applications on the full system. Early users will stress test the supercomputer and identify potential bugs that need to be resolved before deployment. This includes efforts to develop generative AI models for science, recently announced at the ISC'23 conference.

Daven · Jun 22, 2023

Congratulations! 2 Exaflops! It just took ten years.

https://www.energy.gov/sites/default/files/2013/09/f2/20130913-SEAB-DOE-Exascale-Initiative.pdf

AMD and HPE did it in three.

https://www.hpe.com/us/en/newsroom/press-release/2020/03/hpe-and-amd-power-complex-scientific-discovery-in-worlds-fastest-supercomputer-for-us-department-of-energys-doe-national-nuclear-security-administration-nnsa.html

PerfectWave · Jun 22, 2023

hope it will not catch fire because of the heat LUL!

AnarchoPrimitiv · Jun 22, 2023

I'm willing to bet that Intel either sold the hardware at cost or even cheaper....can you think of ANY other reason why someone would go with an all Intel Supercomputer? I'm seriously asking...

TumbleGeorge · Jun 22, 2023

AnarchoPrimitiv said:
I'm willing to bet that Intel either sold the hardware at cost or even cheaper

Because here nobody knows true numbers of BOM.

phraide · Jun 22, 2023

providing 220 terabytes (TB) of capacity

not impressive

220PB or 220Tb per storage node maybe ?

Leiesoldat · Jun 22, 2023

AnarchoPrimitiv said:
I'm willing to bet that Intel either sold the hardware at cost or even cheaper....can you think of ANY other reason why someone would go with an all Intel Supercomputer? I'm seriously asking...

This was a stipulation set by the Department of Energy that the multiple supercomputers could not all be from the same vendor. This is also just the delivery of the computer cabinets themselves and not the actual acceptance testing.

Wirko · Jun 22, 2023

A bunch of neatly arranged boxes with neatly arranged piping ... that's fine, but it doesn't look all that impressive. Now show us the cooling system, Intel! With a few humans for scale.

Patriot · Jun 23, 2023

AnarchoPrimitiv said:
I'm willing to bet that Intel either sold the hardware at cost or even cheaper....can you think of ANY other reason why someone would go with an all Intel Supercomputer? I'm seriously asking...

The last time they changed the spec Intel took a writeoff that quarter of 300M.
So yes, probably not making money on it.

https://www.reddit.com/r/AMD_Stock/comments/oq0odw

Daven said:
Congratulations! 2 Exaflops! It just took ten years.

https://www.energy.gov/sites/default/files/2013/09/f2/20130913-SEAB-DOE-Exascale-Initiative.pdf

AMD and HPE did it in three.

https://www.hpe.com/us/en/newsroom/press-release/2020/03/hpe-and-amd-power-complex-scientific-discovery-in-worlds-fastest-supercomputer-for-us-department-of-energys-doe-national-nuclear-security-administration-nnsa.html

It technically hasn't been benchmarked yet.
And El-Capitan isn't finished being deployed yet.

Solaris17 · Jun 23, 2023

phraide said:
not impressive 220PB or 220Tb per storage node maybe ?

The compute side and storage side are different. The storage side will grow and expand as research requirements need it; the compute side (and its configuration) is the big spend.

TumbleGeorge said:
Because here nobody knows true numbers of BOM.

For this? No. Probably not. There are plenty of real engineers on the forums though who deal with this kind of thing every day. You have to speak to your audience though. Higher compute, or tech in general, is easier to make a troll comment on than to actually discuss. It's hardly worth the effort since most users want higher Fortnite frame rates instead of actually learning.

phraide · Jun 24, 2023

Solaris17 said:
The compute side and storage side are different. The storage side will grow and expand as research requirements need it; the compute side (and its configuration) is the big spend.

https://www.alcf.anl.gov/aurora : storage specs "230 PB, 31 TB/s, 1024 Nodes (DAOS)"
It could not be 220TB only as the article says (or the way I read and understand the article sentence).

Wirko · Jun 25, 2023

Solaris17 said:
The compute side and storage side are different. The storage side will grow and expand as research requirements need it

That's hot, fast, write-intensive storage (according to some older presentation, it also contains some Optane). It's physically close to compute nodes, that's why it's decentralised into 1024 nodes. It's probably not destined to grow but can be complemented by colder, larger(?), less exciting and expandable storage, possibly spinning rust.

Solaris17 · Jun 25, 2023

Wirko said:
That's hot, fast, write-intensive storage (according to some older presentation, it also contains some Optane). It's physically close to compute nodes, that's why it's decentralised into 1024 nodes. It's probably not destined to grow but can be complemented by colder, larger(?), less exciting and expandable storage, possibly spinning rust.

Most of the time this is infiniband to nvme then bleeds off to a larger array of SSD cached spinning rust.

Wirko · Jun 25, 2023

phraide said:
https://www.alcf.anl.gov/aurora : storage specs "230 PB, 31 TB/s, 1024 Nodes (DAOS)"
It could not be 220TB only as the article says (or the way I read and understand the article sentence).

Well, someone at Intel didn't properly understand what they are selling. The 220 TB figure can be found at multiple web sites that didn't care to check Intel's press release, along with the "TBs" unit.

Also, total storage capacity divided by total speed amounts to two hours. If the capacity is fully used for input data and/or output data, the system spends at least two hours of precious supercomputer time transferring data to storage before processing, or from storage after processing, or both.
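The two-hour estimate can be reproduced roughly as follows. This is only a back-of-the-envelope sketch: it assumes the ALCF figures quoted above (230 PB, 31 TB/s), decimal units (1 PB = 1000 TB), and sustained peak bandwidth with no overhead. The function name is made up for illustration.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def transfer_time_seconds(capacity_tb: float, bandwidth_tb_per_s: float) -> float:
    """Time to read or write the full capacity once at the given bandwidth."""
    return capacity_tb / bandwidth_tb_per_s


# ALCF page figures: 230 PB at 31 TB/s.
alcf_seconds = transfer_time_seconds(230 * TB_PER_PB, 31)
alcf_hours = alcf_seconds / 3600
print(f"230 PB at 31 TB/s: {alcf_seconds:,.0f} s = {alcf_hours:.2f} h")

# Press-release figure taken literally: 220 TB at 31 TB/s.
literal_seconds = transfer_time_seconds(220, 31)
print(f"220 TB at 31 TB/s: {literal_seconds:.1f} s")
```

With the ALCF numbers this gives about 7,419 seconds, a little over two hours. Taking the press release's 220 TB literally gives about 7 seconds, which is another hint that the figure is off by a factor of a thousand.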

TumbleGeorge · Jun 25, 2023

phraide said:
https://www.alcf.anl.gov/aurora : storage specs "230 PB, 31 TB/s, 1024 Nodes (DAOS)"
It could not be 220TB only as the article says (or the way I read and understand the article sentence).

220TB per node.
