
Intel & HPE Declare Aurora Supercomputer Blade Installation Complete

T0@st

News Editor
Joined
Mar 7, 2023
Messages
2,077 (3.32/day)
Location
South East, UK
What's New: The Aurora supercomputer at Argonne National Laboratory is now fully equipped with all 10,624 compute blades, boasting 63,744 Intel Data Center GPU Max Series and 21,248 Intel Xeon CPU Max Series processors. "Aurora is the first deployment of Intel's Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world. We're proud to be part of this historic system and excited for the groundbreaking AI, science and engineering Aurora will enable."—Jeff McVeigh, Intel corporate vice president and general manager of the Super Compute Group

What Aurora Is: A collaboration of Intel, Hewlett Packard Enterprise (HPE) and the Department of Energy (DOE), the Aurora supercomputer is designed to unlock the potential of the three pillars of high performance computing (HPC): simulations, data analytics and artificial intelligence (AI) on an extremely large scale. The system incorporates more than 1,024 storage nodes (using DAOS, Intel's distributed asynchronous object storage), providing 220 terabytes (TB) of capacity at 31TBs of total bandwidth, and leverages the HPE Slingshot high-performance fabric. Later this year, Aurora is expected to be the world's first supercomputer to achieve a theoretical peak performance of more than 2 exaflops (an exaflop is 10^18 or a billion billion operations per second) when it enters the TOP 500 list.




Aurora will harness the full power of the Intel Max Series GPU and CPU product family. Designed to meet the demands of dynamic and emerging HPC and AI workloads, early results with the Max Series GPUs demonstrate leading performance on real-world science and engineering workloads, showcasing up to 2 times the performance of AMD MI250X GPUs on OpenMC, and near linear scaling up to hundreds of nodes. The Intel Xeon Max Series CPU drives a 40% performance advantage over the competition in many real-world HPC workloads, such as earth systems modeling, energy and manufacturing.

Why It Matters: From tackling climate change to finding cures for deadly diseases, researchers face monumental challenges that demand advanced computing technologies at scale. Aurora is poised to address the needs of the HPC and AI communities, providing the necessary tools to push the boundaries of scientific exploration. "While we work toward acceptance testing, we're going to be using Aurora to train some large-scale open source generative AI models for science," said Rick Stevens, Argonne National Laboratory associate laboratory director. "Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all-solid-state mass storage system, is the perfect environment to train these models."

How It Works: At the heart of this state-of-the-art system are Aurora's sleek rectangular blades, housing processors, memory, networking and cooling technologies. Each blade consists of two Intel Xeon Max Series CPUs and six Intel Max Series GPUs. The Xeon Max Series product family is already demonstrating great early performance on Sunspot (watch the video below), the test bed and development system with the same architecture as Aurora. Developers are utilizing oneAPI and AI tools to accelerate HPC and AI workloads and enhance code portability across multiple architectures.


The installation of these blades has been a delicate operation, with each 70-pound blade requiring specialized machinery to be vertically integrated into Aurora's refrigerator-sized racks. The system's 166 racks accommodate 64 blades each and span eight rows, occupying a space equivalent to two professional basketball courts in the Argonne Leadership Computing Facility (ALCF) data center.
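The rack, blade and processor counts quoted in the article are mutually consistent, which a quick check confirms. A minimal sketch in Python, using only figures stated in the text above (the variable names are illustrative, not from the source):

```python
# Figures taken from the article text; variable names are illustrative.
RACKS = 166
BLADES_PER_RACK = 64
CPUS_PER_BLADE = 2   # Intel Xeon Max Series CPUs per blade
GPUS_PER_BLADE = 6   # Intel Max Series GPUs per blade

blades = RACKS * BLADES_PER_RACK
cpus = blades * CPUS_PER_BLADE
gpus = blades * GPUS_PER_BLADE

print(f"blades: {blades:,}")
print(f"CPUs:   {cpus:,}")
print(f"GPUs:   {gpus:,}")
```

The totals match the 10,624 blades, 21,248 CPUs and 63,744 GPUs given at the top of the article.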

Researchers from the ALCF's Aurora Early Science Program (ESP) and DOE's Exascale Computing Project will migrate their work from the Sunspot test bed to the fully installed Aurora. This transition will allow them to scale their applications on the full system. Early users will stress test the supercomputer and identify potential bugs that need to be resolved before deployment. This includes efforts to develop generative AI models for science, recently announced at the ISC'23 conference.

 
Joined
Jul 18, 2016
Messages
518 (0.17/day)
System Name Gaming PC / I7 XEON
Processor I7 4790K @stock / XEON W3680 @ stock
Motherboard Asus Z97 MAXIMUS VII FORMULA / GIGABYTE X58 UD7
Cooling X61 Kraken / X61 Kraken
Memory 32gb Vengeance 2133 Mhz / 24b Corsair XMS3 1600 Mhz
Video Card(s) Gainward GLH 1080 / MSI Gaming X Radeon RX480 8 GB
Storage Samsung EVO 850 500gb ,3 tb seagate, 2 samsung 1tb in raid 0 / Kingdian 240 gb, megaraid SAS 9341-8
Display(s) 2 BENQ 27" GL2706PQ / Dell UP2716D LCD Monitor 27 "
Case Corsair Graphite Series 780T / Corsair Obsidian 750 D
Audio Device(s) ON BOARD / ON BOARD
Power Supply Sapphire Pure 950w / Corsair RMI 750w
Mouse Steelseries Sesnsei / Steelseries Sensei raw
Keyboard Razer BlackWidow Chroma / Razer BlackWidow Chroma
Software Windows 1064bit PRO / Windows 1064bit PRO
hope it will not catch fire because of the heat LUL!
 
Joined
Nov 6, 2016
Messages
1,751 (0.60/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
I'm willing to bet that Intel either sold the hardware at cost or even cheaper....can you think of ANY other reason why someone would go with an all Intel Supercomputer? I'm seriously asking...
 

Leiesoldat

lazy gamer & woodworker
Supporter
Joined
Jun 29, 2021
Messages
122 (0.10/day)
System Name Arda
Processor AMD Ryzen 5800X3D
Motherboard Gigabyte X570-I AORUS Pro WiFi
Cooling Custom Loop - Aquacomputer, Optimus, EK, Bykski
Memory GSkill Trident Z RGB 32 GB (2x16) DDR4-3200
Video Card(s) Gigabyte Gaming OC RX 6800XT
Storage SK Hynix P41 1TB
Display(s) VIOTEK 3440 x 1440 144 Hz Curved
Case XTIA Proto-XL
Audio Device(s) Schiit Modius + Schiit Jotunheim
Power Supply Seasonic Prime 850W Titanium
Mouse Xtrfy MZ1 Zy's Rail Wireless
Keyboard Rainkeebs Yasui - Custom 40% Ortholinear
Software Windows 11 Pro
I'm willing to bet that Intel either sold the hardware at cost or even cheaper....can you think of ANY other reason why someone would go with an all Intel Supercomputer? I'm seriously asking...

This was a stipulation set by the Department of Energy: the multiple supercomputers could not all be from the same vendor. This is also just the delivery of the computer cabinets themselves and not the actual acceptance testing.
 
Joined
Jan 3, 2021
Messages
3,484 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
A bunch of neatly arranged boxes with neatly arranged piping ... that's fine, but it doesn't look all that impressive. Now show us the cooling system, Intel! With a few humans for scale.
 
Joined
Oct 27, 2009
Messages
1,182 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
I'm willing to bet that Intel either sold the hardware at cost or even cheaper....can you think of ANY other reason why someone would go with an all Intel Supercomputer? I'm seriously asking...
The last time they changed the spec, Intel took a 300M write-off that quarter.
So yes, probably not making money on it.

https://www.reddit.com/r/AMD_Stock/comments/oq0odw Congratulations! 2 Exaflops! It just took ten years.

It technically hasn't been benchmarked yet.
And El Capitan isn't finished being deployed yet.
 

Solaris17

Super Dainty Moderator
Staff member
Joined
Aug 16, 2005
Messages
26,918 (3.83/day)
Location
Alabama
System Name RogueOne
Processor Xeon W9-3495x
Motherboard ASUS w790E Sage SE
Cooling SilverStone XE360-4677
Memory 128gb Gskill Zeta R5 DDR5 RDIMMs
Video Card(s) MSI SUPRIM Liquid X 4090
Storage 1x 2TB WD SN850X | 2x 8TB GAMMIX S70
Display(s) 49" Philips Evnia OLED (49M2C8900)
Case Thermaltake Core P3 Pro Snow
Audio Device(s) Moondrop S8's on schitt Gunnr
Power Supply Seasonic Prime TX-1600
Mouse Lamzu Atlantis mini (White)
Keyboard Monsgeek M3 Lavender, Moondrop Luna lights
VR HMD Quest 3
Software Windows 11 Pro Workstation
Benchmark Scores I dont have time for that.
not impressive :D 220PB or 220Tb per storage node maybe ?

The compute side and storage side are different. The storage side will grow and expand as research requirements need it; the compute side (and its configuration) is the big spend.


Because here nobody knows true numbers of BOM.
For this? No. Probably not. There are plenty of real engineers on the forums, though, who deal with this kind of thing every day. You have to speak to your audience though. Higher compute or tech in general is easier to make a troll comment on than actually discuss. It's hardly worth the effort since most users want higher Fortnite frame rates instead of actually learning.
 

phraide

New Member
Joined
Jun 22, 2023
Messages
2 (0.00/day)
The compute side and storage side are different. The storage side will grow and expand as research requirements need it; the compute side (and its configuration) is the big spend.
https://www.alcf.anl.gov/aurora : storage specs "230 PB, 31 TB/s, 1024 Nodes (DAOS)"
It could not be only 220TB as the article says (or at least as I read and understand the article's sentence).
 
Joined
Jan 3, 2021
Messages
3,484 (2.46/day)
Location
Slovenia
The compute side and storage side are different. The storage side will grow and expand as research requirements need it
That's hot, fast, write-intensive storage (according to some older presentation, it also contains some Optane). It's physically close to compute nodes, that's why it's decentralised into 1024 nodes. It's probably not destined to grow but can be complemented by colder, larger(?), less exciting and expandable storage, possibly spinning rust.
 

Solaris17

Super Dainty Moderator
Staff member
Joined
Aug 16, 2005
Messages
26,918 (3.83/day)
Location
Alabama
That's hot, fast, write-intensive storage (according to some older presentation, it also contains some Optane). It's physically close to compute nodes, that's why it's decentralised into 1024 nodes. It's probably not destined to grow but can be complemented by colder, larger(?), less exciting and expandable storage, possibly spinning rust.

Most of the time this is InfiniBand to NVMe, which then bleeds off to a larger array of SSD-cached spinning rust.
 
Joined
Jan 3, 2021
Messages
3,484 (2.46/day)
Location
Slovenia
https://www.alcf.anl.gov/aurora : storage specs "230 PB, 31 TB/s, 1024 Nodes (DAOS)"
It could not be only 220TB as the article says (or at least as I read and understand the article's sentence).
Well, someone at Intel didn't properly understand what they are selling. The 220 TB figure can be found at multiple web sites that didn't care to check Intel's press release, along with the "TBs" unit.

Also, total storage capacity divided by total speed amounts to about two hours. If the capacity is fully used for input data and/or output data, the system spends at least two hours of precious supercomputer time transferring data to storage before processing, or from storage after processing, or both.
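The two-hour estimate can be reproduced with a short calculation. This is a rough sketch in Python, assuming decimal units (1 PB = 1,000 TB) and sustained peak bandwidth for the entire transfer, which real workloads are unlikely to reach. The function name is hypothetical; the 230 PB and 31 TB/s figures are the ALCF specs quoted in this thread, and 220 PB is the article's number read as petabytes.

```python
def full_transfer_hours(capacity_pb: float, bandwidth_tb_per_s: float) -> float:
    """Hours needed to read or write the whole store once at peak bandwidth."""
    capacity_tb = capacity_pb * 1000  # assumption: decimal units, 1 PB = 1,000 TB
    seconds = capacity_tb / bandwidth_tb_per_s
    return seconds / 3600


# Compare the article's figure (read as PB) with the ALCF spec.
for capacity_pb in (220, 230):
    hours = full_transfer_hours(capacity_pb, 31)
    print(f"{capacity_pb} PB at 31 TB/s: about {hours:.2f} hours")
```

Either capacity figure comes out at roughly two hours, so the estimate does not depend on which of the two numbers is correct.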
 