
AMD Scores Another EPYC Win in Exascale Computing With DOE's "El Capitan" Two-Exaflop Supercomputer

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.24/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kingston A2000 1TB, Seagate IronWolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
AMD has been on a roll in consumer, professional, and exascale computing environments alike, and it has just snagged another hugely important contract. The US Department of Energy (DOE) has announced the winner for its next-gen exascale supercomputer, which aims to be the world's fastest. Dubbed "El Capitan", the new supercomputer will be powered by AMD's next-gen EPYC Genoa processors (Zen 4 architecture) and Radeon GPUs. This is the first such exascale contract where AMD is the sole purveyor of both CPUs and GPUs; AMD's other EPYC design win, in the Cray Shasta-based system, pairs its CPUs with NVIDIA graphics cards.

El Capitan represents a $600 million investment, to be deployed in late 2022 and operational in 2023. Undoubtedly, next-gen proposals from AMD, Intel, and NVIDIA were all presented, with AMD winning the shootout in a big way. While the DOE initially projected El Capitan to deliver some 1.5 exaflops of computing power, it has now revised its performance goal to a full 2-exaflop machine. El Capitan will thus be ten times faster than the current leader of the supercomputing world, Summit.





AMD's ability to provide an ecosystem with both CPUs and GPUs very likely played a key part in the DOE's choice, and it all but guarantees that the contractor was left very satisfied with AMD's performance projections for both Zen 4 and its future GPU architectures. AMD's EPYC Genoa will feature support for next-gen memory, implying DDR5 or later, as well as unspecified next-gen I/O connections. AMD's graphics cards aren't detailed at all - they're just referred to as being part of the Radeon Instinct lineup, featuring a "new compute architecture".

Another hugely important part of this design win has to be that AMD has redesigned its 3rd Gen Infinity Fabric (which supports a 4:1 ratio of GPUs to CPUs) to provide data coherence between CPU and GPU - effectively reducing the need for data to move back and forth between the two as it is being processed. With relevant data visible to both pieces of hardware through coherent, Infinity Fabric-backed memory, computing efficiency can be significantly improved (since moving data usually costs more power than the actual computation itself), and that too must've played a key part in the selection.
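To make that concrete, here's a minimal sketch in HIP C++ of what coherent memory buys a programmer - a hypothetical example using HIP's managed memory as a stand-in, not El Capitan's actual stack: one allocation is touched by both CPU and GPU, with no explicit copies in either direction.

#include <hip/hip_runtime.h>

// Hypothetical sketch: one allocation visible to both CPU and GPU,
// standing in for the hardware coherence 3rd Gen Infinity Fabric promises.
__global__ void increment(double* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0;
}

int main() {
    const int n = 1024;
    double* data;
    hipMallocManaged(&data, n * sizeof(double)); // no separate host/device buffers
    for (int i = 0; i < n; ++i) data[i] = 0.0;   // CPU writes directly

    increment<<<(n + 255) / 256, 256>>>(data, n); // GPU works on the same memory
    hipDeviceSynchronize();

    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += data[i];  // CPU reads back without a hipMemcpy

    hipFree(data);
    return 0;
}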



El Capitan will also feature a future version of Cray's proprietary Slingshot network fabric for increased speed and reduced latency. All of this will be tied together with AMD's ROCm open software platform for heterogeneous programming, to maximize performance of the CPUs and GPUs in OpenMP environments. ROCm has also recently gotten a pretty healthy $100 million shot in the arm, courtesy of the DOE, which has deployed a Center of Excellence at Lawrence Livermore National Laboratory (part of the DOE) to help develop ROCm. So AMD's software arm, too, is flexing its muscles - for this kind of deployment, at least - which has always been a point of contention against rival NVIDIA, a company that has typically invested much more in its software stack than AMD, and hence the reason NVIDIA has been such a big player in the enterprise and computing segments until now.
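For flavor, OpenMP target offload - the heterogeneous model referred to above - looks like this; a minimal C++ sketch assuming an offload-capable compiler, not code from the actual ROCm toolchain:

#include <cstdio>

int main() {
    const int n = 1 << 16;
    static float x[1 << 16];

    // With an offload-capable compiler, this loop is dispatched to the GPU;
    // the map clause moves x to the device and back.
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] = 2.0f * i;

    printf("x[100] = %f\n", x[100]); // expect 200.0
    return 0;
}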



As for why NVIDIA was shunned, it likely has nothing to do with its next-gen designs offering less performance than what AMD brought to the table. If anything, I'd take an educated guess that the 3rd Gen Infinity Fabric and its memory coherence were the deciding factor in choosing AMD GPUs over NVIDIA's: the green company doesn't have anything comparable to offer, since it doesn't play in the x86 CPU space and can't offer that level of platform interconnectedness. Whatever the reason, this is yet another big win for AMD, which keeps muscling Intel out of very, very lucrative positions.



View at TechPowerUp Main Site
 
Joined
Dec 29, 2010
Messages
3,807 (0.75/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
That is sick, 10x faster than Summit... crazy.
 
Joined
Mar 18, 2008
Messages
5,717 (0.94/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
What "future GPU" are we talking about?

I guess whatever DOE uses probably does not rely on CUDA or Tensorflow.
 
Joined
Dec 29, 2010
Messages
3,807 (0.75/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
What "future GPU" are we talking about?

I guess whatever DOE uses probably does not rely on CUDA or Tensorflow.

If you read closely, the ability to reduce memory swapping between GPU and CPU memory was a primary benefit. The APIs don't matter, since they will be tailored for the hardware.
 
Joined
Dec 12, 2016
Messages
1,811 (0.62/day)
What "future GPU" are we talking about?

I guess whatever DOE uses probably does not rely on CUDA or Tensorflow.

CUDA is more for companies like mine, where we have 10 people and make biomedical imaging devices. CUDA helps us speed up image reconstruction on the GPU versus the CPU. We are too small to make our own APIs. Giant supercomputer projects have custom, tailor-made software.
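To illustrate the kind of speedup being described, here is a deliberately trivial, hypothetical CUDA C++ kernel (not this poster's actual reconstruction code): the GPU wins by giving every pixel its own thread.

#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical example: apply a gain to every pixel of an image in parallel.
__global__ void scale_pixels(float* img, float gain, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] *= gain; // one thread per pixel
}

int main() {
    const int n = 1 << 20; // ~1M pixels
    float* img;
    cudaMallocManaged(&img, n * sizeof(float));
    for (int i = 0; i < n; ++i) img[i] = 1.0f;

    scale_pixels<<<(n + 255) / 256, 256>>>(img, 2.0f, n);
    cudaDeviceSynchronize();

    printf("img[0] = %f\n", img[0]); // expect 2.0
    cudaFree(img);
    return 0;
}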
 

silentbogo

Moderator
Staff member
Joined
Nov 20, 2013
Messages
5,540 (1.38/day)
Location
Kyiv, Ukraine
System Name WS#1337
Processor Ryzen 7 5700X3D
Motherboard ASUS X570-PLUS TUF Gaming
Cooling Xigmatek Scylla 240mm AIO
Memory 4x8GB Samsung DDR4 ECC UDIMM
Video Card(s) MSI RTX 3070 Gaming X Trio
Storage ADATA Legend 2TB + ADATA SX8200 Pro 1TB
Display(s) Samsung U24E590D (4K/UHD)
Case ghetto CM Cosmos RC-1000
Audio Device(s) ALC1220
Power Supply SeaSonic SSR-550FX (80+ GOLD)
Mouse Logitech G603
Keyboard Modecom Volcano Blade (Kailh choc LP)
VR HMD Google dreamview headset(aka fancy cardboard)
Software Windows 11, Ubuntu 24.04 LTS
What "future GPU" are we talking about?
If rumors are true, tomorrow AMD will have a Financial Analyst Day, and (taken with two shovels of salt) they should reveal some more info on RDNA2 and HPC.
 

FreedomEclipse

~Technological Technocrat~
Joined
Apr 20, 2007
Messages
24,034 (3.74/day)
Location
London,UK
System Name DarnGosh Edition
Processor AMD 7800X3D
Motherboard MSI X670E GAMING PLUS
Cooling Thermalright AM5 Contact Frame + Phantom Spirit 120SE
Memory G.Skill Trident Z5 NEO DDR5 6000 CL32-38-38-96
Video Card(s) Asus Dual Radeon™ RX 6700 XT OC Edition
Storage WD SN770 1TB (Boot)| 2x 2TB WD SN770 (Gaming)| 2x 2TB Crucial BX500| 2x 3TB Toshiba DT01ACA300
Display(s) LG GP850-B
Case Corsair 760T (White) {1xCorsair ML120 Pro|5xML140 Pro}
Audio Device(s) Yamaha RX-V573|Speakers: JBL Control One|Auna 300-CN|Wharfedale Diamond SW150
Power Supply Seasonic Focus GX-850 80+ GOLD
Mouse Logitech G502 X
Keyboard Duckyshine Dead LED(s) III
Software Windows 11 Home
Benchmark Scores ლ(ಠ益ಠ)ლ
But can it multi-virtual machine run crysis?
 
Joined
Mar 18, 2008
Messages
5,717 (0.94/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
CUDA is more for companies like mine, where we have 10 people and make biomedical imaging devices. CUDA helps us speed up image reconstruction on the GPU versus the CPU. We are too small to make our own APIs. Giant supercomputer projects have custom, tailor-made software.

Not true. All major research universities, as well as national labs, have a high rate of CUDA-based GPU compute deployment. Training ML/DL datasets for use in genomics absolutely relies on CUDA acceleration.

Unless one has a massive amount of resources to invest in building something from the ground up in OpenCL, CUDA is THE best option for accelerated computing.

TBH, I prefer Vulkan compute over OpenCL. Better in every single way.
 
Joined
Nov 15, 2005
Messages
1,011 (0.15/day)
Processor 2500K @ 4.5GHz 1.28V
Motherboard ASUS P8P67 Deluxe
Cooling Corsair A70
Memory 8GB (2x4GB) Corsair Vengeance 1600 9-9-9-24 1T
Video Card(s) eVGA GTX 470
Storage Crucial m4 128GB + Seagate RAID 1 (1TB x 2)
Display(s) Dell 22" 1680x1050 nothing special
Case Antec 300
Audio Device(s) Onboard
Power Supply PC Power & Cooling 750W
Software Windows 7 64bit Pro
Unless one has a massive amount of resources to invest in building something from the ground up in OpenCL, CUDA is THE best option for accelerated computing.

$600 million is fairly massive
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,995 (0.34/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASRock X870 Taichi Lite
Cooling Thermalright Phantom Spirit 120 EVO CPU
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA) / NVIDIA RTX 4090 Founder's Edition
Storage Crucial T500 2TB x 3
Display(s) LG 32GS95UE-B, ASUS ROG Swift OLED (PG27AQDP), LG C4 42" (OLED42C4PUA)
Case HYTE Hakos Baelz Y60
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight 2 (White), G303 Shroud Edition
Keyboard Wooting 60HE+ / 8BitDo Retro Mechanical Keyboard (N Edition) / NuPhy Air75 v2
VR HMD Oculus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.4317
Not true. All major research universities, as well as national labs, have a high rate of CUDA-based GPU compute deployment. Training ML/DL datasets for use in genomics absolutely relies on CUDA acceleration.

Can confirm. The major universities in Pittsburgh all have dedicated ML labs and mathematical science centers, mostly running CUDA servers from Exxact and Dell PowerEdge.
 
Joined
Dec 12, 2016
Messages
1,811 (0.62/day)
Can confirm. The major universities in Pittsburgh all have dedicated ML labs and mathematical science centers, mostly running CUDA servers from Exxact and Dell PowerEdge.

Universities don't work like corporations. They are made up of small research groups run by PIs (professors). The administration of an entire university doesn't usually decide on a university-wide API. Since each of these small research groups is responsible for securing its own funding after an initial lab startup fund from the university (which isn't much), they do not have the money to write their own APIs (unless that IS their research).
 
Joined
Aug 8, 2019
Messages
430 (0.22/day)
System Name R2V2 *In Progress
Processor Ryzen 7 2700
Motherboard Asrock X570 Taichi
Cooling W2A... water to air
Memory G.Skill Trident Z3466 B-die
Video Card(s) Radeon VII repaired and resurrected
Storage Adata and Samsung NVME
Display(s) Samsung LCD
Case Some ThermalTake
Audio Device(s) Asus Strix RAID DLX upgraded op amps
Power Supply Seasonic Prime something or other
Software Windows 10 Pro x64
I apologize for posting saucy.

Guess CUDA and a locked-in ecosystem isn't all that after all.

Also, this likely means AMD is bringing AI processing to its next-gen stuff.

Just because universities still use Apples in their media programs doesn't mean they are the best. It's just what they've used and still do. Just saying...

Also, shouldn't Vulkan's GPGPU stuff be better than OpenCL, since it basically replaced OpenGL? OpenCL is ancient by this point.
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,995 (0.34/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASRock X870 Taichi Lite
Cooling Thermalright Phantom Spirit 120 EVO CPU
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA) / NVIDIA RTX 4090 Founder's Edition
Storage Crucial T500 2TB x 3
Display(s) LG 32GS95UE-B, ASUS ROG Swift OLED (PG27AQDP), LG C4 42" (OLED42C4PUA)
Case HYTE Hakos Baelz Y60
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight 2 (White), G303 Shroud Edition
Keyboard Wooting 60HE+ / 8BitDo Retro Mechanical Keyboard (N Edition) / NuPhy Air75 v2
VR HMD Oculus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.4317
Universities don't work like corporations. They are made up of small research groups run by PIs (professors). The administration of an entire university doesn't usually decide on a university-wide API. Since each of these small research groups is responsible for securing its own funding after an initial lab startup fund from the university (which isn't much), they do not have the money to write their own APIs (unless that IS their research).

You're right about that. Corporations create these supercomputers with a major goal in mind, so they need custom APIs to reach that goal efficiently. But what @xkm1948 is getting at is that CUDA can scale from the basic enthusiast all the way to the [big] corporations that don't have the time (or need) to have a custom API developed for them.

If anything, those same corporations would employ researchers from these universities. :laugh:

I apologize for posting saucy.

Guess CUDA and a locked-in ecosystem isn't all that after all.

Also, this likely means AMD is bringing AI processing to its next-gen stuff.

Why do you keep saying CUDA is a locked-in ecosystem? You can run CUDA code on other hardware (even on x86 and ARM, if you're desperate) using HIP through ROCm, though you do need to translate it (an automated process, not a manual conversion) to avoid any NVIDIA-specific extensions. This is currently a lot more efficient than what can be done in OpenCL 2.1.
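To illustrate what that translation amounts to, here's a hypothetical before/after sketch (made-up kernel; the real hipify tools perform the renaming automatically). The result compiles with hipcc, and the kernel body is untouched:

#include <hip/hip_runtime.h>

__global__ void my_kernel(float* buf) { buf[threadIdx.x] *= 2.0f; }

int main() {
    float h_buf[256] = {1.0f};
    float* d_buf;
    hipMalloc(&d_buf, sizeof(h_buf));       // was: cudaMalloc(...)
    hipMemcpy(d_buf, h_buf, sizeof(h_buf),
              hipMemcpyHostToDevice);       // was: cudaMemcpy(..., cudaMemcpyHostToDevice)
    my_kernel<<<1, 256>>>(d_buf);           // launch syntax unchanged under hipcc
    hipMemcpy(h_buf, d_buf, sizeof(h_buf),
              hipMemcpyDeviceToHost);       // was: cudaMemcpy(..., cudaMemcpyDeviceToHost)
    hipFree(d_buf);                         // was: cudaFree(...)
    return 0;
}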

The investment in ROCm is an advantage for everyone, since all compute APIs will be able to build on it. Thank AMD for pulling this off.

Just because universities still use Apples in their media programs doesn't mean they are the best. It's just what they've used and still do. Just saying...

Also, shouldn't Vulkan's GPGPU stuff be better than OpenCL, since it basically replaced OpenGL? OpenCL is ancient by this point.

They still use Apple because of the deals (think 60%+ hardware and support discounts) Apple offers. Hardware deployment of Mac minis and Pros also depends on department use cases.

Vulkan is aimed at rendering (which is why any GPGPU code using Vulkan runs on the graphics pipeline), and that is why it succeeds OpenGL. OpenCL is meant for GPGPU use.
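To show the difference in orientation, here's a bare-bones, hypothetical OpenCL example in C++ (error handling omitted for brevity): pure compute, with no graphics pipeline in sight, which is exactly what OpenCL was built for.

#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>

// Hypothetical kernel source: double every element of a buffer.
static const char* src =
    "__kernel void dbl(__global float* x) {"
    "    x[get_global_id(0)] *= 2.0f;"
    "}";

int main() {
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, nullptr);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, nullptr);

    float data[256];
    for (int i = 0; i < 256; ++i) data[i] = (float)i;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, nullptr);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, nullptr, nullptr);
    clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "dbl", nullptr);
    clSetKernelArg(k, 0, sizeof(buf), &buf);

    size_t global = 256;
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, nullptr, nullptr);
    printf("data[3] = %f\n", data[3]); // expect 6.0
    return 0;
}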
 
Joined
Oct 27, 2009
Messages
1,180 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
Cray is a touch insane... to hit the 30 MW power budget and 1.5 exaflops and fit it all in 100 cabinets, they'd need 4 CPUs/16 GPUs per 1U of space in a cabinet...
averaging >5 kW of power per U of compute space... 16,000 CPUs and 64,000 GPUs... :D
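Sanity-checking those figures (assuming roughly 40U of usable compute space per cabinet - my assumption, not a Cray spec):

#include <cstdio>

int main() {
    const int cabinets = 100, units_per_cabinet = 40; // assumed usable U per cabinet
    const int cpus_per_u = 4, gpus_per_u = 16;
    const double total_mw = 30.0;

    int total_u = cabinets * units_per_cabinet;       // 4,000U of compute space
    printf("CPUs: %d  GPUs: %d\n", total_u * cpus_per_u, total_u * gpus_per_u);
    printf("kW per U: %.1f\n", total_mw * 1000.0 / total_u);
    return 0;
}

That lands at 16,000 CPUs, 64,000 GPUs, and about 7.5 kW per U, consistent with the >5 kW estimate.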
 
Joined
Jan 16, 2008
Messages
1,349 (0.22/day)
Location
Milwaukee, Wisconsin, USA
Processor i7-3770K
Motherboard Biostar Hi-Fi Z77
Cooling Swiftech H20 (w/Custom External Rad Enclosure)
Memory 16GB DDR3-2400Mhz
Video Card(s) Alienware GTX 1070
Storage 1TB Samsung 850 EVO
Display(s) 32" LG 1440p
Case Cooler Master 690 (w/Mods)
Audio Device(s) Creative X-Fi Titanium
Power Supply Corsair 750-TX
Mouse Logitech G5
Keyboard G. Skill Mechanical
Software Windows 10 (X64)
That's a played-out name lacking in creativity.
 
Joined
Apr 8, 2010
Messages
1,008 (0.19/day)
Processor Intel Core i5 8400
Motherboard Gigabyte Z370N-Wifi
Cooling Silverstone AR05
Memory Micron Crucial 16GB DDR4-2400
Video Card(s) Gigabyte GTX1080 G1 Gaming 8G
Storage Micron Crucial MX300 275GB
Display(s) Dell U2415
Case Silverstone RVZ02B
Power Supply Silverstone SSR-SX550
Keyboard Ducky One Red Switch
Software Windows 10 Pro 1909
When spending all that money on customized hardware, they'd better develop customized APIs to make full use of it, lol.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
What "future GPU" are we talking about?

I guess whatever DOE uses probably does not rely on CUDA or Tensorflow.
Fast Forward 1 used NVIDIA; this is the Fast Forward 2 project they are conjuring.

The Fast Forward 2 project is heterogeneous. One mistake on our part: we thought it would be composed of APUs, whereas EPYC + Radeon Instinct with heterogeneous (coherent) memory is the underlying hardware.
 
Joined
Mar 18, 2008
Messages
5,717 (0.94/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
Universities don't work like corporations. They are made up of small research groups run by PIs (professors). The administration of an entire university doesn't usually decide on a university-wide API. Since each of these small research groups is responsible for securing its own funding after an initial lab startup fund from the university (which isn't much), they do not have the money to write their own APIs (unless that IS their research).


In the ARM server news article, I mentioned that my institution got some ARM-based nodes a few years back, at the request of the Computer Science department and funded by the state budget. The CS department was looking to bring x86-based software over to the ARM server side. They made some progress, but ultimately the cost in time, as well as the piss-poor performance of the ARM cluster, caused them to can the project.

Superior technology speaks louder than anything, which is also why all of our newly added nodes are EPYC2-based.
 
Joined
May 15, 2014
Messages
235 (0.06/day)
Not true. All major research universities, as well as national labs, have a high rate of CUDA-based GPU compute deployment. Training ML/DL datasets for use in genomics absolutely relies on CUDA acceleration.

Unless one has a massive amount of resources to invest in building something from the ground up in OpenCL, CUDA is THE best option for accelerated computing.

CUDA is a toolchain that makes it easier for the semi-skilled... ;) NV just practices the Apple model of separable markets. Perennially cash-strapped departments get reduced-price hardware with decent devrel support for academic/student research that can be generated quickly with reproducible results. NV gets exposure via publications, students require CUDA once back ITRW, and NV gets to sell Quadros/Teslas to business/government. Path of least resistance, virtuous circle, only game in town? Take your pick.

The investment in ROCm is an advantage for everyone, since all compute APIs will be able to build on it. Thank AMD for pulling this off.
Let's see what resources are put into ROCm now that AMD has some income to fund development. NV has many years' (a decade's) lead with its better fleshed-out ecosystem. With NN/AI, DNN/DL ops will feature heavily in upcoming IHV releases.
 
Joined
Jul 18, 2017
Messages
575 (0.21/day)
They picked the cheapest but good-enuff option, not the absolute highest-performing one.
 
Joined
Oct 27, 2009
Messages
1,180 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
They picked the cheapest but good-enuff option, not the absolute highest-performing one.
Hardly. Cray doesn't do cheap, and doesn't use reference designs; 100 cabinets is insane density, and an Intel solution wouldn't have fit the power budget for the performance requirement.
And NVIDIA doesn't like custom... they want you to use DGX reference boards. This is far denser than anything that could be achieved any other way.
While ROCm is still not at CUDA parity, it's getting there... And their HIP C++ compiler is pretty baller: you can take CUDA code, translate it with AMD's tool, and it can run faster than native CUDA on an NVIDIA card...

Also... while ROCm's support lagging a version behind on things like TensorFlow may be annoying and unacceptable elsewhere... it's still faster than any government code gets pushed out...
 