Researchers Use SiFive's RISC-V SoC to Build a Supercomputer

AleksandarK · Jun 13, 2022

Researchers from Università di Bologna and CINECA, the largest supercomputing center in Italy, have been playing with the concept of developing a RISC-V supercomputer. The team has laid the grounds for the first-ever implementation that demonstrates the capability of the relatively novel ISA to run high-performance computing. To create a supercomputer, you need pieces of hardware that seem like Lego building blocks. Those are called clusters, made from a motherboard, processor, memory, and storage. Italian researchers decided to try and use something different than Intel/AMD solution to the problem and use a processor based on RISC-V ISA. Using SiFive's Freedom U740 SoC as the base, researchers named their RISC-V cluster "Monte Cimone."

Monte Cimone features four dual-board servers, each in a 1U form factor. Each board has a SiFive's Freedom U740 SoC with four U74 cores running up to 1.4 GHz and one S7 management core. In total, eight nodes combine for a total of 32 RISC-V cores. Paired with 16 GB of 64-bit DDR4 memory operating at 1866s MT/s, PCIe Gen 3 x8 bus running at 7.8 GB/s, one gigabit Ethernet port, USB 3.2 Gen 1 interfaces, the system is powered by two 250 Watt PSUs to support future expansion and addition of accelerator cards.

The team over in Italy benchmarked the system using HPL and Stream to determine the machine's floating-point computation capability and memory bandwidth. While the results are not very impressive, they are a beginning for RISC-V. Each node produced a sustained 1.86 GFLOPS performance in HPL, with a total computing power of 14.88 GFLOPS with perfect linear scaling. However, the efficiency for the entire cluster was 85%, resulting in 12.65 GFLOPS of computational force. The node should achieve a 14.928 GB/s in memory bandwidth; however, the actual results were 7760 MB/s.

These results show two things. Firstly, the RISC-V HPC software stack is mature but needs further optimization and faster silicon to achieve anything monumental like weather simulation. Secondly, it shows that scaling in the HPC world is quite tricky and requires careful optimization to get the hardware and software to coexist in a world where everything scales well. So to get to a point where we meet some of the scaling and performance of supercomputers like Frontier, RISC-V needs a lot more tuning. Researchers and engineers are working hard to bring that idea to life, and it is a matter of time before we see more robust designs appear.

View at TechPowerUp Main Site | Source

Denver · Jun 13, 2022

Risc-V doesn't even exist. period.

PS: Its a joke.

P4-630 · Jun 13, 2022

Then I call my own desktop system with a i7 12700K and 32GB DDR5 a "Supercomputer"...
It actually is

Avlin · Jun 13, 2022

Each node produced a sustained 1.86 GFLOPS

Level 1999 computing

defaultluser · Jun 13, 2022

You see, this is exactly why yu will never see any actual supercomputers form siifive - by the time you spend money building-up all that missing coherent interconnect, all that missing high-bandwith management engines, and then fixing the under-powered CPUs themselves, you could have just bought a compute GPU from NVIDIA.

When you already have N2 with multi-socket systems, it becomes even harder to make anyone care about "Yet-another pointless RISC"!

RiscV is doomed to be a fully-customized micro-controller platform, NOT A SUPERCOMPUTER.

eidairaman1 · Jun 13, 2022

P4-630 said:
Then I call my own desktop system with a i7 12700K and 32GB DDR5 a "Supercomputer"...
It actually is

Same with my FX8350 with 2400 ram

Fourstaff · Jun 14, 2022

Long journey ahead for RISC-V. We are not going to see high performance coming out of this architecture anytime soon, but they might work their way into cheaper electronics e.g. routers. or other "smart" devices.

Count von Schwalbe · Jun 14, 2022

Read a bit somewhere about x86 potentially reaching a ceiling of performance. While their predictions were inaccurate, I am not sure how much more we can shrink our nodes. If, and only if, there comes a point where new nodes are impractical (or far too expensive), we may see more RISC designs in pursuit of efficiency. As the only two real options are ARM and RISC-V, I can see this being a serious thing.

I realize that this is not particularly likely, and certainly not imminent, but the future may lie here. If so, these researchers will be at the forefront of this movement.

blitz120 · Jun 14, 2022

Why do supercomputer benchmarks always quote floating point performance? Most interesting software makes very little use of floating point, and integer and pointer arithmetic are far more common. I spent almost 40 years as a software engineer, and used floating point perhaps a dozen times, and my colleagues had similar experiences.

Wirko · Jun 14, 2022

blitz120 said:
Why do supercomputer benchmarks always quote floating point performance? Most interesting software makes very little use of floating point, and integer and pointer arithmetic are far more common. I spent almost 40 years as a software engineer, and used floating point perhaps a dozen times, and my colleagues had similar experiences.

Simulation of anything in the physical world requires FP. Machine learning requires FP. Sure there are some types of computing workload that require mostly integer arithmetic but I can't think of any right now. Gene sequencing maybe?

silentbogo · Jun 14, 2022

blitz120 said:
I spent almost 40 years as a software engineer, and used floating point perhaps a dozen times, and my colleagues had similar experiences.

I'm not a career programmer, dipped my toes on more than one occasion. If you work for a relatively big or relatively old company - you constantly have to deal with tons of legacy stuff, and legacy approaches to coding. Back in a day floats were slow and expensive, so most of the libs avoid using them. Some companies had their own portfolio of code, which might be outdated and total crap, but they still force everyone to use them. My cousin works at the big company which had a contract with anther big subcontractor, which worked for a huge and famous car manufacturer which I shall not name, which had a stupid requirement to use only their broken "proprietary" implementations of sdt* libraries, all to avoid GPL, or not being able to use the fastest and easiest option for device's UI, just because it's opensource.
Nowadays almost everything hangs on FP, from ML/AI to physics simulations. The entire HPC industry basically throttles on parallelizing more FP16/FP32 for even more massive sims, most AI/ML code relies on FP, same goes for CV.

AleksandarK said:
Researchers from Università di Bologna and CINECA, the largest supercomputing center in Italy, have been playing with the concept of developing a RISC-V supercomputer.

That's where the entire stock of Unmatched boards went... I believe it's a bit stupid to build a "supercomputer" out of dev boards based on an early chip architecture with a core IP that developers themselves market as being "ideal" for network appliances and DVRs (not servers). Now, all these boards are gonna rot at some university's basement rather than being used by devs to port and adapt software for this platform. Once again, short-term financial gains beat long-term benefits .
Phoronix did an early review, and this thing is at best twice as slow as Pi400(in the best case scenario), hence a whole rack with two boards is barely enough to catch up with a credit-card sized SBC.

Fourstaff said:
Long journey ahead for RISC-V. We are not going to see high performance coming out of this architecture anytime soon, but they might work their way into cheaper electronics e.g. routers. or other "smart" devices.

They are an ideal candidate to bump MIPS off it's spot in network appliance market. Too bad SiFive decided to sell the bulk, just to tease "the next big thing" for devs, while it's been several generations of boards that missed the mark already. I don't think I've ever seen any SiFive boards in real life, nor was I able to buy a RISC-V MCU dev board. I was hoping to get my hands at least on Allwinner D1, but that's gonna be a real bummer at least until the war is over (most sellers on alibaba/aliexpress don't ship to Ukraine, at least stuff that's interesting or useful to me).

First Strike · Jun 14, 2022

blitz120 said:
Why do supercomputer benchmarks always quote floating point performance? Most interesting software makes very little use of floating point, and integer and pointer arithmetic are far more common. I spent almost 40 years as a software engineer, and used floating point perhaps a dozen times, and my colleagues had similar experiences.

Because "supercomputer" is a term designated to scientific computing. Back when computers were first invented, they were used to calculate artillery trajectory, molecular dynamics, etc. That's what a "compute"r really means. So supercomputer computes super heavy scientific problems.

Back then there were no Facebook nor Google, integer performance can only do as much good as an email server.

Also the scientific workloads basically have an unbounded need for FP performance. The finer the simulation grid, the better. In the contrast Facebook and Google's server do have a finite client size and ROI. So, if someone want to push their scaling technique to the limit, they should build a computer for scientific workloads.

dragontamer5788 · Jun 14, 2022

blitz120 said:
Why do supercomputer benchmarks always quote floating point performance? Most interesting software makes very little use of floating point, and integer and pointer arithmetic are far more common. I spent almost 40 years as a software engineer, and used floating point perhaps a dozen times, and my colleagues had similar experiences.

Because supercomputer workloads are largely composed of double-precision floating point calculations.

* FEA (Finite Element Analysis), aka simulated car crashes, bridge modeling, etc. etc.
* Weather simulations
* Protein Folding
* Atoms / Molecule simulations
* etc. etc.

All of these are double-precision floating point problems, the type that very big government organizations are willing to spend $300,000,000 to calculate slightly better than other government organizations.

Wirko said:
Sure there are some types of computing workload that require mostly integer arithmetic but I can't think of any right now.

The supercomputer-level integer stuff is for CPU synthesis. Proving multipliers, RTL (register transfer languages) are correct and stuff.

I'm fairly certain they can, theoretically, be run on a GPU. (GPGPU accelerated binary decision diagrams are an active area of research right now. They seem possible even if not all the details are figured out) But I bet most programs (which are 30+ years old) are based on CPU compute (largely because the GPU stuff is still research-project phase).

It is said that the 3d-cache that AMD made is for AMD to use (!!!!), because it really accelerates this kind of simulation. So AMD is likely using AMD EPYCs with the best 3D-cache / big L3 caches for designing CPUs and other digital-synthesis problems.

eidairaman1 · Jun 15, 2022

blitz120 said:
Why do supercomputer benchmarks always quote floating point performance? Most interesting software makes very little use of floating point, and integer and pointer arithmetic are far more common. I spent almost 40 years as a software engineer, and used floating point perhaps a dozen times, and my colleagues had similar experiences.

silentbogo said:
I'm not a career programmer, dipped my toes on more than one occasion. If you work for a relatively big or relatively old company - you constantly have to deal with tons of legacy stuff, and legacy approaches to coding. Back in a day floats were slow and expensive, so most of the libs avoid using them. Some companies had their own portfolio of code, which might be outdated and total crap, but they still force everyone to use them. My cousin works at the big company which had a contract with anther big subcontractor, which worked for a huge and famous car manufacturer which I shall not name, which had a stupid requirement to use only their broken "proprietary" implementations of sdt* libraries, all to avoid GPL, or not being able to use the fastest and easiest option for device's UI, just because it's opensource.
Nowadays almost everything hangs on FP, from ML/AI to physics simulations. The entire HPC industry basically throttles on parallelizing more FP16/FP32 for even more massive sims, most AI/ML code relies on FP, same goes for CV.

That's where the entire stock of Unmatched boards went... I believe it's a bit stupid to build a "supercomputer" out of dev boards based on an early chip architecture with a core IP that developers themselves market as being "ideal" for network appliances and DVRs (not servers). Now, all these boards are gonna rot at some university's basement rather than being used by devs to port and adapt software for this platform. Once again, short-term financial gains beat long-term benefits .
Phoronix did an early review, and this thing is at best twice as slow as Pi400(in the best case scenario), hence a whole rack with two boards is barely enough to catch up with a credit-card sized SBC.

They are an ideal candidate to bump MIPS off it's spot in network appliance market. Too bad SiFive decided to sell the bulk, just to tease "the next big thing" for devs, while it's been several generations of boards that missed the mark already. I don't think I've ever seen any SiFive boards in real life, nor was I able to buy a RISC-V MCU dev board. I was hoping to get my hands at least on Allwinner D1, but that's gonna be a real bummer at least until the war is over (most sellers on alibaba/aliexpress don't ship to Ukraine, at least stuff that's interesting or useful to me).

At this rate we should go back to Fortran, Basic and use Unix.

blitz120 · Jun 16, 2022

Wirko said:
Simulation of anything in the physical world requires FP. Machine learning requires FP. Sure there are some types of computing workload that require mostly integer arithmetic but I can't think of any right now. Gene sequencing maybe?

Most arithmetic operations can be bounded, and thus scaled integers are more than sufficient, and avoid the rounding errors and nonuniform distribution of floating point numbers. Machine learning certainly uses floating point, but it certainly doesn't need it. On the other hand, there are many algorithms which require extensive graph manipulation, which rely heavily on pointer and integer arithmetic.

silentbogo said:
I'm not a career programmer, dipped my toes on more than one occasion. If you work for a relatively big or relatively old company - you constantly have to deal with tons of legacy stuff, and legacy approaches to coding. Back in a day floats were slow and expensive, so most of the libs avoid using them. Some companies had their own portfolio of code, which might be outdated and total crap, but they still force everyone to use them. My cousin works at the big company which had a contract with anther big subcontractor, which worked for a huge and famous car manufacturer which I shall not name, which had a stupid requirement to use only their broken "proprietary" implementations of sdt* libraries, all to avoid GPL, or not being able to use the fastest and easiest option for device's UI, just because it's opensource.
Nowadays almost everything hangs on FP, from ML/AI to physics simulations. The entire HPC industry basically throttles on parallelizing more FP16/FP32 for even more massive sims, most AI/ML code relies on FP, same goes for CV.

I spent most of my career working for large and old telecom companies, and generally didn't deal much with legacy systems. The work was on a variety of areas, from manufacturing to billing, to pattern recognition and matching, to implementing database and transaction processing systems, to OS, to language interpreters and compilers, to framework development. None of these required any significant amount of floating point work. The single largest use of floating point was building floating point types into a database system -- not because anyone really wanted to use it, but it was required to meet standards and kept us "check box compliant".

System Name	AlderLake
Processor	Intel i7 12700K P-Cores @ 5Ghz
Motherboard	Gigabyte Z690 Aorus Master
Cooling	Noctua NH-U12A 2 fans + Thermal Grizzly Kryonaut Extreme + 5 case fans
Memory	32GB DDR5 Corsair Dominator Platinum RGB 6000MT/s CL36
Video Card(s)	MSI RTX 2070 Super Gaming X Trio
Storage	Samsung 980 Pro 1TB + 970 Evo 500GB + 850 Pro 512GB + 860 Evo 1TB x2
Display(s)	23.8" Dell S2417DG 165Hz G-Sync 1440p
Case	Be quiet! Silent Base 600 - Window
Audio Device(s)	Panasonic SA-PMX94 / Realtek onboard + B&O speaker system / Harman Kardon Go + Play / Logitech G533
Power Supply	Seasonic Focus Plus Gold 750W
Mouse	Logitech MX Anywhere 2 Laser wireless
Keyboard	RAPOO E9270P Black 5GHz wireless
Software	Windows 11
Benchmark Scores	Cinebench R23 (Single Core) 1936 @ stock Cinebench R23 (Multi Core) 23006 @ stock

System Name	PCGOD
Processor	AMD FX 8350@ 5.0GHz
Motherboard	Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling	Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory	16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s)	AMD Radeon 290 Sapphire Vapor-X
Storage	Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s)	NEC Multisync LCD 1700V (Display Port Adapter)
Case	AeroCool Xpredator Evil Blue Edition
Audio Device(s)	Creative Labs Sound Blaster ZxR
Power Supply	Seasonic 1250 XM2 Series (XP3)
Mouse	Roccat Kone XTD
Keyboard	Roccat Ryos MK Pro
Software	Windows 7 Pro 64

System Name	Orange! // ItchyHands
Processor	3570K // 10400F
Motherboard	ASRock z77 Extreme4 // TUF Gaming B460M-Plus
Cooling	Stock // Stock
Memory	2x4Gb 1600Mhz CL9 Corsair XMS3 // 2x8Gb 3200 Mhz XPG D41
Video Card(s)	Sapphire Nitro+ RX 570 // Asus TUF RTX 2070
Storage	Samsung 840 250Gb // SX8200 480GB
Display(s)	LG 22EA53VQ // Philips 275M QHD
Case	NZXT Phantom 410 Black/Orange // Tecware Forge M
Power Supply	Corsair CXM500w // CM MWE 600w

Processor	i5-6600K
Motherboard	Asus Z170A
Cooling	some cheap Cooler Master Hyper 103 or similar
Memory	16GB DDR4-2400
Video Card(s)	IGP
Storage	Samsung 850 EVO 250GB
Display(s)	2x Oldell 24" 1920x1200
Case	Bitfenix Nova white windowless non-mesh
Audio Device(s)	E-mu 1212m PCI
Power Supply	Seasonic G-360
Mouse	Logitech Marble trackball, never had a mouse
Keyboard	Key Tronic KT2000, no Win key because 1994
Software	Oldwin

System Name	WS#1337
Processor	Ryzen 7 5700X3D
Motherboard	ASUS X570-PLUS TUF Gaming
Cooling	Xigmatek Scylla 240mm AIO
Memory	64GB DDR4-3600(4x16)
Video Card(s)	MSI RTX 3070 Gaming X Trio
Storage	ADATA Legend 2TB
Display(s)	Samsung Viewfinity Ultra S6 (34" UW)
Case	ghetto CM Cosmos RC-1000
Audio Device(s)	ALC1220
Power Supply	SeaSonic SSR-550FX (80+ GOLD)
Mouse	Logitech G603
Keyboard	Modecom Volcano Blade (Kailh choc LP)
VR HMD	Google dreamview headset(aka fancy cardboard)
Software	Windows 11, Ubuntu 24.04 LTS

Researchers Use SiFive's RISC-V SoC to Build a Supercomputer

AleksandarK

News Editor

Denver

P4-630

Avlin

defaultluser

eidairaman1

The Exiled Airman

Fourstaff

Moderator

Count von Schwalbe

Nocturnus Moderatus

blitz120

New Member

Wirko

silentbogo

Moderator

First Strike

dragontamer5788

eidairaman1

The Exiled Airman

blitz120

New Member