Monday, June 13th 2022
Researchers Use SiFive's RISC-V SoC to Build a Supercomputer
Researchers from Università di Bologna and CINECA, the largest supercomputing center in Italy, have been exploring the concept of building a RISC-V supercomputer. The team has laid the groundwork for a first-of-its-kind implementation that demonstrates the capability of the relatively novel ISA to run high-performance computing workloads. To create a supercomputer, you need hardware building blocks that stack like Lego bricks. These are called nodes, each made from a motherboard, processor, memory, and storage, and they combine into a cluster. The Italian researchers decided to try something other than the usual Intel/AMD solution and use a processor based on the RISC-V ISA. Using SiFive's Freedom U740 SoC as the base, the researchers named their RISC-V cluster "Monte Cimone."
Monte Cimone features four dual-board servers, each in a 1U form factor. Each board carries a SiFive Freedom U740 SoC with four U74 cores running at up to 1.4 GHz and one S7 management core. In total, eight nodes combine for 32 RISC-V cores. Paired with 16 GB of 64-bit DDR4 memory operating at 1866 MT/s, a PCIe Gen 3 x8 bus running at 7.8 GB/s, one gigabit Ethernet port, and USB 3.2 Gen 1 interfaces, each system is powered by two 250 Watt PSUs to support future expansion and the addition of accelerator cards.

The team in Italy benchmarked the system using HPL and STREAM to determine the machine's floating-point compute capability and memory bandwidth. While the results are not very impressive, they are a beginning for RISC-V. Each node produced a sustained 1.86 GFLOPS in HPL, which would amount to 14.88 GFLOPS of total computing power under perfect linear scaling. However, efficiency across the entire cluster was 85%, resulting in 12.65 GFLOPS of compute. Each node should theoretically achieve 14.928 GB/s of memory bandwidth; however, the measured result was 7,760 MB/s.

These results show two things. Firstly, the RISC-V HPC software stack is reasonably mature but needs further optimization, and faster silicon, to achieve anything monumental like weather simulation. Secondly, scaling in the HPC world is quite tricky and requires careful optimization to get hardware and software to coexist in a way that scales well. To reach even some of the scaling and performance of supercomputers like Frontier, RISC-V needs a lot more tuning. Researchers and engineers are working hard to bring that idea to life, and it is only a matter of time before we see more robust designs appear.
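The numbers above can be reproduced with simple arithmetic: theoretical DDR4 bandwidth is the transfer rate times the bus width, and parallel efficiency is the measured cluster HPL result divided by the ideal linear-scaling figure. A quick sketch, using only the figures quoted in the article:

```python
# Theoretical memory bandwidth: DDR4 at 1866 MT/s on a 64-bit (8-byte) bus.
transfer_rate_mts = 1866                 # mega-transfers per second
bus_width_bytes = 64 // 8                # 64-bit channel = 8 bytes per transfer
peak_bw_gbs = transfer_rate_mts * bus_width_bytes / 1000  # 14.928 GB/s

# HPL scaling: 8 nodes at a sustained 1.86 GFLOPS each.
nodes = 8
per_node_gflops = 1.86
ideal_gflops = nodes * per_node_gflops   # 14.88 GFLOPS under perfect scaling

# The measured cluster figure implies ~85% parallel efficiency.
measured_gflops = 12.65
efficiency = measured_gflops / ideal_gflops

print(f"peak bandwidth: {peak_bw_gbs:.3f} GB/s")
print(f"ideal HPL:      {ideal_gflops:.2f} GFLOPS")
print(f"efficiency:     {efficiency:.1%}")
```

The gap between the 14.928 GB/s theoretical bandwidth and the measured 7,760 MB/s is the kind of shortfall STREAM is designed to expose.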
Sources:
The Next Platform, via Tom's Hardware
14 Comments on Researchers Use SiFive's RISC-V SoC to Build a Supercomputer
PS: It's a joke.
It actually is :D:D
Level 1999 computing
When you already have the N2 in multi-socket systems, it becomes even harder to make anyone care about "yet another pointless RISC"!
RISC-V is doomed to be a fully-customized microcontroller platform, NOT A SUPERCOMPUTER.
I realize that this is not particularly likely, and certainly not imminent, but the future may lie here. If so, these researchers will be at the forefront of this movement.
Nowadays almost everything hangs on FP, from ML/AI to physics simulations. The entire HPC industry basically throttles on parallelizing more FP16/FP32 for ever more massive sims, most AI/ML code relies on FP, and the same goes for CV. That's where the entire stock of Unmatched boards went... I believe it's a bit stupid to build a "supercomputer" out of dev boards based on an early chip architecture with a core IP that the developers themselves market as being "ideal" for network appliances and DVRs (not servers). Now all these boards are gonna rot in some university's basement rather than being used by devs to port and adapt software for this platform. Once again, short-term financial gains beat long-term benefits.
Phoronix did an early review, and this thing is at best half the speed of a Pi 400 (in the best-case scenario), hence a whole 1U chassis with two boards is barely enough to catch up with a credit-card-sized SBC. They are an ideal candidate to bump MIPS off its spot in the network appliance market. Too bad SiFive decided to sell the bulk just to tease "the next big thing" for devs, while it's been several generations of boards that missed the mark already. I don't think I've ever seen any SiFive boards in real life, nor was I able to buy a RISC-V MCU dev board. I was hoping to get my hands on at least an Allwinner D1, but that's gonna be a real bummer at least until the war is over (most sellers on alibaba/aliexpress don't ship to Ukraine, at least not stuff that's interesting or useful to me).
Back then there was no Facebook or Google; integer performance could only do as much good as an email server.
Also, scientific workloads have a basically unbounded need for FP performance. The finer the simulation grid, the better. By contrast, Facebook's and Google's servers have a finite client base and ROI. So, if someone wants to push their scaling techniques to the limit, they should build a computer for scientific workloads.
* FEA (Finite Element Analysis), aka simulated car crashes, bridge modeling, etc. etc.
* Weather simulations
* Protein Folding
* Atoms / Molecule simulations
* etc. etc.
All of these are double-precision floating-point problems, the type that very big government organizations are willing to spend $300,000,000 on to calculate slightly better than other government organizations. The supercomputer-level integer stuff is for CPU synthesis: proving that multipliers and RTL (register-transfer level) designs are correct, and so on.
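The comment's point about double precision comes down to accumulated rounding error: a 32-bit accumulator drifts measurably over millions of operations, which is why these simulation workloads demand FP64. A minimal illustration, using NumPy scalars to emulate 32-bit arithmetic (a classic demonstration, not taken from the article):

```python
import numpy as np

# Naively sum 0.1 one million times; the true total is 100,000.
# The float32 accumulator drifts badly once it grows large, because
# each addition rounds to ~7 significant digits; float64 stays close.
n = 1_000_000
acc32 = np.float32(0.0)
acc64 = 0.0
for _ in range(n):
    acc32 += np.float32(0.1)
    acc64 += 0.1

print("float32:", acc32)   # noticeably off from 100000
print("float64:", acc64)   # very close to 100000
```

On a fine simulation grid, those per-step errors feed back into the next step, which is why FEA, weather, and molecular codes default to double precision rather than the FP16/FP32 favored by ML.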
I'm fairly certain they can, theoretically, be run on a GPU. (GPGPU-accelerated binary decision diagrams are an active area of research right now; they seem possible even if not all the details are figured out.) But I bet most programs (which are 30+ years old) are based on CPU compute, largely because the GPU stuff is still in the research-project phase.
It is said that the 3D V-Cache AMD made is for AMD's own use (!!!!), because it really accelerates this kind of simulation. So AMD is likely using EPYC chips with the best 3D V-Cache / biggest L3 caches for designing CPUs and other digital-synthesis problems.