Tuesday, August 22nd 2023
Chinese Exascale Sunway Supercomputer has Over 40 Million Cores, 5 ExaFLOPS Mixed-Precision Performance
The exascale supercomputer arms race has everyone investing heavily in pursuit of the number one spot. Some countries, like China, actively participate in the race while publishing little proof of their work, leaving the high-performance computing (HPC) community wondering about Chinese efforts on exascale systems. Today, we have some information regarding the next-generation Sunway system, which is supposed to be China's first exascale supercomputer. Replacing the Sunway TaihuLight, the next-generation Sunway will reportedly boast over 40 million cores. The information comes from an upcoming presentation for the Supercomputing 2023 show in Denver, happening from November 12 to November 17.
The presentation reports 5 ExaFLOPS in the HPL-MxP benchmark with linear scalability on the 40-million-core Sunway supercomputer. The HPL-MxP benchmark is a mixed-precision HPC benchmark made to test the system's capability in regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision. What are those cores? We are not sure. The last-generation Sunway TaihuLight used SW26010 manycore 64-bit RISC processors based on the Sunway architecture, each with 260 cores. There were 40,960 SW26010 CPUs in the system for a total of 10,649,600 cores, which means that the next-generation Sunway system is close to four times larger from a core-count perspective (40 million / 10,649,600 ≈ 3.8). We expect some uArch and semiconductor node improvements as well.
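The core-count comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic: the CPU and per-CPU core counts are the TaihuLight figures quoted above, while treating "over 40 million" as exactly 40 million is an assumption that yields a lower bound on the ratio. The variable names are illustrative.

```python
# TaihuLight figures quoted in the article
SW26010_CPUS = 40_960
CORES_PER_SW26010 = 260

# Assumption: "over 40 million cores" is treated as a lower bound of exactly 40 million
NEXT_GEN_CORES_LOWER_BOUND = 40_000_000

taihulight_cores = SW26010_CPUS * CORES_PER_SW26010
core_ratio = NEXT_GEN_CORES_LOWER_BOUND / taihulight_cores

print(f"TaihuLight cores: {taihulight_cores:,}")                 # 10,649,600
print(f"Core-count ratio (lower bound): {core_ratio:.2f}x")      # 3.76x
```

Any core count above 40 million pushes the ratio higher, so the real figure depends on the exact number disclosed in the presentation.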
Sources:
SC23 Presentation, Thanks to forum member TumbleGeorge for the tip!
11 Comments on Chinese Exascale Sunway Supercomputer has Over 40 Million Cores, 5 ExaFLOPS Mixed-Precision Performance
hpl-mxp.org/results.md
For instance, the 1.2 ExaFLOPS Frontier scores 10 mixed-precision ExaFLOPS on this benchmark.
So this supercomputer is 300-500 PFLOPS measured traditionally.
They are 5 generations behind. They used 4.5x more cores than Frontier to be half as powerful, and Frontier is a generation old now. I also imagine it uses 10x the power given their process limitations.
The accelerators going into El Capitan are 8x better at mixed precision.
Thanks to the unstable leadership of the US, China will definitely become the leader.
In this case, 神威 in Chinese (shen wei) simply means God's might or divine might. 神 means god and 威 means might or authority. The use of 神威 makes a lot of sense here, considering you are trying to name and honor the first multi-billion dollar supercomputer made with Chinese processors rather than Intel or AMD ones, albeit a little over the top.
You are correct in pointing out this is also a word used in Japanese, but the two are used differently. 神威 in Japanese came from an Ainu word, kamuy, which refers to the divine spirit that resides in animals or objects. Japanese people came up with a way of writing this Ainu word (Ainu people don't have any writing system). 神 (kami) also means god and 威 (i) also means might in Japanese. I guess these 2 words together sound just like kamuy in Japanese and are even similar in meaning, so they used them.
>>...regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision...
Does this come from the presentation, or is it your modified text?
In any case, this is incorrect and needs to be corrected to:
>>...regular HPC workloads that require double-precision ( 53-bit ) and AI workloads that require single-precision ( 24-bit )...
This is because there are no 64-bit or 32-bit precisions.
Double precision has a 53-bit mantissa (52 stored bits plus one implicit leading bit), an 11-bit exponent and a sign bit; that is 64 bits in total.
Single precision has a 24-bit mantissa (23 stored bits plus one implicit leading bit), an 8-bit exponent and a sign bit; that is 32 bits in total.
Once again, the precision of floating-point arithmetic is defined by how many bits are allocated for the mantissa.
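The 53-bit and 24-bit figures can be verified empirically. The sketch below uses only the Python standard library and assumes the usual IEEE 754 binary64 representation for Python's `float`; the helper names (`to_float32`, `to_float64`, `significand_bits`) are made up for illustration. It probes for the smallest power of two `2**p` at which `2**p + 1` can no longer be stored exactly, which is the number of mantissa (significand) bits.

```python
import struct
import sys


def to_float32(value: float) -> float:
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def to_float64(value: float) -> float:
    """Python floats are already double precision, so this is the identity."""
    return value


def significand_bits(roundtrip) -> int:
    """Smallest p for which 2**p + 1 is not exactly representable.

    An integer of the form 2**p + 1 needs p + 1 significand bits, so the
    first p that fails equals the significand width of the format.
    """
    p = 1
    while roundtrip(float(2**p + 1)) == 2**p + 1:
        p += 1
    return p


single_bits = significand_bits(to_float32)
double_bits = significand_bits(to_float64)

print(f"single precision significand: {single_bits} bits")  # 24
print(f"double precision significand: {double_bits} bits")  # 53
print(f"sys.float_info.mant_dig:      {sys.float_info.mant_dig}")
```

The probe counts the implicit leading bit, which is why it reports 24 and 53 rather than the 23 and 52 bits physically stored in each format.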
So, where is the TSMC 3nm process? How many years did Intel stay on its old 14nm process, famously known for the endless pluses... 14nm+++++
:D
If you want someone to blame, the benchmark itself (hpl-mxp.org/) calls it 64-bit accuracy. The source presentation doesn't really say much: sc23.supercomputing.org/presentation/?id=pap103&sess=sess160