Tuesday, August 22nd 2023

Chinese Exascale Sunway Supercomputer has Over 40 Million Cores, 5 ExaFLOPS Mixed-Precision Performance

Aug 22nd, 2023 02:27

The exascale supercomputer arms race is pushing every participant to invest heavily in reaching the number-one spot. Some countries, like China, actively participate in the race while offering little proof of their work, leaving the high-performance computing (HPC) community wondering about Chinese efforts on exascale systems. Today, we have some information regarding the next-generation Sunway system, which is supposed to be China's first exascale supercomputer. Replacing the Sunway TaihuLight, the next-generation Sunway will reportedly boast over 40 million cores. The information comes from an upcoming presentation at the Supercomputing 2023 (SC23) show in Denver, running from November 12 to November 17.

The presentation describes 5 ExaFLOPS in the HPL-MxP benchmark with linear scalability on the 40-million-core Sunway supercomputer. The HPL-MxP benchmark is a mixed-precision HPC benchmark made to test the system's capability in regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision. What are those cores? We are not sure. The last-generation Sunway TaihuLight used SW26010 manycore 64-bit RISC processors based on the Sunway architecture, each with 260 cores. There were 40,960 SW26010 CPUs in the system for a total of 10,649,600 cores, which means that the next-generation Sunway system has roughly 3.8 times as many cores (40 million / 10.65 million). We expect some microarchitecture and semiconductor node improvements as well.
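For readers unfamiliar with mixed-precision benchmarks: HPL-MxP solves a large dense linear system by doing the expensive LU factorization in low precision and then refining the answer back to 64-bit (FP64) accuracy. The Python sketch below illustrates that idea under simplifying assumptions. It uses FP32 (simulated with the standard `struct` module) as the low precision, whereas real runs commonly use 16-bit formats on accelerators, and it uses the simplest form of iterative refinement; the benchmark's own reference code may use a more elaborate refinement scheme. All function names are hypothetical, and this is a toy, not the benchmark's implementation.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def lu_factor_f32(A):
    """Hypothetical helper: LU with partial pivoting, every result rounded to float32.
    Returns (LU, perm) with unit-diagonal L and U packed into one matrix."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))  # partial pivoting
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])  # multiplier, stored in the L part
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm

def lu_solve_f32(LU, perm, b):
    """Hypothetical helper: solve A x = b in float32 using the packed factors."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution with unit-diagonal L
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y

def mixed_precision_solve(A, b, iters=5):
    """Factor once in float32, then refine the solution in float64."""
    n = len(A)
    LU, perm = lu_factor_f32(A)    # the expensive O(n^3) step, in low precision
    x = lu_solve_f32(LU, perm, b)  # float32-accurate first guess
    for _ in range(iters):
        # Residual computed in float64 (Python's native float)
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve_f32(LU, perm, r)  # cheap O(n^2) correction reusing the factors
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in float64
    return x

# Usage on a small, well-conditioned example system
A = [[4.0, 1.0, 0.5, 0.0],
     [1.0, 5.0, 1.0, 0.25],
     [0.5, 1.0, 6.0, 1.0],
     [0.0, 0.25, 1.0, 3.0]]
x_true = [1 / 3, -2 / 7, 5 / 11, 1 / 13]
b = [sum(a * x for a, x in zip(row, x_true)) for row in A]

x_low = lu_solve_f32(*lu_factor_f32(A), b)
x_mixed = mixed_precision_solve(A, b)
print(max(abs(u - v) for u, v in zip(x_low, x_true)))    # float32-level error
print(max(abs(u - v) for u, v in zip(x_mixed, x_true)))  # float64-level error
```

The design point this illustrates is why mixed-precision scores run so much higher than classic HPL scores: the O(n^3) factorization dominates the work and runs in the fast low-precision format, while the O(n^2) refinement steps that restore 64-bit accuracy are comparatively cheap.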

Sources: SC23 Presentation. Thanks to forum member TumbleGeorge for the tip!


11 Comments on Chinese Exascale Sunway Supercomputer has Over 40 Million Cores, 5 ExaFLOPS Mixed-Precision Performance

TumbleGeorge

There are some articles that claim that China is already five generations behind in lithographic processes. Even if that is so, it obviously does not particularly prevent the country from pushing strongly forward in the field of supercomputers. I'm sure China has several other exaflop supercomputers that it just hasn't advertised to the Western world.

Patriot

TumbleGeorge wrote: There are some articles that claim that China is already five generations behind in lithographic processes. Even if that is so, it obviously does not particularly prevent the country from pushing strongly forward in the field of supercomputers. I'm sure China has several other exaflop supercomputers that it just hasn't advertised to the Western world.

Apples and oranges: they measured using their own metric, in the same way Nvidia made an "exascale" supercomputer in Europe.
hpl-mxp.org/results.md
For instance, the 1.2 ExaFLOPS Frontier scores 10 mixed-precision ExaFLOPS on this benchmark.

So this supercomputer is 300-500 PFLOPS measured traditionally.

They are 5 generations behind. They used 4.5x more cores than Frontier to be half as powerful, and Frontier is a generation old now. I also imagine it uses 10x the power given their process limitations.
The accelerators going into El Capitan are 8x better at mixed precision.

joemama

Kind of wondering why they named it 神威; that's a pretty Japanese name and I'm sure the government wouldn't like that.

TumbleGeorge wrote: There are some articles that claim that China is already five generations behind in lithographic processes.

They don't need state-of-the-art lithography technology to produce a processor; it's just that the transistor density and power efficiency wouldn't be very good.

Vayra86

joemama wrote: Kind of wondering why they named it 神威; that's a pretty Japanese name and I'm sure the government wouldn't like that.

They don't need state-of-the-art lithography technology to produce a processor; it's just that the transistor density and power efficiency wouldn't be very good.

Performance per core is lower, meaning more interconnect/transport is required too, and this is far more impactful than just power and density. They simply need that much more of literally everything, including cooling and space. And there is a point at which scaling further just isn't very effective anymore, so shrinks definitely do matter...

Unregistered

TumbleGeorge wrote: There are some articles that claim that China is already five generations behind in lithographic processes. Even if that is so, it obviously does not particularly prevent the country from pushing strongly forward in the field of supercomputers. I'm sure China has several other exaflop supercomputers that it just hasn't advertised to the US allies.

Not sure they would be willing to share with Japan either, despite Japan being very Eastern.

Thanks to the unstable leadership of the US, China will definitely be the leader.

Denver

As "good" as the rest of the Chinese claims

xrli

joemama wrote: Kind of wondering why they named it 神威; that's a pretty Japanese name and I'm sure the government wouldn't like that.

They don't need state-of-the-art lithography technology to produce a processor; it's just that the transistor density and power efficiency wouldn't be very good.

The Chinese and Japanese writing systems share a lot of the same characters (Hanzi in Chinese, Kanji in Japanese); China exported the writing system to Japan around the 5th to 7th century.

In this case, 神威 in Chinese (shen wei) simply means "god's might" or "divine might". 神 means god and 威 means might or authority. The use of 神威 makes a lot of sense here, considering you are trying to name and honor the first multi-billion-dollar supercomputer made with Chinese processors rather than Intel or AMD ones, albeit a little over the top.

You are correct in pointing out that this is also a word used in Japanese, but it is used differently. 神威 in Japanese came from an Ainu word, kamuy, which refers to the divine spirit that rests in animals or objects. Japanese people came up with a written form of this Ainu word (the Ainu people don't have a writing system of their own). 神 (kami) also means god and 威 (i) also means might in Japanese. I guess these two characters read together in Japanese sound just like kamuy and are even similar in meaning, so they used them.

ScaLibBDP

A message to AlexandarK:

>>...regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision...

Does it come from the presentation, or is this your modified text?

In any case, this is incorrect and needs to be corrected to:

>>...regular HPC workloads that require double-precision (53-bit) and AI workloads that require single-precision (24-bit)...

This is because there are no 64-bit or 32-bit precisions.

In the case of double precision: a 53-bit mantissa (52 stored bits plus an implicit leading bit), an 11-bit exponent and a sign bit, 64 bits in total.
In the case of single precision: a 24-bit mantissa (23 stored bits plus an implicit leading bit), an 8-bit exponent and a sign bit, 32 bits in total.

Once again, the precision of floating-point arithmetic is defined by how many bits are allocated to the mantissa.
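The IEEE 754 layouts under discussion can be inspected with Python's standard `struct` module. This is an illustrative sketch; `fields` and `to_f32` are hypothetical helper names. It shows that a double is stored as 1 sign bit, 11 exponent bits and 52 fraction bits (64 in total), with an implicit leading 1 that gives 53 bits of precision, while a single is 1 + 8 + 23 = 32 bits with 24 bits of precision. That is why 2^53 + 1 and 2^24 + 1 are the first integers each format cannot represent exactly.

```python
import struct
import sys

def fields(x: float, fmt: str) -> tuple:
    """Split x into its IEEE 754 (sign, biased exponent, stored fraction) bits.
    fmt is "d" for binary64 (double) or "f" for binary32 (single)."""
    if fmt == "d":
        bits = struct.unpack("<Q", struct.pack("<d", x))[0]
        exp_bits, frac_bits = 11, 52
    elif fmt == "f":
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        exp_bits, frac_bits = 8, 23
    else:
        raise ValueError("fmt must be 'd' or 'f'")
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

def to_f32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(sys.float_info.mant_dig)       # 53: 52 stored fraction bits + implicit leading 1
print(fields(1.0, "d"))              # (0, 1023, 0): double exponent bias is 1023
print(fields(-2.0, "f"))             # (1, 128, 0): single exponent bias is 127
print(float(2**53 + 1) == 2**53)     # True: would need 54 significant bits
print(to_f32(2.0**24 + 1) == 2**24)  # True: would need 25 significant bits
```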

ARF

TumbleGeorgeThere are some articles that claim that China is already five generations behind in lithographic processes.

Except that TSMC is already near the end of those lithographic processes. If 3nm is not the last process ever, then the next one certainly will be. Because physics.
So, where is the TSMC 3nm process? How many years did Intel stay on its old 14nm process, famously known for the endless pluses... 14nm+++++

:D


Patriot

ScaLibBDP wrote: A message to AlexandarK:

>>...regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision...

Does it come from the presentation, or is this your modified text?

In any case, this is incorrect and needs to be corrected to:

>>...regular HPC workloads that require double-precision (53-bit) and AI workloads that require single-precision (24-bit)...

This is because there are no 64-bit or 32-bit precisions.

In the case of double precision: a 53-bit mantissa (52 stored bits plus an implicit leading bit), an 11-bit exponent and a sign bit, 64 bits in total.
In the case of single precision: a 24-bit mantissa (23 stored bits plus an implicit leading bit), an 8-bit exponent and a sign bit, 32 bits in total.

Once again, the precision of floating-point arithmetic is defined by how many bits are allocated to the mantissa.

It is referred to as FP64, FP32, FP16 or bfloat16, or as double, single and half precision... No one is going to spell out the mantissa bits when talking about computational precision.
If you want someone to blame, the benchmark itself (hpl-mxp.org/) calls it 64-bit accuracy. The source presentation doesn't really say much... sc23.supercomputing.org/presentation/?id=pap103&sess=sess160


leezhiran

xrli wrote: The Chinese and Japanese writing systems share a lot of the same characters (Hanzi in Chinese, Kanji in Japanese); China exported the writing system to Japan around the 5th to 7th century.

In this case, 神威 in Chinese (shen wei) simply means "god's might" or "divine might". 神 means god and 威 means might or authority. The use of 神威 makes a lot of sense here, considering you are trying to name and honor the first multi-billion-dollar supercomputer made with Chinese processors rather than Intel or AMD ones, albeit a little over the top.

You are correct in pointing out that this is also a word used in Japanese, but it is used differently. 神威 in Japanese came from an Ainu word, kamuy, which refers to the divine spirit that rests in animals or objects. Japanese people came up with a written form of this Ainu word (the Ainu people don't have a writing system of their own). 神 (kami) also means god and 威 (i) also means might in Japanese. I guess these two characters read together in Japanese sound just like kamuy and are even similar in meaning, so they used them.

That's precise. It means something like "super powerful" in Chinese.
