
Chinese Exascale Sunway Supercomputer has Over 40 Million Cores, 5 ExaFLOPS Mixed-Precision Performance

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,573 (0.97/day)
The exascale supercomputer arms race has countries investing heavily to claim the number one spot. Some, like China, actively participate in the race but publish little proof of their work, leaving the high-performance computing (HPC) community wondering about Chinese efforts on exascale systems. Today, we have some information regarding the next-generation Sunway system, which is supposed to be China's first exascale supercomputer. Replacing the Sunway TaihuLight, the next-generation Sunway will reportedly boast over 40 million cores. The information comes from an upcoming presentation for the Supercomputing 2023 show in Denver, which runs from November 12 to November 17.

The presentation talks about 5 ExaFLOPS in the HPL-MxP benchmark with linear scalability on the 40-million-core Sunway supercomputer. The HPL-MxP benchmark is a mixed precision HPC benchmark made to test the system's capability in regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision. It solves a dense linear system using lower-precision arithmetic and then refines the result to 64-bit accuracy. What are those cores? We are not sure. The last-generation Sunway TaihuLight used SW26010 manycore 64-bit RISC processors based on the Sunway architecture, each with 260 cores. There were 40,960 SW26010 CPUs in the system for a total of 10,649,600 cores, which means that the next-generation Sunway system is roughly four times larger from a core-count perspective. We expect some uArch and semiconductor node improvements as well.
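The core-count comparison can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted above; `NEXT_GEN_CORES` is an assumed lower bound standing in for "over 40 million", since the exact count is not given here.

```python
# Figures quoted in the article for Sunway TaihuLight.
SW26010_CPUS = 40_960        # CPUs in the system
CORES_PER_SW26010 = 260      # cores per SW26010 processor

# Assumption: "over 40 million" is treated as a lower bound of exactly 40 million.
NEXT_GEN_CORES = 40_000_000

taihulight_cores = SW26010_CPUS * CORES_PER_SW26010
core_ratio = NEXT_GEN_CORES / taihulight_cores

print(f"TaihuLight cores: {taihulight_cores:,}")            # 10,649,600
print(f"Next-gen vs. TaihuLight: at least {core_ratio:.2f}x the cores")
```

With exactly 40 million cores the ratio comes out a little under four, so the real multiple depends on how far past 40 million the final count lands.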



 
Joined
Sep 1, 2020
Messages
2,340 (1.52/day)
Location
Bulgaria
There are some articles that claim that China is already five generations behind in lithographic processes. Even if that is true, it evidently does not prevent the country from making strong progress in the field of supercomputers. I'm sure China has several other exaflop supercomputers that it just hasn't advertised to the western world.
 
Joined
Oct 27, 2009
Messages
1,180 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
There are some articles that claim that China is already five generations behind in lithographic processes. Even if that is true, it evidently does not prevent the country from making strong progress in the field of supercomputers. I'm sure China has several other exaflop supercomputers that it just hasn't advertised to the western world.

Apples and oranges. They measured using their own metrics, in the same way Nvidia made an "exascale" supercomputer in Europe.
For instance, the 1.2 ExaFLOPS Frontier scores 10 mixed-precision ExaFLOPS on this benchmark.

So this supercomputer is 300-500 PFLOPS measured traditionally.

They are 5 generations behind. They used 4.5x more cores than Frontier to be half as powerful, and Frontier is a generation old now. I also imagine it uses 10x the power given their process limitations.
The accelerators going into El Capitan are 8x better at mixed precision.
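For anyone who wants to reproduce this kind of conversion, here is a rough sketch. It assumes the Sunway system has the same HPL to HPL-MxP ratio as Frontier, which is not known, and it uses the rounded figures from this post rather than official list entries. The helper name `estimate_hpl_from_mxp` is made up for illustration.

```python
# Rounded figures as quoted in this thread (not official list entries).
FRONTIER_HPL_EFLOPS = 1.2    # traditional FP64 HPL
FRONTIER_MXP_EFLOPS = 10.0   # HPL-MxP (mixed precision)
SUNWAY_MXP_EFLOPS = 5.0      # reported HPL-MxP result


def estimate_hpl_from_mxp(mxp_eflops, ref_hpl_eflops, ref_mxp_eflops):
    """Scale an HPL-MxP score by a reference system's HPL / HPL-MxP ratio.

    Assumption: the target machine gains as much from mixed precision
    as the reference machine does, which may not hold across architectures.
    """
    return mxp_eflops * ref_hpl_eflops / ref_mxp_eflops


sunway_hpl_estimate = estimate_hpl_from_mxp(
    SUNWAY_MXP_EFLOPS, FRONTIER_HPL_EFLOPS, FRONTIER_MXP_EFLOPS
)
print(f"Estimated traditional HPL: {sunway_hpl_estimate * 1000:.0f} PFLOPS")
```

Straight proportional scaling gives roughly 600 PFLOPS, somewhat above the 300-500 PFLOPS guess, so the estimate is quite sensitive to which ratio is assumed.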
 
Joined
Nov 25, 2019
Messages
825 (0.45/day)
Location
Taiwan
Processor i5-9600K
Motherboard Gigabyte Z390 Gaming X
Cooling Scythe Mugen 5S
Memory Micron Ballistix Sports LT 3000 8G*4
Video Card(s) EVGA 3070 XC3 Ultra Gaming
Storage Adata SX6000 Pro 512G, Kingston A2000 1T
Display(s) Gigabyte M32Q
Case Antec DF700 Flux
Audio Device(s) Edifier C3X
Power Supply Super Flower Leadex Gold 650W
Mouse Razer Basilisk V2
Keyboard Ducky ONE 2 Horizon
Kind of wondering why they named it 神威. That's a pretty Japanese name, and I'm sure the government wouldn't like that.

There are some articles that claim that China is already five generations behind in lithographic processes.
They don't need state-of-the-art lithography technology to produce a processor; it's just that the transistor density and power efficiency wouldn't be too good.
 
Joined
Sep 17, 2014
Messages
22,422 (6.03/day)
Location
The Washing Machine
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling Thermalright Peerless Assassin
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Kind of wondering why they named it 神威. That's a pretty Japanese name, and I'm sure the government wouldn't like that.


They don't need state-of-the-art lithography technology to produce a processor; it's just that the transistor density and power efficiency wouldn't be too good.
Performance per core is lower, meaning more interconnect/transport is required too, and that is far more impactful than just power and density. They simply need that much more of literally everything, including cooling and space. And there is a point at which scaling further just isn't very effective anymore, so shrinks definitely do matter...
 
Deleted member 185088

Guest
There are some articles that claim that China is already five generations behind in lithographic processes. Even if that is true, it evidently does not prevent the country from making strong progress in the field of supercomputers. I'm sure China has several other exaflop supercomputers that it just hasn't advertised to the US allies.
Not sure they would be willing to share with Japan either, despite Japan being very Eastern.

Thanks to the unstable leadership of the US, China will definitely become the leader.
 

xrli

New Member
Joined
Jun 22, 2023
Messages
20 (0.04/day)
Kind of wondering why they named it 神威. That's a pretty Japanese name, and I'm sure the government wouldn't like that.


They don't need state-of-the-art lithography technology to produce a processor; it's just that the transistor density and power efficiency wouldn't be too good.
The Chinese and Japanese writing systems share many of the same characters (Hanzi or Kanji); China exported the writing system to Japan around the 5th to 7th century.

In this case, 神威 in Chinese (shen wei) simply means God's might or divine might. 神 means god and 威 means might or authority. The use of 神威 makes a lot of sense here, considering you are trying to name and honor the first multi-billion dollar supercomputer made with Chinese processors rather than Intel or AMD ones, albeit a little over the top.

You are correct in pointing out that this is also a word used in Japanese, but the two are used differently. 神威 in Japanese came from an Ainu word, kamuy, which refers to the divine spirit that resides in animals or objects. Japanese people came up with a way to write this Ainu word (the Ainu people don't have a writing system of their own). 神 (kami) also means god and 威 (i) also means might in Japanese. I guess these two characters together sound just like kamuy in Japanese and are even similar in meaning, so they used them.
 
Joined
Jan 2, 2019
Messages
122 (0.06/day)
A message to AleksandarK:

>>...regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision...

Does it come from the presentation, or is this your modified text?

In any case, this is incorrect and needs to be corrected to:

>>...regular HPC workloads that require double-precision ( 53-bit ) and AI workloads that require single-precision ( 24-bit )...

This is because there are no 64-bit or 32-bit precisions.

In double precision, a 53-bit mantissa (52 stored bits plus one implicit leading bit), an 11-bit exponent, and a sign bit make 64 bits in total.
In single precision, a 24-bit mantissa (23 stored bits plus one implicit leading bit), an 8-bit exponent, and a sign bit make 32 bits in total.

Once again, the precision of floating-point arithmetic is defined by how many bits are allocated to the mantissa.
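The bit layout can be inspected directly. Below is a small sketch using only the Python standard library; the helper names `split_float64` and `split_float32` are made up for illustration. It assumes the platform uses IEEE 754 binary64 for Python floats, which is the case on mainstream hardware.

```python
import struct
import sys


def split_float64(x):
    """Return (sign, biased_exponent, stored_fraction) of an IEEE 754 binary64."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                       # 1 bit
    exponent = (bits >> 52) & 0x7FF         # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)       # 52 stored bits (53 with the implicit 1)
    return sign, exponent, fraction


def split_float32(x):
    """Return (sign, biased_exponent, stored_fraction) of an IEEE 754 binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                       # 1 bit
    exponent = (bits >> 23) & 0xFF          # 8 bits, biased by 127
    fraction = bits & ((1 << 23) - 1)       # 23 stored bits (24 with the implicit 1)
    return sign, exponent, fraction


print(sys.float_info.mant_dig)   # 53 on IEEE 754 platforms
print(split_float64(1.0))        # (0, 1023, 0)
print(split_float32(-2.0))       # (1, 128, 0)
```

The field widths add up to 1 + 11 + 52 = 64 and 1 + 8 + 23 = 32, while the effective precision is one bit more than the stored fraction because of the implicit leading 1.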
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.66/day)
Location
Ex-usa | slava the trolls
There are some articles that claim that China is already five generations behind in lithographic processes.

Except that TSMC is already at the end of those lithographic processes. If 3nm is not the last process ever, then the next one certainly will be. Because physics.
So, where is the TSMC 3nm process? How many years did Intel stay on its old 14nm process, famously known for the endless pluses... 14nm+++++

:D
 
Joined
Oct 27, 2009
Messages
1,180 (0.21/day)
Location
Republic of Texas
A message to AleksandarK:

>>...regular HPC workloads that require 64-bit precision and AI workloads that require 32-bit precision...

Does it come from the presentation, or is this your modified text?

In any case, this is incorrect and needs to be corrected to:

>>...regular HPC workloads that require double-precision ( 53-bit ) and AI workloads that require single-precision ( 24-bit )...

This is because there are no 64-bit or 32-bit precisions.

In double precision, a 53-bit mantissa (52 stored bits plus one implicit leading bit), an 11-bit exponent, and a sign bit make 64 bits in total.
In single precision, a 24-bit mantissa (23 stored bits plus one implicit leading bit), an 8-bit exponent, and a sign bit make 32 bits in total.

Once again, the precision of floating-point arithmetic is defined by how many bits are allocated to the mantissa.
These are referred to as FP64, FP32, and FP16 or bfloat16, or as double precision, single precision, and half precision... no one is going to split out the mantissa bits when talking about computational precision.
If you want someone to blame, the benchmark itself (https://hpl-mxp.org/) calls it 64-bit accuracy. The source presentation doesn't really say much... https://sc23.supercomputing.org/presentation/?id=pap103&sess=sess160
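To illustrate what "64-bit accuracy" from lower-precision work can look like, here is a toy sketch of mixed-precision iterative refinement in pure Python. It is not the benchmark's actual solver: the factorization is rounded to binary32 via `struct`, there is no pivoting, and plain refinement steps are used, all of which are simplifications chosen for this example. The function names and the small test matrix are made up.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization without pivoting, rounding each result to binary32.

    Assumption: the matrix is diagonally dominant, so skipping pivoting is safe.
    """
    n = len(a)
    lu = [[to_f32(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = to_f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_f32(lu[i][j] - to_f32(lu[i][k] * lu[k][j]))
    return lu


def lu_solve(lu, b):
    """Solve L U x = b with the stored factors (L has a unit diagonal)."""
    n = len(lu)
    x = list(b)
    for i in range(n):                      # forward substitution
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):            # back substitution
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def refine(a, b, iterations=5):
    """Solve a x = b: low-precision factors, residuals computed in binary64."""
    lu = lu_factor_f32(a)
    x = lu_solve(lu, b)
    for _ in range(iterations):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        d = lu_solve(lu, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x


a = [[4.1, 1.2, 0.3], [1.1, 5.3, 0.7], [0.2, 0.9, 6.4]]
x_true = [1.0, 2.0, 3.0]
b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in a]

for steps in (0, 5):
    x = refine(a, b, iterations=steps)
    err = max(abs(xi - ti) for xi, ti in zip(x, x_true))
    print(f"{steps} refinement steps: max error {err:.2e}")
```

The expensive part (the factorization) runs at reduced precision, and the cheap residual corrections in binary64 recover the lost digits, which is why a mixed-precision score can be much higher than a pure FP64 one on the same hardware.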
 
Joined
Jul 3, 2023
Messages
49 (0.10/day)
Processor 5800x3d
Motherboard msi b550m mortar
Cooling tr scenic 280v2
Memory 2666 16g*2
Video Card(s) 7900xtx
The Chinese and Japanese writing systems share many of the same characters (Hanzi or Kanji); China exported the writing system to Japan around the 5th to 7th century.

In this case, 神威 in Chinese (shen wei) simply means God's might or divine might. 神 means god and 威 means might or authority. The use of 神威 makes a lot of sense here, considering you are trying to name and honor the first multi-billion dollar supercomputer made with Chinese processors rather than Intel or AMD ones, albeit a little over the top.

You are correct in pointing out that this is also a word used in Japanese, but the two are used differently. 神威 in Japanese came from an Ainu word, kamuy, which refers to the divine spirit that resides in animals or objects. Japanese people came up with a way to write this Ainu word (the Ainu people don't have a writing system of their own). 神 (kami) also means god and 威 (i) also means might in Japanese. I guess these two characters together sound just like kamuy in Japanese and are even similar in meaning, so they used them.
That's accurate. It means something like "super powerful" in Chinese.
 