
South Korean Company Morumi is Developing a CPU with Infinite Parallel Processing Scaling

TheLostSwede

News Editor
Joined
Nov 11, 2004
Messages
18,008 (2.44/day)
Location
Sweden
System Name Overlord Mk MLI
Processor AMD Ryzen 7 7800X3D
Motherboard Gigabyte X670E Aorus Master
Cooling Noctua NH-D15 SE with offsets
Memory 32GB Team T-Create Expert DDR5 6000 MHz @ CL30-34-34-68
Video Card(s) Gainward GeForce RTX 4080 Phantom GS
Storage 1TB Solidigm P44 Pro, 2 TB Corsair MP600 Pro, 2TB Kingston KC3000
Display(s) Acer XV272K LVbmiipruzx 4K@160Hz
Case Fractal Design Torrent Compact
Audio Device(s) Corsair Virtuoso SE
Power Supply be quiet! Pure Power 12 M 850 W
Mouse Logitech G502 Lightspeed
Keyboard Corsair K70 Max
Software Windows 10 Pro
Benchmark Scores https://valid.x86.fr/yfsd9w
One of the biggest drawbacks of modern CPUs is that adding more cores doesn't increase performance linearly. Parallelism in CPUs offers limited scaling for most applications and none at all for some. A South Korean company called Morumi is now taking a stab at solving this problem and wants to develop a CPU that can offer more or less infinite processing scaling as more cores are added. The company has been around since 2018 and has focused on various telecommunications chips, but it has now started development of what it calls every one period parallel processor (EOPPP) technology.

EOPPP is said to distribute data to each of the cores in a CPU before the data is being processed, which is said to be done over a type of mesh network inside the CPU. This is said to allow for an almost unlimited amount of instructions to be handled at once, if the CPU has enough cores. Morumi already has an early 32-core prototype running on an FPGA, and in certain tasks the company has seen a tenfold performance increase. It should be noted that this requires software specifically compiled for EOPPP, and Morumi is set to release version 1.0 of its compiler later this year. It's still early days, and it'll be interesting to see how this technology develops. If it's successfully developed, there's also a high chance of Morumi being acquired by someone much bigger that wants to integrate the technology into its own products.



View at TechPowerUp Main Site | Source
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Gskill Trident Z 3900cas18 32Gb in four sticks./16Gb/16GB
Video Card(s) Asus tuf RX7900XT /Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores laptop Timespy 6506
Around since 2018! And in 2022 they're aiming for infinite scaling!

Sighhh

I quote: " is said to distribute data to each of the cores in a CPU before the data is being processed, which is said to be done over a type of mesh network inside the CPU. This is said to allow for an almost unlimited amount of instructions to be handled at once, "

Hmmm, is anyone else thinking WTAF, or is it just me?

I thought that's how CPUs work: distribute data, work on data. Does an EOPPP use magical stuff where Intel uses silicon?

What gives?

And have I mentioned that I am inventing a raycasting chip that can do infinite rays? It takes the work in first, then does the work, and through this simple change I WILL BEAT Nvidia. Wait, what.
 

TheLostSwede

News Editor
Joined
Nov 11, 2004
Messages
18,008 (2.44/day)
Location
Sweden
I assume that the difference here is that the data is divided up into smaller chunks, so each processor core works on a chunk of data and the chunks are put back together at the end somewhere. To be honest, it's not entirely clear how it works and only so much info is available.

From the source link. Maybe I misunderstood something.
The pre-saved data are processed at once and the processed data are moved and saved in parallel on a mesh network. Using this saved result in the next period allows the sequential processing of this parallel data.
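To make the "split into chunks, work on each chunk, put it back together" reading concrete, here is a minimal Python sketch. It is only an illustration of that general pattern and not Morumi's actual design, which the source does not describe in detail. The names `split_into_chunks`, `process_chunk` and `process_in_chunks` are made up for this example, and squaring each value is an arbitrary stand-in for the real work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split data into n_chunks contiguous, nearly equal slices."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical per-core work: square every value in the chunk."""
    return [x * x for x in chunk]


def process_in_chunks(data, n_workers=4):
    """Hand out the chunks first, process them, then merge in order."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        results = list(pool.map(process_chunk, chunks))
    # "Put back together at the end": concatenate in the original order.
    return [y for part in results for y in part]


print(process_in_chunks(list(range(10))))
```

Note that in standard CPython, threads do not run CPU-bound Python code in parallel because of the global interpreter lock, so this shows the structure of the approach rather than a real speedup.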
 
Joined
Sep 1, 2020
Messages
2,466 (1.54/day)
Location
Bulgaria
Basically, computers are "stupid": they constantly and unnecessarily repeat the same calculations for different parts of a task, when it would be easier to apply the result of a single calculation everywhere the formula is the same. Instead of calculating 2+1=3 a trillion times in different queues, a single calculation is enough, and the resulting value is embedded wherever it is needed.
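The idea of computing a value once and reusing it everywhere is what software usually calls memoization (compilers do something related with constant folding and common subexpression elimination). A minimal Python sketch, assuming a made-up function `expensive_formula` as a stand-in for the repeated calculation and a `call_count` variable added only to show how often the real work runs:

```python
from functools import lru_cache

call_count = 0  # how many times the real computation actually runs


@lru_cache(maxsize=None)
def expensive_formula(a, b):
    """Hypothetical stand-in for a calculation repeated across a task."""
    global call_count
    call_count += 1
    return a + b


# Ask for 2 + 1 a thousand times; only the first call does the work,
# the remaining calls are served from the cache.
results = [expensive_formula(2, 1) for _ in range(1000)]
print(results[0], call_count)
```

This only helps when the inputs really are identical, and the lookup itself is not free, so for something as cheap as an addition the cache costs more than it saves.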
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
Oh right, sounds a bit mental even if vague. Again, that's what, two changes? The work is split into chunks at the start,
worked on, and
put back together at the end.

We have two versions of this in modern PCs already. This is exactly what a GPU does: unified processing across cores, with memory constraints, since SRAM has stopped scaling and in general eats space. Obviously a CPU does this too, on a limited, small scale?!

But EOPPP needs to be specifically written for or compiled for, and by the sound of it conceptually written for. Ohh kk, I mean academia and enterprise might have a use, but I think it's limited, especially since we already have massively parallel symptoms that we struggle to make work on general tasks, and not enough tasks to warrant the financial input.

We'll see, but Cerberus would also be saying yo, what now.

Ps massively parallel systems :p made me laugh, it's staying, now where are those glasses. :):D
 
Joined
Aug 21, 2015
Messages
1,777 (0.52/day)
Location
North Dakota
System Name Office
Processor Ryzen 5600G
Motherboard ASUS B450M-A II
Cooling be quiet! Shadow Rock LP
Memory 16GB Patriot Viper Steel DDR4-3200
Video Card(s) Gigabyte RX 5600 XT
Storage PNY CS1030 250GB, Crucial MX500 2TB
Display(s) Dell S2719DGF
Case Fractal Define 7 Compact
Power Supply EVGA 550 G3
Mouse Logitech M705 Marthon
Keyboard Logitech G410
Software Windows 10 Pro 22H2

A technology doesn't need to have consumer-oriented use cases to be interesting, IMO.
 
Joined
Aug 12, 2020
Messages
1,207 (0.74/day)
What you're misunderstanding is this:
in certain tasks the company has seen a tenfold performance increase

Amdahl's law doesn't prevent a 10x performance increase, or really an increase by any arbitrary factor, if the sequential part is correspondingly small.
It's the "no performance limit" claim that is BS if there previously was a limit, as that would require the sequential part to be nonexistent, i.e., the program code being redesigned, and not just run on another CPU.
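For reference, Amdahl's law gives the speedup on n cores as 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized. A small Python sketch of that formula follows; the function names and the sample values of p are picked purely for illustration, and 32 cores is used only because the article mentions a 32-core prototype.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given core count."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup as the core count grows without bound."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0 else 1.0 / serial


# Illustrative parallel fractions, evaluated for 32 cores.
for p in (0.50, 0.90, 0.99, 1.00):
    print(p, round(amdahl_speedup(p, 32), 2), amdahl_limit(p))
```

Only p = 1.00, meaning no sequential part at all, has no ceiling. Every other value levels off at 1 / (1 - p) no matter how many cores are added.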
 
Joined
Mar 10, 2010
Messages
11,880 (2.19/day)
Location
Manchester uk
I agree, I am intrigued, but the vague in this one is STRONG, and what is said is so... AND?!
 
Joined
Aug 12, 2020
Messages
1,207 (0.74/day)
So it is not a law, since any exceptions can exist, so it is a theorem with a limited range of conditions for which it is valid.

No. It's valid for ANY code and any processor; you just seem to be misunderstanding where it applies.

Let's say you have a piece of code.

1) You run it on CPU A, say, a 128-core Xeon, but with 127 of those cores disabled. You get some performance number.

2) Now you enable all cores and run it again. You get a 10x speedup.

3) Now have the same code run on a different CPU B, also with only 1 core enabled and 127 disabled. You get some other performance number.

4) Rerun that code on CPU B with all 128 cores too. What would the speedup vs scenario 3) be? 10x too.

What would the difference between CPU A and CPU B be when run single-to-single or multi-to-multi? That has nothing to do with Amdahl's law, but with how the A and B architectures are optimized for this kind of task.

So what Amdahl's law states is that the speedup between scenarios 1 vs 2 and 3 vs 4 is the same, because you keep the same architecture but add more cores. This is the actual scope of Amdahl's law.

Scenarios 1 vs 3 and 2 vs 4 are not in the scope of Amdahl's law.

Changing the architecture is a different scenario. It can make CPU B 10/100/1000x faster than CPU A core-to-core, but it cannot change the fact that the speedup from adding more cores will plateau proportionally as well. That max speedup is an inherent property of the specific code and not something to work around in the CPU architecture.

The only way to make it scale without limit is to rewrite the code so that there is no sequential part and all threads run independently of each other.

Then you get infinite scaling with more cores on CPU A, but also on CPU B, and on any other CPU that can run this code.

IOW, there is nothing magical about the described CPU that would make the same code scale infinitely if it didn't already.
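The four scenarios can be sketched numerically under Amdahl's simple model, where the serial part of the run time stays fixed and the parallel part divides evenly across cores. Everything below is an assumption for illustration: the single-core times for CPU A and CPU B are invented, and the parallel fraction `P` is simply the value that yields exactly 10x on 128 cores.

```python
def run_time(single_core_time, parallel_fraction, cores):
    """Amdahl-style run time: serial part is fixed, parallel part divides."""
    serial = single_core_time * (1.0 - parallel_fraction)
    parallel = single_core_time * parallel_fraction / cores
    return serial + parallel


CORES = 128
# Parallel fraction implied by "10x on 128 cores", solved from
# 1 / ((1 - p) + p / n) = 10 with n = 128.
P = (1 - 1 / 10) * CORES / (CORES - 1)

# Hypothetical single-core times: CPU B is 100x faster per core than CPU A.
cpu_a, cpu_b = 100.0, 1.0

speedup_a = run_time(cpu_a, P, 1) / run_time(cpu_a, P, CORES)  # 1 vs 2
speedup_b = run_time(cpu_b, P, 1) / run_time(cpu_b, P, CORES)  # 3 vs 4
ceiling = 1 / (1 - P)  # limit as the core count grows without bound

print(round(speedup_a, 3), round(speedup_b, 3), round(ceiling, 2))
```

Both speedups come out at about 10x even though CPU B is assumed to be 100x faster per core, and the ceiling for this particular code sits at roughly 10.8x on either CPU, however many cores are added.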
 
Joined
Sep 1, 2020
Messages
2,466 (1.54/day)
Location
Bulgaria
Mathematical logic is not always correct. At one time we described geocentrism mathematically correctly with the "correct" formulas, then we looked and saw that the Earth is not the center around which everything else revolves.
 
Joined
Aug 12, 2020
Messages
1,207 (0.74/day)
Then please pinpoint what observation exactly contradicts Amdahl's law here.

A 10x speedup from more cores is not it, as that law predicts it to be entirely possible.

Making the same code scale infinitely with more cores, when it didn't on other processors? That's not an observation. That's a claim, and not one validated by anything provided. Until it actually gets validated, Amdahl's law stands. And I have the temerity to strongly doubt it would get validated, ever. Somewhere in the range of doubting that a perpetuum mobile exists.

For the record, "10x speedup" is a claim too for all we know now, but an easily believable one, since:

1) It does not violate said law
2) Processor design companies have been optimizing architectures for specific tasks for decades
 
Joined
Sep 1, 2020
Messages
2,466 (1.54/day)
Location
Bulgaria
"Infinite" is used to attract attention and investment. It is more than obvious that this is a PR word order. I don't know why you're even trying to rub in that part.
 
Joined
Aug 12, 2020
Messages
1,207 (0.74/day)
That's justifying clickbait headlines that are greatly exaggerated or, in this case, outright false.

I guess you won't be complaining about clickbait in headlines for the sake of being consistent then.
 
Joined
Sep 1, 2020
Messages
2,466 (1.54/day)
Location
Bulgaria
I wouldn't object to a title correction, but only if the article was written by the OP here, who would then have the right to change the title. If it is only a translation and the article belongs to another author, hardly anything can be done about it without that author's consent.
 
Joined
Sep 1, 2020
Messages
2,466 (1.54/day)
Location
Bulgaria
It is too early to express such an opinion. What if they succeed? Will you turn your opinion 180°?
 
Joined
Aug 12, 2020
Messages
1,207 (0.74/day)
Succeed in what exactly?

Creating a highly performing/efficient architecture for specific tasks? I hope they do lol.

Overturning Amdahl's law by making programs suddenly perfectly scale with more cores when it didn't on other processors? I have a bridge to sell you, if you honestly believe that.
 
Joined
Sep 1, 2020
Messages
2,466 (1.54/day)
Location
Bulgaria
Oh no. I go by the principle of touching something first to be sure of it, which cannot possibly happen at such an early stage. But I'm not in a hurry to dismiss things as impossible. Whoever does so should try harder. Arguing by citing a limit theorem does not work.
 
Joined
Aug 12, 2020
Messages
1,207 (0.74/day)
And guess what? People have been more than just "touching" this for DECADES at this point and yet this law holds. See wiki article posted above to find out when this law was first presented.

Same as, say, evolution. Oh, it's "just" a theory, right? Yet we have so much evidence for it, that we basically accept it at this point. Unless one's tinfoil hat is slipping that is ;)
 
Joined
Dec 26, 2006
Messages
3,896 (0.59/day)
Location
Northern Ontario Canada
Processor Ryzen 5700x
Motherboard Gigabyte X570S Aero G R1.1 BiosF5g
Cooling Noctua NH-C12P SE14 w/ NF-A15 HS-PWM Fan 1500rpm
Memory Micron DDR4-3200 2x32GB D.S. D.R. (CT2K32G4DFD832A)
Video Card(s) AMD RX 6800 - Asus Tuf
Storage Kingston KC3000 1TB & 2TB & 4TB Corsair MP600 Pro LPX
Display(s) LG 27UL550-W (27" 4k)
Case Be Quiet Pure Base 600 (no window)
Audio Device(s) Realtek ALC1220-VB
Power Supply SuperFlower Leadex V Gold Pro 850W ATX Ver2.52
Mouse Mionix Naos Pro
Keyboard Corsair Strafe with browns
Software W10 22H2 Pro x64
I'm guessing not on Windows OS ;)
 
Joined
Feb 1, 2019
Messages
3,712 (1.70/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
How would this help on single threaded apps/games?

What you described just sounds like an enhanced hyperthreading?
 