Friday, July 14th 2017

Windows 10 Process-Termination Bug Slows Down Mighty 24-Core System to a Crawl

So, you work for Google. Awesome, right? Yeah. You know what else is awesome? Your 24-core, 48-thread Intel build system with 64 GB of RAM and a nice SSD. Life is good, man. So, you've done your code work for the day on Chrome, because that's what you do, remember? (Yeah, that's right, it's awesome.) Before you go off to collect your Google check, you click "compile" and expect a speedy result from your wicked-fast system.

Only you don't get it... Instead, your system grinds to a lurching halt, and mouse movement becomes difficult. Fighting against what appears to be an impending system crash, you hit your trusty "CTRL-ALT-DELETE" and bring up Task Manager... to find only 50% CPU/RAM utilization. Why, then, was everything stopping?

If you would throw up your arms and walk out of the office, this is why you don't work for Google. For Google programmer Bruce Dawson, there was only one logical way to handle this: "So I did what I always do - I grabbed an ETW trace and analyzed it. The result was the discovery of a serious process-destruction performance bug in Windows 10."
This is an excerpt from a long, detailed blog post by Bruce titled "24-core CPU and I can't move my mouse" on his WordPress blog, randomascii. In it, he details a serious new bug that is present only in Windows 10, not in other versions: process destruction appears to be serialized.

What does that mean, exactly? It means that when a process "dies" or closes, its teardown has to go through a single thread, one process at a time. In this critical part of the OS, which every process must eventually pass through, Windows 10 is effectively single-threaded.
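
To picture what serialization does here, consider a toy model, which is not Windows' actual code: each "process exit" is simulated by a thread that sleeps briefly, either while holding one shared lock or with no lock at all. The names `simulate_exits` and `EXIT_COST`, and the lock itself, are invented for this sketch.

```python
import threading
import time

EXIT_COST = 0.01  # assumed cost of one simulated "process exit", in seconds


def simulate_exits(count, serialized):
    """Run `count` simulated exits on separate threads; return wall time."""
    global_lock = threading.Lock()  # stand-in for a single system-wide lock

    def exit_process():
        if serialized:
            # Every exit waits its turn for the same lock.
            with global_lock:
                time.sleep(EXIT_COST)
        else:
            # Exits overlap freely.
            time.sleep(EXIT_COST)

    threads = [threading.Thread(target=exit_process) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for mode in (False, True):
        elapsed = simulate_exits(24, serialized=mode)
        print(f"serialized={mode}: {elapsed:.3f}s")
```

With the lock, the total grows roughly as count times EXIT_COST no matter how many threads or cores are available; without it, the exits overlap and the total stays close to a single EXIT_COST.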

To be fair, this is not an issue a typical end user would encounter. But developers often spawn lots of processes and close them just as often, and they use high-end multi-core CPUs to speed this along. Bruce notes that in his case, the 24-core CPU only made things worse: it caused the build to spawn more build processes, and thus even more had to close. Because they all go through the same single-threaded queue, the OS grinds to a halt during this operation, and peak performance is never realized.
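
A rough way to see process-teardown cost on your own machine is to start a batch of idle child processes, then kill them all and time each phase separately. This is a minimal sketch, not the test program from Bruce's post; `time_spawn_and_kill` and the idle child command are assumptions made for illustration, and the numbers will vary widely by OS and hardware.

```python
import subprocess
import sys
import time


def time_spawn_and_kill(count):
    """Start `count` idle children, then kill them; return (spawn_s, kill_s)."""
    # Hypothetical idle child: a Python interpreter that just sleeps.
    child_cmd = [sys.executable, "-c", "import time; time.sleep(60)"]

    start = time.perf_counter()
    children = [subprocess.Popen(child_cmd) for _ in range(count)]
    spawn_s = time.perf_counter() - start

    start = time.perf_counter()
    for child in children:
        child.kill()   # request termination of every child
    for child in children:
        child.wait()   # block until the OS reports the child has exited
    kill_s = time.perf_counter() - start
    return spawn_s, kill_s


if __name__ == "__main__":
    spawn_s, kill_s = time_spawn_and_kill(100)
    print(f"spawn: {spawn_s:.3f}s  kill: {kill_s:.3f}s")
```

If teardown is serialized the way the article describes, one would expect the kill phase to scale poorly as the count rises, though a sketch this small cannot confirm the bug by itself.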

As for whether this is a big bug if you aren't a developer: well, that's up for debate. Certainly not directly, I'd wager, but as a former user of OS/2 and a witness to Microsoft's campaign against it back in the day, I can't help but be reminded of the Microsoft FUD surrounding OS/2's SIQ issue, which persisted even years after the issue had been fixed. Does this not feel somewhat like sweet, sweet karma for MS from my perspective? Maybe, but honestly, that doesn't help anyone.

Hopefully a fix will be out soon, and unlike the OS/2 days, the memory of this bug will be short-lived.
Source: randomascii Wordpress Blog

107 Comments on Windows 10 Process-Termination Bug Slows Down Mighty 24-Core System to a Crawl

#101
BiggieShady
HopelesslyFaithful said: I googled gomacc.exe and nothing really comes up for it.
Goma is a distributed compiler infrastructure used by buildbots, and gomacc.exe is the GOMA C Compiler executable.
#102
Boosnie
FordGT90Concept said: Spawning lots of processes is just developer laziness relying on the operating system for locks and catching appcrashes instead of doing it yourself (at a huge performance boost).
I strongly disagree.
As a distributed compiler that can run a build on hundreds of networked machines, the GOMA compiler team had to choose the same old viable trade-off between time, cost and performance.
You can spawn build tasks on networked machines, you can spawn tasks on your local machine, all with the lowest possible effort and cost.
I don't call it laziness, I call it software engineering.
#103
de.das.dude
Pro Indian Modder
Ever since I got Windows 10, my mouse randomly stutters :( Haven't played War Thunder in ages :'(
#104
FordGT90Concept
"I go fast!1!11!1!"
Boosnie said: I strongly disagree.
As a distributed compiler that can run a build on hundreds of networked machines, the GOMA compiler team had to choose the same old viable trade-off between time, cost and performance.
You can spawn build tasks on networked machines, you can spawn tasks on your local machine, all with the lowest possible effort and cost.
I don't call it laziness, I call it software engineering.
And there's absolutely no reason why a single process per machine can't accomplish the same thing with substantially less memory load.
#105
Boosnie
Because you have an already working open source compiler, you develop a build manager, add a layer to the compiler to respond to commands and you have your distributed build manager in a year instead of 5?
#106
FordGT90Concept
"I go fast!1!11!1!"
As I said, lazy developers.

It's 50/50 on Microsoft changing anything because there is definitely a design flaw in the software this guy is using.