Friday, December 4th 2020
PSA: AMD's Graphics Driver will Eat One CPU Core when No Radeon Installed
While I was messing around with an older SSD test system (not benchmarking anything), I wondered why the machine's performance was SO sluggish with the NVIDIA card I had just installed. Windows startup, desktop, Internet, everything in Windows was incredibly slow. This is an old dual-core machine, but it ran perfectly fine with the AMD Radeon card I used before.
At first I blamed NVIDIA, but when I opened Task Manager I noticed one of my cores sitting at 100%. That can't be right.
Digging a bit further into this, it looks like RadeonSettings.exe is using one processor core at 100% CPU load. Ugh, but there is no AMD graphics card installed right now.
Once that process was terminated manually (right click, select "End task"), performance was restored to expected levels and CPU load was normal again. This confirms that the AMD driver is the reason for the high CPU load. Ideally, before changing graphics cards, you should uninstall the current graphics driver, change the hardware, then install the new driver, in that order. But for a quick test that's not what most people do, and others are simply not aware that a thing called a "graphics card driver" exists, or what it does. Windows is smart enough not to load any drivers for devices that aren't physically present.
Looks like AMD is doing things differently and just pre-loads Radeon Settings in the background every time your system is booted and a user logs in, no matter if AMD graphics hardware is installed or not. It would be trivial to add a check "If no AMD hardware found, then exit immediately", but ok. Also, do we really need six entries in Task Scheduler?
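To illustrate how small such a check could be, here is a minimal sketch in C. It is not AMD's code. It assumes the startup helper already has a list of PCI vendor IDs for the installed display adapters; how that list is obtained is platform specific and left out. The function name `amd_gpu_present` is made up for this example, while 0x1002 is the PCI vendor ID used by AMD/ATI graphics hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PCI vendor ID used by AMD/ATI graphics hardware. */
#define PCI_VENDOR_AMD_ATI 0x1002u

/* Hypothetical helper: returns true if any of the given display adapter
   vendor IDs belongs to AMD/ATI. Enumerating the adapters is not shown. */
bool amd_gpu_present(const uint16_t *vendor_ids, size_t count)
{
    assert(vendor_ids != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (vendor_ids[i] == PCI_VENDOR_AMD_ATI) {
            return true;
        }
    }
    return false; /* no AMD hardware found: the caller can exit immediately */
}
```

A background process would call something like this once at startup and quit right away when it returns false, before creating any threads or waiting on anything.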
I got curious and wondered how it is possible in the first place that a utility like the Radeon Settings control panel uses 100% of a CPU core constantly. That's something you'd expect from a mining virus that uses your electricity to mine cryptocurrency without you knowing. By the way, all this was verified to be happening on the Radeon 20.11.2 WHQL driver, 20.11.3 Beta and the press driver for an upcoming Radeon review.
Unless you're a computer geek you'll probably want to skip over the following paragraphs, but I found the details interesting enough to share with you.
I attached my debugger, looked for the thread that's causing all the CPU load and found this:
[image: disassembly of the thread causing the CPU load]
Hard to read; translated into C code it might make more sense:
[image: the same loop translated into C code]
If you're a programmer you'd have /facepalm'd by now, let me explain. In a multi-threaded program, Events are often used to synchronize concurrently running threads. Events are a core feature of the Windows operating system. Once created, they can be set to "signaled", which will notify every other piece of code that is watching the status of this event, instantly and even across process boundaries. In this case the Radeon Settings program will wait for an event called "DVRReadyEvent" to get created before it continues with initialization. This event gets created by a separate, independent driver component that's supposed to get loaded on startup, too, but apparently never does. The Task Scheduler entries in the screenshot above do show "StartDVR". The naming suggests it's related to the ReLive recording feature that lets you capture and stream gameplay. I guess that part of the driver does indeed check if Radeon hardware is present, and will not start otherwise. Since Windows has no WaitForEventToGetCreated() function, the usual approach is to try to open the event until it can be opened, at which point you know that it does exist.
You're probably asking now, "what if the event never gets created?" Exactly, your program will be hung forever, caught in an infinite loop. The correct way to implement this code is to either set a time limit for how long the loop should run, or count the number of runs and give up after 100, 1000, 1 million, you pick a number. What matters is that there is a reasonable limit.
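A minimal sketch of the "count the runs and give up" variant could look like the code below. It is not the driver's code. The `probe_fn` callback is a hypothetical stand-in for "try to open the event" (on Windows it could wrap `OpenEvent()` and check for a non-NULL handle), and all function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true once the awaited object exists. */
typedef bool (*probe_fn)(void *ctx);

/* Calls `probe` at most `max_attempts` times.
   Returns true if a probe succeeded, false if the limit was reached. */
bool wait_with_attempt_limit(probe_fn probe, void *ctx, unsigned max_attempts)
{
    assert(probe != NULL);
    for (unsigned i = 0; i < max_attempts; ++i) {
        if (probe(ctx)) {
            return true;
        }
    }
    return false; /* give up instead of looping forever */
}

/* Example probe that never succeeds; counts how often it was called. */
bool never_ready(void *ctx)
{
    unsigned *calls = ctx;
    ++*calls;
    return false;
}

/* Example probe that succeeds on its third call. */
bool ready_on_third_call(void *ctx)
{
    unsigned *calls = ctx;
    return ++*calls >= 3;
}
```

With `never_ready`, the loop returns false after exactly `max_attempts` probes, so the caller can log the failure and carry on or exit. Note that this version still spins at full speed while it runs; it only guarantees that the spinning ends.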
A more subtle effect of this kind of busy waiting is that it will run as fast as the processor can, loading one core to 100%. While that might be desirable if you have to be able to react VERY quickly to something, there's no reason to do that here. The typical approach is to add a short bit of delay inside the loop, which tells the operating system and processor "hey, I'm waiting on something and don't need CPU time, you may run another application now or reduce power". Modern processors will adjust their frequency when lightly loaded, and even power down cores completely, to conserve energy and reduce heat output. Even a delay of one millisecond will make a huge difference here.
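Here is a sketch of a polling loop that combines a short delay with a time limit. It is an illustration under assumptions, not AMD's implementation. To keep it portable it uses the POSIX calls `nanosleep()` and `clock_gettime()`; on Windows the delay would typically be a `Sleep()` call instead. The `poll_fn` callback and all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe standing in for "try to open the event". */
typedef bool (*poll_fn)(void *ctx);

/* Current value of the monotonic clock in milliseconds. */
static long long monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Gives up the CPU for roughly `ms` milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Polls every `interval_ms` until `poll` succeeds or `timeout_ms` elapsed.
   Returns true on success, false on timeout. */
bool poll_until_ready(poll_fn poll, void *ctx, long interval_ms, long timeout_ms)
{
    assert(poll != NULL);
    assert(interval_ms > 0 && timeout_ms >= 0);

    const long long deadline = monotonic_ms() + timeout_ms;
    for (;;) {
        if (poll(ctx)) {
            return true;
        }
        if (monotonic_ms() >= deadline) {
            return false; /* time limit reached, stop waiting */
        }
        sleep_ms(interval_ms); /* let the OS run something else */
    }
}

/* Example probe that never succeeds; counts how often it was polled. */
bool count_and_fail(void *ctx)
{
    unsigned *polls = ctx;
    ++*polls;
    return false;
}

/* Example probe that succeeds on the third poll. */
bool ready_on_poll_three(void *ctx)
{
    unsigned *polls = ctx;
    return ++*polls >= 3;
}
```

With a 1 ms interval and a 50 ms timeout, the probe runs a few dozen times at most before the function gives up, instead of millions of times in a tight loop. The thread spends nearly all of that time asleep, so the core is free for other work or can clock down.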
This is especially important during system startup, when a lot of things that need processor time are happening at the same time. It's why you feel you're waiting forever for your desktop to become usable when you start the computer. With Radeon Settings taking over one core completely, there's obviously less performance left for other startup programs to complete.
I did some quick and dirty performance testing in actual gameplay on an 8-core/16-thread CPU and found a small FPS loss, especially in CPU-limited scenarios, of around 1%, on the order of 150 FPS vs 151 FPS. This confirms that this can be an issue on modern systems, too, even though only about 6% of CPU power is lost (one logical processor out of 16). The differences will be minimal though, and it's unlikely you'll subjectively notice them.
Waiting on synchronization signals is a very basic programming skill; most midterm students would be able to implement it correctly. That's why I'm so surprised to see such low quality code in a graphics driver component that gets installed on hundreds of millions of computers. Modern software development techniques avoid these mistakes through code reviews, where one or multiple colleagues read your source code and point out potential issues. There's also "unit testing", which requires developers to write testing code that's separate from the main code. These unit tests can then be executed automatically, and "code coverage" measures what percentage of the program code is actually exercised by them. Let's just hope AMD fixes this bug, it should be trivial.
If you are affected by this issue, just uninstall the AMD driver from Windows Settings - Apps and Features. If that doesn't work, use DDU. It's not a big deal anyway. What's most important is that you are aware of it, in case your system feels sluggish after a graphics hardware change.
277 Comments on PSA: AMD's Graphics Driver will Eat One CPU Core when No Radeon Installed
Nothing bad to write about AMD now they are doing ok, so you have to come up with something like this. True, it's bad coding practice, but it won't affect 99% of PC users. Because even beginners now know to uninstall drivers before swapping the graphics card.
Funny thing is that whenever AMD releases a new GPU, this kind of thing appears. Looks like AMD did the better thing by focusing on console APUs rather than this shitty DIY GPU market.
It's appalling how many ill-informed and shallow people have commented in a disparaging manner at what is a well-considered investigation into a software 'curiosity'. As a member - not a moderator - I'm amazed at the infantile reactions to the piece by W1zzard. There are no scathing critiques from the author, there is no call to arms. What has happened is a technical 'blip' has caused someone to take a closer look. And they found something unusual, rare, but worth posting about.
For the trolls and shitposters that think W1zzard and, by extension, TPU is a biased site, I suggest you look at the reviews for the Zen3 processors and the RDNA2 GPUs. Those reviews are glowing and that is pretty much the sum of AMD's fruit. If there was an anti-AMD agenda, it would be outwardly obvious. I'm sure this post will just rile people up more, but before you do get all antsy and start foaming at the mouth - go back and read the reviews.
TPU is a tech site and the material within should be tech based. When 'blips' happen, it can be good to see what's at the source. It's telling that so many people who rejoiced at Intel's security mishaps (which affected few in the consumer world), are up in arms over this piece. If we all just dropped the childish attitudes to hardware companies and saw tech as a uniform consumer product, we'd have so much less vitriol in the forums.
AMD surely deserve many compliments lately, because of their hardware, but they are terrible in software support. Many reviews are just ignoring that, because it’s not cool to speak against AMD nowadays. It doesn’t matter if their pricing now is worse than Intel and their gpu launch is worse than Nvidia. They are AMD and they must be “the good one”.
I had intel, radeon and nv drivers installed at same time, and swapping gpus frequently, like every hour :)
But that was a b150 mobo with celeron g3930.
edit: the GPU in my sig might be the last time I used one...lol
Lately things improved but it was a nightmare over one year of usage.
You SHOULD be asking yourself how much "fine wine" could be gained by fixing all the sure-to-be-found similar crap in the driver. It could be extraordinary. Me too. My experience (and attempt to help others) with the 5700 XT is well documented here on the forums. In particular, DX11 cpu overhead is absurd. Same.
Unlike the "game ready" WHQL certified Nvidia drivers that are really Betas and need patches and fixes with constant updates.
This situation, simplified down (as best I can manage, @W1zzard feel free to correct me if I've misunderstood things), is a set of instructions intended to perform a function erroring out and failing to terminate the process it's attached to, getting caught in an infinite loop and as a result pegging the CPU its thread is running on at full processing cycles. This unfortunately continues until the process is terminated by system or user command.
People offering negative comments really need to take a step back, think about how complicated things really are and try a bit of understanding. This is an example of something slipping past those in charge of testing & debugging. Little more.
But hey, it's driving weekend traffic, plus we can shit on RTG drivers, so life's good.
Driver oopsies happens with everyone. Many WHQL Nvidia drivers have Hotfix releases.
I don't know about microcode problems with AMD, what i do know is that, i had to update my Intel chipset firmware yet again, because of 20+ CVE's.
The amount of times i had to update IME firmware because of CVE's is mind-boggling.
You're not the only one who has played with asm. I used to write and make eeprom replacement games for my NES. ;) ...
you know what, never mind.
more news at 11
psa: to the non programmers when you see errors like this its indicative of a programmer not knowing what the fuck they are doing
no good programmer would make this mistake. It leaves one to wonder, if they made this error, what else did they fubar that may be causing unnecessary performance penalties?