Friday, December 4th 2020
PSA: AMD's Graphics Driver will Eat One CPU Core when No Radeon Installed
While I was messing around with an older SSD test system (not benchmarking anything), I wondered why the machine's performance was SO sluggish with the NVIDIA card I had just installed. Windows startup, the desktop, Internet browsing: everything in Windows was incredibly slow. This is an old dual-core machine, but it ran perfectly fine with the AMD Radeon card I used before. At first I blamed NVIDIA, but when I opened Task Manager, I noticed one of my cores sitting at 100%. That can't be right. Digging a bit further, it turns out that RadeonSettings.exe is using one processor core at a full 100% CPU load. Ugh, but there is no AMD graphics card installed right now.
Once that process was terminated manually (right-click, select "End task"), performance was restored to expected levels and CPU load was normal again. This confirms that the AMD driver is the reason for the high CPU load. Ideally, before changing graphics cards, you should uninstall the current graphics driver, change the hardware, then install the new driver, in that order. But for a quick test that's not what most people do, and others are simply not aware that a thing called a "graphics card driver" exists, or what it does. Windows is smart enough not to load drivers for devices that aren't physically present. AMD apparently does things differently and pre-loads Radeon Settings in the background every time the system boots and a user logs in, whether AMD graphics hardware is installed or not. It would be trivial to add a check "If no AMD hardware found, then exit immediately", but ok. Also, do we really need six entries in Task Scheduler?
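The early-exit check mentioned above could look something like the following minimal C sketch. It assumes the PCI vendor IDs of the installed display adapters have already been collected (on Windows, for example, from DXGI adapter descriptions, whose DXGI_ADAPTER_DESC structure has a VendorId field). `settings_startup()` and `amd_gpu_present()` are hypothetical names for illustration, not AMD's code; 0x1002 is the PCI vendor ID used by ATI/AMD graphics hardware.

```c
#include <stdbool.h>
#include <stddef.h>

/* PCI vendor IDs: 0x1002 = ATI/AMD, 0x10DE = NVIDIA, 0x8086 = Intel. */
#define PCI_VENDOR_AMD    0x1002u
#define PCI_VENDOR_NVIDIA 0x10DEu
#define PCI_VENDOR_INTEL  0x8086u

/* Returns true if any of the given display adapters is made by AMD. */
static bool amd_gpu_present(const unsigned *vendor_ids, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (vendor_ids[i] == PCI_VENDOR_AMD)
            return true;
    }
    return false;
}

/* Hypothetical startup routine of a settings tool launched at logon.
   Returns 0 if it exited immediately, 1 if it went on to initialize. */
static int settings_startup(const unsigned *vendor_ids, size_t count)
{
    if (!amd_gpu_present(vendor_ids, count))
        return 0;   /* no AMD hardware found: exit immediately */

    /* ...normal initialization (waiting for other driver components, etc.)... */
    return 1;
}
```

On a machine that only has an NVIDIA card, `settings_startup()` would return before touching anything else, so nothing is left running in the background.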
I got curious and wondered how it is possible in the first place that a utility like the Radeon Settings control panel constantly causes 100% CPU load, something you'd expect from a mining virus that uses your electricity to mine cryptocurrency without you knowing. By the way, all this was verified to happen on the Radeon 20.11.2 WHQL driver, the 20.11.3 Beta, and the press driver for an upcoming Radeon review.
Unless you're a computer geek, you'll probably want to skip the following paragraphs, but I still found the details interesting enough to share with you.
I attached my debugger, looked for the thread that's causing all the CPU load, and found the code responsible. The disassembly is hard to read, but translated into C it makes more sense. If you're a programmer, you'd have /facepalm'd by now, so let me explain. In a multi-threaded program, Events are often used to synchronize concurrently running threads. Events are a core feature of the Windows operating system: once created, they can be set to "signaled", which notifies every other piece of code that is watching the status of this event, instantly and even across process boundaries. In this case, the Radeon Settings program waits for an event called "DVRReadyEvent" to be created before it continues with initialization. This event is created by a separate, independent driver component that's supposed to be loaded on startup, too, but apparently never is. The Task Scheduler entries in the screenshot above do show "StartDVR", and the name suggests it's related to the ReLive recording feature that lets you capture and stream gameplay. I guess that part of the driver does indeed check whether Radeon hardware is present, and won't start otherwise. Since Windows has no WaitForEventToGetCreated() function, the usual approach is to keep trying to open the event until it can be opened, at which point you know that it does exist.
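The C translation from the debugger isn't reproduced here, so the following is only a minimal, portable sketch of the pattern described, not AMD's actual code. On Windows, the call inside the loop would be something like OpenEventA(SYNCHRONIZE, FALSE, "DVRReadyEvent"), which returns NULL while no event with that name exists. To keep the sketch runnable anywhere, `try_open_event()` is a hypothetical stand-in whose event "appears" after a configurable number of attempts.

```c
#include <stddef.h>

/* Hypothetical stand-in for Win32 OpenEventA(): returns NULL until the
   named event "exists", which here happens after g_appears_after failed
   calls. A negative value simulates an event that never gets created. */
static long g_calls = 0;
static long g_appears_after = 0;
static int  g_event_object;          /* its address serves as a fake handle */

static void *try_open_event(const char *name)
{
    (void)name;                      /* a real implementation would look it up */
    g_calls++;
    if (g_appears_after >= 0 && g_calls > g_appears_after)
        return &g_event_object;
    return NULL;
}

/* The problematic pattern: poll until the event can be opened.
   There is no delay, so the loop spins one core at 100%, and there is
   no limit, so it never returns if the event is never created. */
static void *wait_for_event_unbounded(const char *name)
{
    void *handle;
    while ((handle = try_open_event(name)) == NULL) {
        /* spin */
    }
    return handle;
}
```

With `g_appears_after` set to a negative value, which models a PC without Radeon hardware where the DVR component never starts, `wait_for_event_unbounded()` never returns.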
You're probably asking now: "What if the event never gets created?" Exactly. Your program will hang forever, caught in an infinite loop. The correct way to implement this code is to either set a time limit for how long the loop may run, or count the number of attempts and give up after 100, 1,000, or 1 million of them (you pick the number), but it's important to set a reasonable limit.
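A minimal sketch of the retry-count variant follows. `try_open_event()` is again a hypothetical stand-in for Win32 OpenEventA() so the code runs anywhere. Returning NULL once the limit is reached forces the caller to decide what to do, for example carry on without the features that depend on the event.

```c
#include <stddef.h>

/* Hypothetical stand-in for Win32 OpenEventA(): returns NULL until the named
   event "exists", which here happens after g_appears_after failed calls
   (negative = never). */
static long g_calls = 0;
static long g_appears_after = -1;
static int  g_event_object;          /* its address serves as a fake handle */

static void *try_open_event(const char *name)
{
    (void)name;
    g_calls++;
    if (g_appears_after >= 0 && g_calls > g_appears_after)
        return &g_event_object;
    return NULL;
}

/* Poll for the event, but give up after max_attempts tries.
   Returns the handle, or NULL if the event never appeared. */
static void *wait_for_event_bounded(const char *name, long max_attempts)
{
    for (long attempt = 0; attempt < max_attempts; attempt++) {
        void *handle = try_open_event(name);
        if (handle != NULL)
            return handle;           /* event exists: continue initialization */
    }
    return NULL;                     /* limit reached: caller handles the failure */
}
```

A time limit works the same way; only the loop condition changes from an attempt counter to a deadline check.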
A more subtle effect of this kind of busy waiting is that the loop runs as fast as the processor can, loading one core to 100%. While that might be desirable if you have to react VERY quickly to something, there's no reason to do it here. The typical approach is to add a short delay inside the loop, which tells the operating system and processor "hey, I'm waiting on something and don't need CPU time, you may run another application now or reduce power". Modern processors adjust their frequency when lightly loaded, and can even power down cores completely, to conserve energy and reduce heat output. Even a delay of one millisecond makes a huge difference here.
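Adding the delay is essentially a one-line change. The sketch below uses POSIX nanosleep() through a small `sleep_ms()` helper so it runs on non-Windows systems too; on Windows the equivalent call would be Sleep(1). `try_open_event()` is a hypothetical stand-in for OpenEventA(). The actual sleep can be longer than requested, depending on the operating system's timer resolution, which is harmless here.

```c
#define _POSIX_C_SOURCE 200809L      /* expose nanosleep() in <time.h> */
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for Win32 OpenEventA(): NULL until the named event
   "exists" after g_appears_after failed calls (negative = never). */
static long g_calls = 0;
static long g_appears_after = -1;
static int  g_event_object;          /* its address serves as a fake handle */

static void *try_open_event(const char *name)
{
    (void)name;
    g_calls++;
    if (g_appears_after >= 0 && g_calls > g_appears_after)
        return &g_event_object;
    return NULL;
}

/* Portable stand-in for Win32 Sleep(ms): blocks without using CPU time. */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Poll for the event at most max_attempts times, sleeping 1 ms between
   attempts so the thread hands its core back to the OS while it waits.
   The total wait before giving up is at least about max_attempts ms. */
static void *wait_for_event_polite(const char *name, long max_attempts)
{
    for (long attempt = 0; attempt < max_attempts; attempt++) {
        void *handle = try_open_event(name);
        if (handle != NULL)
            return handle;
        sleep_ms(1);
    }
    return NULL;
}
```

The thread still checks roughly once per millisecond, which is plenty for a startup dependency, but it spends almost all of that time blocked instead of spinning.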
This is especially important during system startup, when a lot of things that need processor time are happening at the same time; it's why you feel like you're waiting forever for your desktop to become usable after you start the computer. With Radeon Settings taking over one core completely, there's obviously less performance left for other startup programs.
I did some quick and dirty performance testing in actual gameplay on an 8-core/16-thread CPU and found a small FPS loss, especially in CPU-limited scenarios, of around 1%, on the order of 150 FPS vs. 151 FPS. This confirms that this can be an issue on modern systems, too, even though only about 6% of CPU power is lost (one thread out of 16). The differences are minimal, though, and it's unlikely you'll subjectively notice them.
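The arithmetic behind these numbers, assuming the busy loop occupies exactly one of the CPU's 16 hardware threads and using the 150 vs. 151 FPS example:

```latex
\text{CPU share lost} = \frac{1}{16} = 6.25\,\%
\qquad
\text{FPS loss} = \frac{151 - 150}{151} \approx 0.66\,\%
```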
Waiting on synchronization signals is a very basic programming skill; most students would be able to implement it correctly before their midterm exams. That's why I'm so surprised to see such low-quality code in a graphics driver component that gets installed on hundreds of millions of computers. Modern software development practices catch these mistakes through code reviews, where one or more colleagues read your source code and point out potential issues. There's also "unit testing", which requires developers to write testing code that's separate from the main code. These unit tests can be executed automatically, and tools can measure "code coverage": the percentage of the program code that is actually exercised by the tests. Let's just hope AMD fixes this bug; it should be trivial.
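To make the unit-testing point concrete, here is a minimal sketch in plain C without a test framework. The wait function receives the "open event" operation as a function pointer so that a test can substitute fakes; `wait_for_event()` and the fake functions are hypothetical names for illustration. A test of the "no Radeon installed" case run against the unbounded loop would simply never finish, which a test runner's timeout would flag right away.

```c
#include <stdbool.h>
#include <stddef.h>

/* The operation that tries to open a named event; NULL means "not found". */
typedef void *(*open_event_fn)(const char *name);

/* Code under test: poll via open_event at most max_attempts times. */
static void *wait_for_event(open_event_fn open_event, const char *name,
                            long max_attempts)
{
    for (long attempt = 0; attempt < max_attempts; attempt++) {
        void *handle = open_event(name);
        if (handle != NULL)
            return handle;
    }
    return NULL;
}

/* Fakes standing in for the real OS call. */
static int fake_event;
static void *event_never_created(const char *name) { (void)name; return NULL; }
static void *event_exists(const char *name)        { (void)name; return &fake_event; }

/* Unit tests: each returns true on success. */
static bool test_gives_up_when_event_is_missing(void)
{
    /* Simulates a PC without Radeon hardware: must return, not hang. */
    return wait_for_event(event_never_created, "DVRReadyEvent", 100) == NULL;
}

static bool test_returns_handle_when_event_exists(void)
{
    return wait_for_event(event_exists, "DVRReadyEvent", 100) == &fake_event;
}
```

Together, the two tests execute both return paths of `wait_for_event()`, so a coverage tool would report every line of it as exercised.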
If you are affected by this issue, just uninstall the AMD driver from Windows Settings - Apps and Features. If that doesn't work, use DDU (Display Driver Uninstaller). It's not a big deal anyway; what's most important is that you're aware of it, in case your system feels sluggish after a graphics hardware change.
277 Comments on PSA: AMD's Graphics Driver will Eat One CPU Core when No Radeon Installed
>>Looks like AMD is doing things differently and just pre-loads Radeon Settings in the background every time your system is
>>booted and a user logs in, no matter if AMD graphics hardware is installed or not. It would be trivial to add a check
>>"If no AMD hardware found, then exit immediately", but ok. Also, do we really need six entries in Task Scheduler?
It is a well-known issue! AMD is one of the companies that secretly (!) create an entry in Windows Task Scheduler to load "RadeonSettings.exe" every time Windows starts. It actually violates your privacy.
The solution is very simple: disable or delete it in Windows Task Scheduler. Personally, I disable it and regularly check that it hasn't been re-enabled.
But I don't need "Radeon Settings" anyway: the few tweaks I might need (like a custom refresh rate) can be done with other, less intrusive software (CRU, for example).
I'm more of an IDA user than Hex-Rays, but yes. Back in the day, reverse engineering was the only way to figure out how to make GPU-Z work. Nowadays most vendors have some kind of API that's more or less useful. Still, it's an extremely useful skill.
Who is writing the RTG graphics driver nowadays? I remember reading somewhere that almost all of Radeon's current driver, at least, comes from AMD Shanghai.
It could just be the inexperience of the AMD China team that is causing so many problems like this.
OK, found it, from one of AMD's Sr. Directors of Software Engineering, Zhengsan Jian:
So it looks like AMD's GPU drivers are Made in China after all. Damn
Author: Zhengsan Jian
Link: www.zhihu.com/question/24684566/answer/29352184
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
I have been doing graphics driver development at AMD for many years; in fact, I came over from ATI back then. AMD's presence is almost invisible in the code: most of the binaries are still named ati*.dll. AMD is neither a large company nor a small one. It has many branches around the world, with people on basically every continent except Africa and South America (not counting the polar regions, of course). In an international company like this, people in different positions and departments are bound to see things quite differently, so my opinion is for your reference only. As far as the Shanghai R&D Center is concerned, from a technical point of view, GPU driver development is very interesting work; I enjoy it, anyway. It's a very narrow field with a very steep learning curve, and it usually recruits master's graduates from prestigious universities. I have seen people who still couldn't get the hang of it after half a year. When writing an app, you can always Google for clues, but in the field of drivers, Google is no help. The driver works closely with the OS and must be highly stable. You often need to debug the OS kernel, and you find Microsoft bugs from time to time. For those who like dealing with the lowest layers of the system and studying a detail (such as how the compiler optimizes a structure initialization) to the extreme, this is a very suitable field. In GPU R&D work at AMD Shanghai, there is almost no difference between China and the United States. Everyone works on the same source code server and can read all GPU hardware specs; apart from some sensitive areas, such as video encode/decode, engineers here can see all of the driver code. From an operational perspective, AMD is a company with decades of history, and its CEO is no longer the founder. It has too many professional managers and the faults of most established companies: low process efficiency and a lot of people who don't pull their weight.
Since Jobs released the iPhone and kicked off the mobile Internet revolution, many traditional established companies have had a hard time, including Microsoft, Intel, DELL, HP, SONY and other companies closely tied to AMD. Naturally, AMD has had a hard time too. But for a technical engineer, the company having a hard time doesn't mean your life will be hard. As long as the salary gets paid, having an interesting and fulfilling job is actually quite good.
Another proof
translate.google.com/translate?sl=auto&tl=en&u=https://www.expreview.com/32270.html
I wonder when AMD outsourced their driver team.
Just remove the driver before changing the GPU?
I recall one serious bug in a certain Intel package: when no matching iGPU was present, the DLL that hooked into explorer.exe caused the whole process to crash.
After the explorer crashed, the OS kept restarting the process just to let it crash again, and... you know what's next ;)
I've also seen NVIDIA's software hog the CPU when there's no card, mainly noticeable as stutters during games.
This strikes me as one of the "our software is too slow, we need to cache some things on system startup so the user will blame Microsoft" type of situations. Consequently, the most junior (so cheapest) intern is tasked with writing "the caching thing" in half a day and it "kinda, sorta maybe works most of the time usually" so everyone is happy.
Forgetting to uninstall a whole driver suite after a hardware change leads to problems?
Who would have thought?!?!
Next feature in NVidia driver... detect unused AMD software.
It seems like a pretty specialized subject, not something for your average programmer. Nvidia probably hoards them all anyway. :p
Which is why I think finding good graphics programmers is not easy.