# Story Time: Linux, AMDGPU, and CIK (GCN 1.1) parts



## Aquinus (May 12, 2018)

After over two years of using Linux with my 390, one of the most painful chapters is coming to a close, and I felt a story would be fitting. By the time I switched to Ubuntu full-time, I'd had enough of Windows. It had updated twice without asking and left me with a bricked system. The last straw was the second time: it left the NTFS partition on my RAID 5 in an "unclean" state, which prevented even Linux from booting because of how my /etc/fstab was configured. I said enough is enough, and thus began my two-year adventure with a system that was hackishly stable to varying degrees until today, mostly because of GPU drivers... all of them, no less: AMDGPU, AMDGPU-Pro, and the Radeon driver.

For two years, I constantly had to do things like hack together a weird set of kernel parameters to prevent certain things from happening, such as disabling clock scaling, because most of my issues revolved around memory clocks and the lowest core clock state (300 MHz). It would cause artifacts and eventually (very quickly) crash the machine. The only ways around it were to run the radeon driver with the kernel option "radeon.dpm=0", which killed power saving and performance, or to run the AMDGPU-Pro driver, hold off on starting anything graphical until I could log in through a text terminal, and run "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level", which got me a stable machine.
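For anyone who would rather script that workaround than type it at a text terminal on every boot, here's a minimal Python sketch of the same write. It assumes the GPU is card0 (as in the path above) and that it runs as root. The helper name `force_performance_level` is made up for illustration, and limiting it to "auto", "low" and "high" is my own assumption (those are the levels the radeon driver accepts; amdgpu accepts a few more).

```python
from pathlib import Path

# Path taken from the post; assumes the GPU is card0.
DPM_LEVEL_PATH = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")

# Assumption: only allow the three levels both radeon and amdgpu accept.
ALLOWED_LEVELS = {"auto", "low", "high"}


def force_performance_level(level: str = "high", path: Path = DPM_LEVEL_PATH) -> str:
    """Hypothetical helper: write a DPM level, then return what the file reports.

    Roughly equivalent to `echo high > <path>` from a root shell.
    """
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unexpected DPM level: {level!r}")
    path.write_text(level + "\n")
    # Read the value back so the caller can confirm the driver accepted it.
    return path.read_text().strip()
```

Calling `force_performance_level()` as root after the driver has loaded would replace the manual step; writing "auto" instead hands clock selection back to the driver.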

After about 9 months to a year of doing this, Ubuntu started rolling out the HWE kernels, which are newer kernels for LTS installations. This let me use just the open-source AMDGPU driver, which mattered because kernel updates kept breaking my AMDGPU-Pro installation. All I needed to do was add another couple of kernel options at boot: "radeon.cik_support=0" and "amdgpu.cik_support=1". I still had to force the performance level to high to keep it stable, though, and I did that for about a year. I also started using oibaf's bleeding-edge Mesa packages, and the GPU performed great, all things considered.

...then in the last couple of weeks, I decided to upgrade to the next LTS, Ubuntu 18.04. Not only did the upgrade destroy my installation (or so I thought at the time; in retrospect it really hadn't), it didn't seem to want to boot at all. So I reinstalled the same version, 18.04, from a clean slate, and sure enough, booting was hit-or-miss... only about 5-10% of the time would it boot properly; otherwise it would black-screen and never finish booting. Even worse, running multiple monitors would crash the machine and make the motherboard think a stick of DRAM was bad (WHAT?!). This pushed me to the edge, to the point where I actually started hacking the kernel to try to build a version that worked for me, with little success... until last night/this morning, when I tried the prebuilt mainline kernel for 4.17-RC4 (the latest build in Ubuntu's archive).

So, I got my hands on the 4.17-RC4 mainline kernel, and by itself it still didn't work any better, but after adding yet another couple of kernel options, my machine literally became completely stable overnight. Between a bleeding-edge kernel and these kernel boot options, "radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dc=1 amdgpu.dpm=1", all of my GPU woes went away. GPU clock scaling works again, which means idle power draw dropped by almost 80 watts with one display (because memory clocks down) or 40 watts with more than one (because memory doesn't clock down with more than one display). All 3 displays work flawlessly now, the machine boots properly every single time, and best of all, I no longer have to boot to a text terminal and run "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level" to get my machine going.

tl;dr: If you're using a GCN 1.1 part and are having issues in Linux, I highly suggest using the latest 4.17 kernel with the following kernel options. You'll be glad you did:

```
radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dc=1 amdgpu.dpm=1
```
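On Ubuntu, boot options like these normally go on the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by `sudo update-grub`. Here's a rough Python sketch that merges the four options into that line, replacing any existing setting for the same parameter (say, a leftover `amdgpu.dpm=0`) rather than adding a duplicate. It assumes the line uses the double-quoted form Ubuntu ships by default; `add_kernel_options` is a made-up name for illustration.

```python
import re

# The four options from the tl;dr above.
CIK_OPTIONS = (
    "radeon.cik_support=0",
    "amdgpu.cik_support=1",
    "amdgpu.dc=1",
    "amdgpu.dpm=1",
)

# Assumes the double-quoted form, e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
_CMDLINE_RE = re.compile(r'^GRUB_CMDLINE_LINUX_DEFAULT="(.*)"$')


def add_kernel_options(grub_text: str, options=CIK_OPTIONS) -> str:
    """Hypothetical helper: return grub_text with options merged into
    GRUB_CMDLINE_LINUX_DEFAULT. A parameter with the same key is replaced,
    so running it twice gives the same result."""
    out_lines = []
    found = False
    for line in grub_text.splitlines():
        match = _CMDLINE_RE.match(line)
        if match:
            found = True
            params = match.group(1).split()
            for opt in options:
                key = opt.split("=", 1)[0]
                # Drop any earlier value for this parameter, then append ours.
                params = [p for p in params if p.split("=", 1)[0] != key]
                params.append(opt)
            line = f'GRUB_CMDLINE_LINUX_DEFAULT="{" ".join(params)}"'
        out_lines.append(line)
    if not found:
        # No existing line: add one containing just our options.
        out_lines.append(f'GRUB_CMDLINE_LINUX_DEFAULT="{" ".join(options)}"')
    return "\n".join(out_lines) + "\n"
```

You'd write the result back to /etc/default/grub as root and then run `sudo update-grub`; keeping a backup of the original file first is a good idea.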


----------



## kruk (May 12, 2018)

First, thank you for sharing this tip.

It's really amazing how many problems high-end AMD GPUs seem to have on Linux in contrast to the low end.

I have been using the open-source driver for several years now on pre-GCN, GCN 1.0 (Cape Verde), GCN 1.1 (Kabini, Bonaire), and GCN 4 (Polaris) parts, and it's been a really smooth ride. One or two monitors, it really doesn't make any difference for any of those configs. I also didn't have to change a thing or recompile anything; everything works out of the box (Debian, Manjaro).

In all these years, I can't really remember a time when something went seriously wrong and the GPU driver was to blame. Sure, you can get occasional glitches (interestingly, almost exclusively in Firefox) on some kernel versions, but they are usually fixed pretty fast.


----------



## GoldenX (May 12, 2018)

Seems painful. I knew 1.1 was a pain but didn't expect it to be that bad. I have a 1.0, so I never had a problem with radeonsi or amdgpu.
Confirms that the only truly stable drivers are Nvidia and Intel.


----------



## Aquinus (May 12, 2018)

kruk said:


> It's really amazing how many problems high-end AMD GPUs seem to have on Linux in contrast to the low end.


I know, right? It seems to stem from differences between implementations, or at least the problem I was having did, since some 390(X)s worked fine and others were as unstable as could be.


GoldenX said:


> Seems painful. I knew 1.1 was a pain but didn't expect it to be that bad. I have a 1.0, so I never had a problem with radeonsi or amdgpu.
> Confirms that the only truly stable drivers are Nvidia and Intel.


Don't get me wrong, when amdgpu works, it works pretty well. The problem is that when I did run into issues with it, they were never small ones. In fact, all of this stemmed from the same issue, which revolved around the old DPM code.

For a little bit of background, practically all of my issues revolved around this bug report: https://bugs.freedesktop.org/show_bug.cgi?id=91880


----------



## blobster21 (May 12, 2018)

Hopefully the next kernel update won't break this harmony.


----------



## Aquinus (May 12, 2018)

blobster21 said:


> Hopefully the next kernel update won't break this harmony.


I won't be using the regular kernel builds from Ubuntu until the HWE kernel for 18.04 starts shipping 4.17 or newer. It seems like this might have been fixed in 4.16, but for now I'll probably keep using 4.17-RC4 until I have a reason not to. I may try the latest mainline kernel from time to time to see how it goes, but now that it's stable, I don't really want to mess with it unless something I care about gets rolled into the kernel. For example, 4.15 added Display Core (DC), a new display stack in amdgpu, which brought a lot of good stuff: I can now get audio over DisplayPort and HDMI, and the groundwork has been laid for FreeSync support to get rolled into X or Wayland. Things like that will push me to upgrade.
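If you'd like a script to check whether the running kernel is new enough (say, before dropping a mainline build for the stock HWE kernel), one simple approach is to compare the major.minor at the start of the release string that `uname -r` or Python's `platform.release()` reports against 4.17. This is a minimal sketch under a few assumptions: the example release strings in the test are only guesses at typical naming, the function names are hypothetical, and it deliberately treats release candidates like 4.17-RC4 as 4.17.

```python
import re


def kernel_version(release: str) -> tuple:
    """Return (major, minor) parsed from the start of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release: str, minimum: tuple = (4, 17)) -> bool:
    """True if the release's major.minor is >= minimum (RC builds count as that version)."""
    return kernel_version(release) >= minimum
```

`kernel_at_least(platform.release())` would then answer the question for the kernel you're currently booted into.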

Edit: It appears that I inherited a crash-on-S3-sleep bug with the new DC code, which is rather unfortunate but not as bad an issue to have.

Edit 2: It also appears that not using an HDMI-to-DVI adapter for one of my 1080p displays lets the displays safely go to sleep.


----------



## johnspack (May 14, 2018)

Heh, never did get AMDGPU-Pro to work right with my Kaveri CPU under Ubuntu, so it seems to be an AMD thing. My Nvidia driver installs and works great... I don't think I'd want to mess with AMD cards under Linux; seems like such a hassle.


----------



## GoldenX (May 14, 2018)

Now that the proper DPM code is in the mainline kernel, it's no longer a problem.
The funny thing is, Nvidia cards are a problem in any other distro that updates the kernel faster than the proprietary Nvidia driver becomes compatible, so they only work well in Ubuntu and Debian, unless you keep an old kernel on purpose.


----------



## Aquinus (May 14, 2018)

johnspack said:


> Heh, never did get AMDGPU-Pro to work right with my Kaveri CPU under Ubuntu, so it seems to be an AMD thing. My Nvidia driver installs and works great... I don't think I'd want to mess with AMD cards under Linux; seems like such a hassle.


I stopped using AMDGPU-Pro the moment I got the open-source driver working, because performance wasn't all that different (sometimes it was better), and it still let me upgrade the kernel. It's great being able to use a mainline kernel without the world going to crap.


GoldenX said:


> Now that the proper DPM code is in the mainline kernel, it's no longer a problem.
> The funny thing is, Nvidia cards are a problem in any other distro that updates the kernel faster than the proprietary Nvidia driver becomes compatible, so they only work well in Ubuntu and Debian, unless you keep an old kernel on purpose.


I'm pretty sure the old, non-PowerPlay DPM code was the source of about 90% of my issues. Right now the only issue I have is the 4K display not always switching to 4K on boot, which may be related to the DP cable I'm using, since the display occasionally complains about "using the cable that came with the display," but I'll tell you a secret... I am.


----------



## Killer_Rubber_Ducky (Jun 28, 2018)

I stopped running Ubuntu/Ubuntu-based distros with AMD or Nvidia cards. I stick to Fedora 27/28 because, for whatever reason, the kernel is more up to date and the drivers just seem to work.


----------



## Aquinus (Jun 28, 2018)

Killer_Rubber_Ducky said:


> I stopped running Ubuntu/Ubuntu-based distros with AMD or Nvidia cards. I stick to Fedora 27/28 because, for whatever reason, the kernel is more up to date and the drivers just seem to work.


If you want to use newer kernels you can always install mainline builds. I'm currently running 4.18-rc1.


----------



## GoldenX (Jun 29, 2018)

Install Gentoo®.

The AMD Mesa drivers now have full compatibility profile support; they are now officially ten times better than the Windows drivers.
I have to try Breath of the Wild on them.


----------



## OneMoar (Jun 29, 2018)

This whole saga could have been avoided with 30 seconds of Group Policy or any one of a dozen third-party tools.


----------



## GoldenX (Jun 29, 2018)

OneMoar said:


> This whole saga could have been avoided with 30 seconds of Group Policy or any one of a dozen third-party tools.


...
...
...
What?


----------



## MrGenius (Jun 29, 2018)

What do you not understand about that? He knew he didn't like Windows updating itself. Yet he allowed it to happen anyway. Then he goes and complains about how it's not his fault that things got borked from it. And insists it must be Windows' fault instead.

Truth is it's nobody's fault. It's an extreme edge case of incompatibility. Not very likely to happen more than once in a lifetime. Shit happens. Deal with it. But no. Don't accept reality. Just go ahead and ditch Windows. Cut your nose off to spite your face.


----------



## Aquinus (Jun 29, 2018)

MrGenius said:


> What do you not understand about that? He knew he didn't like Windows updating itself. Yet he allowed it to happen anyway. Then he goes and complains about how it's not his fault that things got borked from it. And insists it must be Windows' fault instead.


Well, the first time Windows Update ran, it did a major update and borked because it miscalculated the space required for the update, filled my SSD RAID, then flat out failed. It reverted the update (poorly) but left the SSDs completely full of temporary files. The second time, the drivers got borked so badly that booting, even into safe mode, would cause an error: the machine initially still booted, albeit unstably, but removing and reinstalling AMD's graphics drivers killed the already unstable installation.

Not properly calculating the space required for an update is definitely on Windows. The second occurrence was likely just a fluke, but enough was enough. I shouldn't have to consider reinstalling Windows every time there's a major update, whether or not I decide to install it.


MrGenius said:


> Don't accept reality. Just go ahead and ditch Windows. Cut your nose off to spite your face.


I considered having the Windows firewall block Windows Update unless I wanted it to run, but I chose Linux instead. It's not cutting off your nose to spite your face when both do what you need them to. In fact, I was already dual booting because I use Linux or a flavor of Unix for my work, and I got to a point where the Windows-only games were not enough motivation to keep using it. It also happened that whenever Windows crashed for any reason, the NTFS partitions on the RAID 5 wouldn't mount because of the unclean-shutdown flag, which impacted my Linux environment; that also pushed me this way.

I chose Linux and decided to tell the story, not start a pissing contest. So take your own advice and...


MrGenius said:


> Deal with it.


----------



## znmeb (Sep 7, 2018)

Aquinus said:


> After over two years of using Linux with my 390, one of the most painful chapters is coming to a close, and I felt a story would be fitting. By the time I switched to Ubuntu full-time, I'd had enough of Windows. It had updated twice without asking and left me with a bricked system. The last straw was the second time: it left the NTFS partition on my RAID 5 in an "unclean" state, which prevented even Linux from booting because of how my /etc/fstab was configured. I said enough is enough, and thus began my two-year adventure with a system that was hackishly stable to varying degrees until today, mostly because of GPU drivers... all of them, no less: AMDGPU, AMDGPU-Pro, and the Radeon driver.
>
> For two years, I constantly had to do things like hack together a weird set of kernel parameters to prevent certain things from happening, such as disabling clock scaling, because most of my issues revolved around memory clocks and the lowest core clock state (300 MHz). It would cause artifacts and eventually (very quickly) crash the machine. The only ways around it were to run the radeon driver with the kernel option "radeon.dpm=0", which killed power saving and performance, or to run the AMDGPU-Pro driver, hold off on starting anything graphical until I could log in through a text terminal, and run "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level", which got me a stable machine.
> 
> ...



Sadly, these settings work in Fedora 28 with the 4.17 kernel but not with Fedora 29 with the 4.18 kernel. I will never buy an AMD GPU again.


----------



## GoldenX (Sep 8, 2018)

Good luck with the Nvidia driver and Fedora updating the kernel constantly.
The only trouble free solution right now is Intel, like it or not.


----------



## znmeb (Sep 8, 2018)

The Intel GPU on the Omen's not bad, actually. I think it's a 7th generation Core i7 and it runs at least one of the deep learning packages. I'm not a gamer; I bought the thing for training neural nets. ;-)

I live about ten miles from the Intel campus in Oregon where they do all the AI software, and they just bought Vertex.Ai, and they're building some new GPGPU things.


----------



## GoldenX (Sep 8, 2018)

GCN 1.1 was the most troublesome. 1.0 can just use radeonsi, or works just fine with amdgpu and the boot flags; 1.2 and up just work with amdgpu. 1.1 was in the middle, needing patches for power management and HDMI audio, but receiving the same support as 1.0 (almost none on amdgpu).
Now everything works. Had you used Arch before, you would have saved some time.


----------



## znmeb (Sep 8, 2018)

I'm on Arch on my main partition, running the LTS kernel. The Fedora thing is testing Silverblue. If there's a way to get the non-LTS Arch kernel to work with this card, I'm all ears!


----------



## GoldenX (Sep 8, 2018)

Just guessing, have you tried mesa-git + llvm-svn?


----------



## znmeb (Sep 8, 2018)

Not recently - the goal was to get OpenCL running and a 1360x768 display. I gave up on Clover because of the raft of open bugs and went with the AMD opencl-only proprietary driver.


----------



## GoldenX (Sep 8, 2018)

There's now the open one based on AMDGPU-PRO.


----------



## Aquinus (Sep 8, 2018)

znmeb said:


> Sadly, these settings work in Fedora 28 with the 4.17 kernel but not with Fedora 29 with the 4.18 kernel. I will never buy an AMD GPU again.


I don't know if it's a Fedora thing, but I've been running 4.18 for a while now without any issues. 4.19-rc1 was a different story, though.


----------

