# Strange "crashes" in Ubuntu



## Sasqui (Dec 3, 2017)

System in question:
Ubuntu 16.04.3 LTS
ASUS P6X58D-E 
X5670 with Scythe Ashura cooler
DDR3 RAM (tried 6GB & 12GB)
GTX 220 Video Card
LG Optical Drive
2.5" 320GB OS HDD
Seasonic 650W PSU and Corsair CX430 PSU

Machine was running 24/7 WCG crunching at work over the summer. Had it at 4.0 GHz, then noticed I'd come into work and the machine would be unresponsive. I dropped the OC back to stock, but the problem persists.

Here's what's happening now:

Machine turns on fine, boots up Ubuntu, starts running, temp monitoring program shows cores at no more than 50°C under 100% load. I walk away and come back a few minutes later, say about 10 min. The monitor shows no signal... I hit the mouse or ESC on the keyboard, the signal comes back, but all I see is a black screen with a movable cursor, nothing else! Let it sit for a minute, screen signal goes away, I do the same thing (move mouse/hit keyboard) and get the same results, cannot get back to the desktop.

What I've tried:

Swapped the CX430 PSU with a Seasonic 650W gold PSU - same results
Took out all 3 RAM sticks (G.Skill Ripjaws 3x2GB) and replaced with G.Skill Sniper 3x4GB - same results

I am wondering if it's something to do with the video card / drivers?  I recall installing drivers for the GTX 220, but I have no idea how to check...  is there a way?  I have an HD 6670 I'd like to try swapping, but I want to make sure I remove the GTX drivers first.

All help appreciated, I want to get this rig back to crunching...


----------



## blobster21 (Dec 3, 2017)

To find out whether WCG crunching is what causes this black screen, reboot your computer and let it run for an hour without doing anything on it (i.e. no crunching either).

If after an hour the desktop comes back and is still snappy, then your previously unresponsive desktop had something to do with WCG crunching.


----------



## Sasqui (Dec 3, 2017)

blobster21 said:


> To find out whether WCG crunching is what causes this black screen, reboot your computer and let it run for an hour without doing anything on it (i.e. no crunching either).
> 
> If after an hour the desktop comes back and is still snappy, then your previously unresponsive desktop had something to do with WCG crunching.



Good point, will try


----------



## Aquinus (Dec 3, 2017)

If it's coming up with only a mouse cursor and nothing else, that could mean that the window manager (Unity) crashed. If things like Num Lock can still be toggled on and off, the kernel hasn't crashed. Try opening one of the TTYs to get a terminal and see if everything is still going in the background. If you press something like Ctrl + Alt + F3, that should get you a terminal. If you can log in, try running:


```
sudo service lightdm restart
```

That might get your desktop back, but it's likely to kill the borked session. This could have to do with X, Unity, or any stage of the graphics pipeline. You should be able to just drop the Radeon in without uninstalling the NVIDIA drivers first. Linux is far more forgiving than Windows when it comes to having several graphics drivers installed at once, because the kernel figures out which driver to use on boot. In fact, the radeon driver is undoubtedly installed and ready to be used already. The kernel will detect the card on boot, so it's really NBD.

I would suggest looking at the logs in "/var/log/syslog" to see if anything stands out as well.

If the screen is locked up or becomes locked up when you try to switch TTYs, the kernel may have hit a "soft" error without panicking, so it could be in a really odd state, but there is no telling if that's the case without checking syslog.
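
For a first pass over the log, something like the sketch below might help. The `scan_log` function name and the keyword list are my own assumptions, not a definitive set of crash signatures, so adjust the pattern to whatever actually shows up on your system.

```shell
# Show recent log lines that often accompany a display-stack crash.
# The keyword list is a guess for illustration, not an exhaustive set.
scan_log() {
    log_file="${1:-/var/log/syslog}"
    grep -iE 'segfault|oom-killer|call trace|nvrm|compiz|lightdm' "$log_file" | tail -n 40
}

# Usage (reading syslog may need sudo or membership in the adm group):
# scan_log /var/log/syslog
```

Running it right after the black screen appears (from a TTY) keeps the relevant lines near the end of the output.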


----------



## silentbogo (Dec 3, 2017)

Sasqui said:


> all I see is a black screen with a movable cursor, nothing else!


That's the hint that it's a software problem, not a hardware one.
Maybe you've installed some updates and it broke the lock screen in Unity. I've seen it a few times.
Last weird encounter was a broken backlight control on an ASUS laptop running 14.04. Had to fix it by hand (udev rules + some tweaking).


----------



## Sasqui (Dec 4, 2017)

silentbogo said:


> That's the hint that it's a software problem, not a hardware one.



That's my guess and I'm leaning towards the GPU...  I'm thinking an update caused it.

How do I check for NVidia drivers?


----------



## Aquinus (Dec 4, 2017)

Sasqui said:


> That's my guess and I'm leaning towards the GPU...  I'm thinking an update caused it.
> 
> How do I check for NVidia drivers?


If you're using the closed-source driver, updating the kernel could cause it to break, but then I wouldn't expect you to get a window manager at boot, so I doubt this is it. I've experienced this kind of problem occasionally with the closed-source AMDGPU-Pro drivers, but almost never with the open-source drivers.

When this happens again, I would do as I suggested in #4 and try restarting lightdm from the terminal if you can get to a TTY. That could indicate an issue with Unity or X.

Edit: Side note, are you using the padoka or oibaf PPA, or the HWE (or HWE edge) kernel?


----------



## Sasqui (Dec 4, 2017)

Aquinus said:


> When this happens again, I would do as I suggested in #4 and try restarting lightdm from the terminal if you can get to a TTY. That could indicate an issue with Unity or X.



I will try that, I don't know the keypress to get to the terminal in a state like that. Hell, I tried Ctrl-Alt-Delete.

So I'll try your suggestion (it's easy). The problem is very reproducible, all I have to do is boot up and wait about 5-10 min...

On another note, I tried enabling the screen saver after all this, and some things seemed "wonky" when I messed with that, which made me suspect it was the graphics.



Aquinus said:


> If you press something like Ctrl + Alt + F3, that should get you a terminal. If you can log in, try running:


----------



## Aquinus (Dec 4, 2017)

For what it's worth, if you're using the machine just for serving and/or crunching, I would highly suggest running the machine headless without a front-end. That's what I'm doing with the 3820 in the attic. It's a clean server installation with BOINC. That's it. I just use the BOINC interface on my tower and connect to the other machine remotely if I don't just log in via SSH.

What's nice is that even with BOINC running and Docker without any containers active, it is only eating 580MB/7.73GB of available memory after running for 7 days. Fewer processes and threads also means less time spent scheduling, so performance should be a little better as well.
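
If you'd rather not reinstall, a systemd-based release such as 16.04 can also be told to boot to a text console instead of the desktop. A minimal sketch, assuming `systemctl` is available; the `set_boot_target` wrapper and the `DRY_RUN` variable are hypothetical names of mine, not anything Ubuntu ships:

```shell
# Switch the default boot target between text-only and graphical.
# With DRY_RUN=1 the command is only printed, not executed.
set_boot_target() {
    case "$1" in
        headless) target=multi-user.target ;;
        desktop)  target=graphical.target ;;
        *) echo "usage: set_boot_target headless|desktop" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sudo systemctl set-default $target"
    else
        sudo systemctl set-default "$target"
    fi
}

# Usage:
# DRY_RUN=1 set_boot_target headless   # preview the command
# set_boot_target headless             # takes effect on the next boot
# set_boot_target desktop              # go back to the graphical login
```

The change only applies from the next boot, so it is easy to undo if you miss the desktop.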


----------



## Sasqui (Dec 4, 2017)

Aquinus said:


> For what it's worth, if you're using the machine just for serving and/or crunching, I would highly suggest running the machine headless without a front-end. That's what I'm doing with the 3820 in the attic. It's a clean server installation with BOINC. That's it. I just use the BOINC interface on my tower and connect to the other machine remotely if I don't just log in via SSH.
> 
> What's nice is that even with BOINC running and Docker without any containers active, it is only eating 580MB/7.73GB of available memory after running for 7 days. Fewer processes and threads also means less time spent scheduling, so performance should be a little better as well.



I like the idea.  First, I want to troubleshoot through this issue.  

My second crunching rig is an EVGA X58 that never had a hiccup. Problem is that my primary Win rig is on the sidelines, so I'm now using the EVGA for games and fun.

I was tempted to take a spare HDD, load unactivated Windows, stress test the hell out of it and see if I find anything... or just simply re-install Ubuntu. I grow weary of troubleshooting. Too many PCs. Did I just say that???


----------



## Aquinus (Dec 4, 2017)

Sasqui said:


> I like the idea.  First, I want to troubleshoot through this issue.
> 
> My second crunching rig is an EVGA X58 that never had a hiccup. Problem is that my primary Win rig is on the sidelines, so I'm now using the EVGA for games and fun.
> 
> I was tempted to take a spare HDD, load unactivated Windows, stress test the hell out of it and see if I find anything... or just simply re-install Ubuntu. I grow weary of troubleshooting. Too many PCs. Did I just say that???


My USB wifi device used to take my machine down about 30 minutes after booting every time I overclocked, but it could last over a day without an overclock (sometimes). In the end, I stopped using the USB device and it's been rock solid since. So, don't assume it's hardware. Let the logs (like what's in /var/log/syslog) guide you.


----------



## Sasqui (Dec 4, 2017)

Aquinus said:


> My USB wifi device used to take my machine down about 30 minutes after booting every time I overclocked, but it could last over a day without an overclock (sometimes). In the end, I stopped using the USB device and it's been rock solid since. So, don't assume it's hardware. Let the logs (like what's in /var/log/syslog) guide you.



It was on a Panda Wifi dongle, but when I brought it home, I plugged it straight into my main switch via Cat6. No difference.


----------



## Sasqui (Dec 17, 2017)

Update! Both crunching PCs started doing the same thing, the second one after I let Ubuntu update.

Solution (and maybe there's a better one): disable the screen off / lock in System Settings. Both rigs are now humming away with no problems at all for 14+ hours.
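
For anyone who prefers a terminal, the same settings can probably be changed with `gsettings`. This is a sketch under assumptions: the keys are the standard GNOME ones that Unity on 16.04 is expected to honor, and the `disable_blank_and_lock` function and the `GSETTINGS` override are names invented for the example. Run it as the desktop user, not with sudo.

```shell
# Turn off screen blanking and the lock screen for the current desktop user.
# GSETTINGS can be overridden (e.g. GSETTINGS=echo) to preview the calls.
disable_blank_and_lock() {
    gs="${GSETTINGS:-gsettings}"
    "$gs" set org.gnome.desktop.session idle-delay 0
    "$gs" set org.gnome.desktop.screensaver idle-activation-enabled false
    "$gs" set org.gnome.desktop.screensaver lock-enabled false
}

# Usage:
# GSETTINGS=echo disable_blank_and_lock   # print what would be set
# disable_blank_and_lock                  # apply the settings
```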

....Yeah!!!


----------



## thebluebumblebee (Dec 29, 2017)

Sasqui said:


> How do I check for NVidia drivers?


I'm going to use Mint terms, but since it's built on Ubuntu, it should be very similar:
System Settings - Administration - Device Drivers pulls up the Driver Manager. (BTW, thanks.  If I had not tried to answer this, I would not have seen that it appears that somehow this system that I SS from went back to the default Linux drivers, which should be hurting my F@H PPD.  After this WU is done, I'm going to switch over to the Nvidia drivers)
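
From a terminal, a rough equivalent is to ask `lspci -k` which kernel driver is bound to the card. The helper below is only a sketch (the `gpu_driver_in_use` name is made up for illustration), and it assumes the usual `lspci -k` output layout.

```shell
# Print the kernel driver bound to each graphics adapter.
# Reads `lspci -k` style text on stdin so it can be tried on saved output too.
gpu_driver_in_use() {
    awk '
        /VGA compatible controller|3D controller/ { in_gpu = 1; next }
        /^[0-9a-f]/                               { in_gpu = 0 }
        in_gpu && /Kernel driver in use:/         { print $NF }
    '
}

# Usage:
# lspci -k | gpu_driver_in_use     # e.g. nvidia, nouveau or radeon
# dpkg -l | grep -i nvidia         # lists any installed NVIDIA packages
```

If it prints `nouveau`, the open-source driver is in use; `nvidia` would mean the proprietary one is loaded.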


----------



## Sasqui (Dec 30, 2017)

thebluebumblebee said:


> I'm going to use Mint terms, but since it's built on Ubuntu, it should be very similar:
> System Settings - Administration - Device Drivers pulls up the Driver Manager. (BTW, thanks.  If I had not tried to answer this, I would not have seen that it appears that somehow this system that I SS from went back to the default Linux drivers, which should be hurting my F@H PPD.  After this WU is done, I'm going to switch over to the Nvidia drivers)



Will look when I get back to work on Monday... tho I may leave well enough alone.  Also just checked my charts and it's looking like one machine may be down*

*I don't think I disabled auto-updates.
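
To see whether automatic upgrades are switched on, one option is to look at the APT periodic settings. A small sketch, assuming the stock Ubuntu location `/etc/apt/apt.conf.d/20auto-upgrades`; the `auto_updates_enabled` function name is hypothetical:

```shell
# Succeeds (exit 0) if unattended upgrades are enabled in the given APT
# config file; fails if they are disabled or the file is missing.
auto_updates_enabled() {
    conf="${1:-/etc/apt/apt.conf.d/20auto-upgrades}"
    grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"1";' "$conf" 2>/dev/null
}

# Usage:
# if auto_updates_enabled; then echo "auto-updates are on"; else echo "auto-updates are off"; fi
```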


----------

