# Strange RAID issues.



## Aquinus (Apr 27, 2012)

Hi everyone,

Lately I've started having issues with my RAID, ever since I moved the array to my LGA2011 rig. It keeps kicking drives out. At first it kicked my WD Green out of the array, so I removed it, bought a WD Black to replace it, popped it in, and boom, quick and simple rebuild. A couple of days later, one of my Hitachi drives got kicked out of the RAID. Skeptical, I rebuilt and it worked fine for the next couple of days. I then took the WD Green drive to work, plugged it into one of our servers with an eSATA dock, and ran SMART on it: 26,000 hours of power-on time and not a single error or any SMART attribute out of whack. So I brought it back home, and when the Hitachi "failed" again, I took it out and swapped drives (I tried changing ports in between as well, but it didn't seem to make a difference). I then brought the Hitachi drive to work and checked the SMART logs on that one too: 3 reallocated sectors was the worst I could see, and 23,000 hours of run time.
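If anyone wants to look at the same numbers on their own drives, this is roughly what I ran on the Linux box at work (smartctl from the smartmontools package; the device name is just an example):

```
# Full SMART report: health status, attribute table, and error log
sudo smartctl -a /dev/sdb

# Attributes worth watching on a "failing" RAID member:
#   5   Reallocated_Sector_Ct  - sectors remapped after read/write errors
#   9   Power_On_Hours         - total powered-on time
#   197 Current_Pending_Sector and 198 Offline_Uncorrectable
#       should both be zero on a healthy drive
```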

With all of this said, Intel Rapid Storage Technology enterprise will report "SATA array disk removed.", then "Unknown disk on Controller 0, Port Unknown Detected.", then "SATA Disk on Controller 0, Port 3 Detected.", then right back to "SATA array disk removed." (It repeats this over and over again until you restart.)

If I try to re-add the disk to the RAID right then, it errors out. If I restart and use either the RAID BIOS or the utility in Windows, it re-adds fine and starts rebuilding.

Has anyone else noticed this behavior? This never happened to me on the nVidia RAID, and I'm wondering if anyone else has something to add. I might also mention that I have two Corsair Force GT 120GB drives in RAID-0 on the SATA 6Gb/s ports and I haven't had a single issue with those, so I'm not inclined to believe it's the driver (it could be, I suppose, since parity is being calculated and data gets checksummed). I also replaced all of the SATA cables.

Right now the drives in the RAID (in order from port 3 to port 5) are:
Hitachi 1
WD Black
WD Green

with Hitachi 2 on my desk in an anti-static bag. WD Green and Hitachi 2 both exhibit the same behavior, but the other two drives appear fine, even though Hitachi 1 is from the same batch as 2 and should have the exact same active time. The drives are getting old, but that isn't reason enough for me to toss them and buy another new 1TB drive.


----------



## newtekie1 (Apr 27, 2012)

I say try new cables, particularly try to use cables that don't have the locks on the connectors.  I've had issues before with boards that use the 90° SATA connectors and those locking style cables, where the SATA ports on the board didn't have the little notches cut out for the lock, and they would wiggle loose.  From what I can see on the P9X79, the SATA 3Gbps ports don't have the notches for the locks, so my guess would be the cables are wiggling loose.

Also, in the end I'd try to avoid using the WD Green drive if possible; even the Black might give you issues. Since WD killed the TLER option on their consumer drives, they have had issues when used in RAID: the controller can inaccurately mark the drive as bad. It doesn't happen often, but it can. I doubt that's the issue you're having here, since the Hitachi drive is acting up too, but it's something to be aware of.


----------



## Aquinus (Apr 27, 2012)

newtekie1 said:


> I say try new cables, particularly try to use cables that don't have the locks on the connectors.  I've had issues before with boards that use the 90° SATA connectors and those locking style cables, where the SATA ports on the board didn't have the little notches cut out for the lock, and they would wiggle loose.  From what I can see on the P9X79, the SATA 3Gbps ports don't have the notches for the locks, so my guess would be the cables are wiggling loose.





Aquinus said:


> I also replaced all of the SATA cables.



The ones before didn't have the latches, and changing the cables didn't have any discernible effect.


----------



## newtekie1 (Apr 27, 2012)

Aquinus said:


> The ones before didn't have the latches, and changing the cables didn't have any discernible effect.



Well, there goes my idea.


----------



## Aquinus (Apr 27, 2012)

Once I get a little extra money, I think I might get another case, bring my Phenom II 940 back to life, and put my RAID back on there. Maybe run Ubuntu Server and just use a Samba share for files. I just don't have a whole lot of room for it though.


----------



## Duekay (Apr 28, 2012)

I tend to use the same make and model in my server. I use four Samsung Green 2TB drives in RAID 10, but then again that's on a HighPoint card; my SSD arrays are on the chipset though, and they are rock solid.
I think it might be a driver issue as well. You could try uninstalling RST. If I remember correctly you don't need RST to make a chipset array; you can just do it in the BIOS? Not sure on that though.


----------



## Aquinus (Apr 28, 2012)

Duekay said:


> I tend to use the same make and model in my server



If I had the money, they would all be WD Caviar Blacks. Ideally I'd do the same, but I've been waiting for drive prices to settle a bit more.



Duekay said:


> I think it might be a driver issue as well. You could try uninstalling RST.



I did. It didn't help, though granted, the driver is as old as X79's release date.



Duekay said:


> If I remember correctly you don't need RST to make a chipset array; you can just do it in the BIOS? Not sure on that though.



Not mine. The Intel server boards we have at work let you switch between the RST BIOS and an LSI BIOS, but that could just be because they have hardware LSI RAID cards in them, not because the board has the ability built in. Let me tell you though, LSI makes damn good RAID cards. Put a BBU on one of their cards and turn write caching on and you're living the good life. It's too bad you have to dish out over 300 USD for one (and that is just the 4-port model!), and then give up 8 PCI-E lanes for it as well. A 5-disk RAID-5 of 500GB WD Blacks will turn out over 400MB/s with it, which isn't too shabby.
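For what it's worth, the BBU and write-cache settings on those cards are all scriptable through LSI's MegaCLI tool; something like this (the binary name varies by package, MegaCli vs MegaCli64, so treat it as a sketch):

```
# Check the battery backup unit on adapter 0
MegaCli64 -AdpBbuCmd -GetBbuStatus -a0

# Enable write-back caching on all logical drives, all adapters
MegaCli64 -LDSetProp WB -LAll -aAll

# Drop back to write-through automatically if the BBU ever fails
MegaCli64 -LDSetProp NoCachedBadBBU -LAll -aAll
```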


----------



## Steevo (Apr 28, 2012)

It's the head parking and spindown. Disable ASPM in the BIOS and see if that helps. The chipset goes to a low-power state and the drives go to low power; when the controller wakes, it expects to find the drives ready to go, and when they aren't, it marks them failed as the response timeout passes.

Also disable drive sleep and PCI Express Link State Power Management in the Windows power control panel and see if that fixes it.
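If you'd rather script it than dig through the control panel, powercfg can flip both from an elevated prompt; this is from memory, so verify the names with powercfg -aliases first:

```
:: Never spin the disks down on AC power (0 = never)
powercfg -change -disk-timeout-ac 0

:: Turn off PCI Express Link State Power Management on the active plan
powercfg -setacvalueindex SCHEME_CURRENT SUB_PCIEXPRESS ASPM 0
powercfg -setactive SCHEME_CURRENT
```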


----------



## Aquinus (Apr 28, 2012)

Steevo said:


> It's the head parking and spindown. Disable ASPM in the BIOS and see if that helps. The chipset goes to a low-power state and the drives go to low power; when the controller wakes, it expects to find the drives ready to go, and when they aren't, it marks them failed as the response timeout passes.
> 
> Also disable drive sleep and PCI Express Link State Power Management in the Windows power control panel and see if that fixes it.



I was thinking something like that might be happening. I'm going to go look through the BIOS.

So I checked the BIOS and you would be surprised at how little there is with regard to power saving on an X79 board. 

All the Windows settings are as you described already. 

Edit: Found a newer driver; I plopped that on, we'll see if that helps.


----------



## nleksan (Apr 29, 2012)

Just wanted to ask for your further opinion on LSI hardware RAID controller cards? I am considering using one for a home build to add an extra 4-8 SATA 6Gb/s ports (4x SSDs, 4x WD RE4 1TB HDDs, and 4x 2TB WD Caviar Blacks). The Z77/X79 chipsets don't have enough ports for all of the drives I need, and I don't want to spend a week trying to get some shitty $30 software RAID card to work only to find out I'm getting a max of 200MB/s in RAID 10 over an x4/x8 lane... Thanks!


----------



## Aquinus (Apr 29, 2012)

Wow, 8 disks? Not a whole lot of servers even have HDD setups that large. You're looking at more than 500 USD. I think the card below is what you're looking for, as you described. Also keep in mind that this is the price without the BBU.

LSI MegaRAID Internal SAS 9265-8i 6Gb/s Dual Core ...

I would also be careful about what exactly you need. Rotational hard drives won't really benefit from SATA 6Gb/s; they don't go fast enough for it to matter, and the SATA 6Gb/s ports on the Z77 PCH should be able to RAID the SSDs (it has 4 to 6 SATA 6Gb/s ports on the PCH, right?). I don't know how well the controller works on Windows, but it worked amazingly on Ubuntu Linux 10.04.3 LTS.


----------



## Deleted member 3 (Apr 29, 2012)

Sounds like TLER. Which makes the issue not strange at all. It's a feature.


----------



## Aquinus (Apr 29, 2012)

DanTheBanjoman said:


> Sounds like TLER. Which makes the issue not strange at all. It's a feature.



I'm not sure why that would cause it to just start happening with my drives now, and with only two of them for that matter; two are from the same batch, and only one of those is acting up. I'm not convinced that it is TLER. I've updated the drivers (which were kind of annoying to find on Intel's site, I might add; they have multiple pages for RSTe drivers), and so far my RAID has been healthy for a couple of days. I'll give it a week, since it always seems to do it anywhere between 12 hours and 4 days after completing a rebuild.


----------



## Deleted member 3 (Apr 29, 2012)

Aquinus said:


> I'm not sure why that would cause it to just start happening with my drives now, and with only two of them for that matter; two are from the same batch, and only one of those is acting up. I'm not convinced that it is TLER. I've updated the drivers (which were kind of annoying to find on Intel's site, I might add; they have multiple pages for RSTe drivers), and so far my RAID has been healthy for a couple of days. I'll give it a week, since it always seems to do it anywhere between 12 hours and 4 days after completing a rebuild.



Because it works like that by design. Until a few generations ago basically any drive supported TLER, so there were no issues. With the latest generations, however, TLER is only supported on enterprise drives. Basically what happens is that the drive takes too long to recover from an error and the RAID controller decides to drop it; TLER cuts off the error recovery after 7 seconds or something along those lines. Some RAID controllers don't have these issues, and software arrays tend to be immune as well (i.e. mdadm).
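On drives that still accept SCT commands you can check or even set that error-recovery timer yourself from Linux with smartmontools; a rough sketch (the device name is an example, and the setting usually resets on a power cycle):

```
# Read the current SCT Error Recovery Control timers (read, write)
sudo smartctl -l scterc /dev/sdb

# Set both timers to 7.0 seconds (values are tenths of a second),
# the TLER-style behavior hardware RAID controllers expect
sudo smartctl -l scterc,70,70 /dev/sdb
```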

I've had this issue as well until recently. I ran an Axus SATA-to-SCSI disk cabinet with 6 drives, which ran fine. I also had an 8-bay eSATA DAS with a RR622 running 8 green drives, which caused huge problems. Large writes would cause the controller to hang; resetting the bus fixed it, but moving large amounts of data to the array was near impossible.

I ended up finding a 16-bay SATA-to-SCSI cabinet (also Axus) for free, which now replaces both devices. The firmware of these cabinets is immune to TLER issues.
Most NAS devices use mdadm, which is also immune. I couldn't find a decent solution for Windows though; using all disks in passthrough and then creating a software array would have been fine.


----------



## Aquinus (Apr 29, 2012)

DanTheBanjoman said:


> I've had this issue as well until recently. I ran an Axus SATA-to-SCSI disk cabinet with 6 drives, which ran fine. I also had an 8-bay eSATA DAS with a RR622 running 8 green drives, which caused huge problems. Large writes would cause the controller to hang; resetting the bus fixed it, but moving large amounts of data to the array was near impossible.



This never happens in write-intensive situations in my case; it's always when the disks are spun up and idle. Like I said, I'm waiting to see how the new drivers do, because a lot of the issues I've had with Intel hardware in the past have been resolved through some form of software, be it a driver or an application. We shall see though.


----------



## Deleted member 3 (Apr 29, 2012)

Aquinus said:


> This never happens in write-intensive situations in my case; it's always when the disks are spun up and idle. Like I said, I'm waiting to see how the new drivers do, because a lot of the issues I've had with Intel hardware in the past have been resolved through some form of software, be it a driver or an application. We shall see though.



Write-intensive operations are the most likely to trigger TLER issues, so you might be in luck if it doesn't happen in those situations.

Failing while idle sounds odd though; have you tried turning off any power-saving features, like spinning down after x minutes?


----------



## Aquinus (Apr 29, 2012)

DanTheBanjoman said:


> Write-intensive operations are the most likely to trigger TLER issues, so you might be in luck if it doesn't happen in those situations.
> 
> Failing while idle sounds odd though; have you tried turning off any power-saving features, like spinning down after x minutes?



All power saving with regard to the hard drives and PCI-E is disabled. I'm going to stress it a bit later to see how stable it really is, but I suspect it was the drivers, considering the ones I got are pretty new and the ones I found before were the same ones from release day. Earlier I was going to take screenshots of the BIOS options, but come to find out the P9X79 Deluxe has practically nothing in terms of power saving; then again, I guess that really isn't the goal of an SB-E build, is it?

It did it again. Maybe the RSTe RAID just doesn't like the drives. Maybe I'll buy a new case for my old Phenom II 940 and turn that into a server and pop the drives in there.


----------



## nleksan (Apr 30, 2012)

Aquinus said:


> Wow, 8 disks? Not a whole lot of servers even have HDD setups that large. You're looking at more than 500 USD. I think the card below is what you're looking for, as you described. Also keep in mind that this is the price without the BBU.
> 
> LSI MegaRAID Internal SAS 9265-8i 6Gb/s Dual Core ...
> 
> I would also be careful about what exactly you need. Rotational hard drives won't really benefit from SATA 6Gb/s; they don't go fast enough for it to matter, and the SATA 6Gb/s ports on the Z77 PCH should be able to RAID the SSDs (it has 4 to 6 SATA 6Gb/s ports on the PCH, right?). I don't know how well the controller works on Windows, but it worked amazingly on Ubuntu Linux 10.04.3 LTS.



I am really into HD video and uncompressed audio editing and recording, so 1TB can fill up fast. The SATA 6Gb/s is because, in RAID 0 or 10, the throughput of even mechanical drives should be high enough to take advantage of it, yes? Or am I misunderstanding the concept?


----------



## Steevo (Apr 30, 2012)

nleksan said:


> I am really into HD video and uncompressed audio editing and recording, so 1TB can fill up fast. The SATA 6Gb/s is because, in RAID 0 or 10, the throughput of even mechanical drives should be high enough to take advantage of it, yes? Or am I misunderstanding the concept?



You are misunderstanding: each drive already has the full SATA link speed available to it.

The connection between the controller and the system is the next saturation point, and it isn't an issue in that case either.

Few if any mechanical hard drives could saturate the connections provided on your board on their own. Even most SSDs would have trouble saturating a single connection, or a set of them in a RAID array.


If you are working with media files that way, you are doing it wrong. You want huge, almost obscene amounts of RAM, then SSDs with the Windows cache left on, and then mechanical hard drives to cover the need for massive storage.


----------



## slyfox2151 (Apr 30, 2012)

nleksan said:


> I am really into HD video and uncompressed audio editing and recording, so 1TB can fill up fast. The SATA 6Gb/s is because, in RAID 0 or 10, the throughput of even mechanical drives should be high enough to take advantage of it, yes? Or am I misunderstanding the concept?



SATA HDDs barely reach 150MB/s, which is SATA 1 speed. As long as there is enough bandwidth for the controller to run all SATA ports at full speed, upgrading the controller to SATA 2/3 won't help.


----------



## happy (Apr 30, 2012)

Aquinus said:


> If I had the money, they would all be WD Caviar Blacks



I thought you couldn't RAID Caviar Blacks?


----------



## Aquinus (Apr 30, 2012)

happy said:


> I thought you couldn't RAID Caviar Blacks?



Tell that to the 20+ Caviar Blacks on our servers at work. Our LSI and 3Ware RAID controllers don't appear to have any issues with them, granted we're not using any of the onboard Intel RAID solutions on our servers, which would be much closer to what I'm trying to do. I seriously think that I'm going to resurrect my Phenom II and turn it into a server and rely on nVidia fake-raid since it worked really well with these drives.


----------



## theeldest (May 1, 2012)

Aquinus said:


> Tell that to the 20+ Caviar Blacks on our servers at work. Our LSI and 3Ware RAID controllers don't appear to have any issues with them, granted we're not using any of the onboard Intel RAID solutions on our servers, which would be much closer to what I'm trying to do. I seriously think that I'm going to resurrect my Phenom II and turn it into a server and rely on nVidia fake-raid since it worked really well with these drives.



How old are the Blacks?

Western Digital only made the change somewhat recently. I have a bunch of Caviar Blues in RAID that work great because they have TLER enabled. WD made the change to push people to the RE series if they're doing RAID.


----------



## Aquinus (May 1, 2012)

theeldest said:


> How old are the Blacks?
> 
> Western Digital only made the change somewhat recently. I have a bunch of Caviar Blues in RAID that work great because they have TLER enabled. WD made the change to push people to the RE series if they're doing RAID.



Some are four years old, some are two. Nothing newer than 2010.

If this is a recent change, my four-year-old drives shouldn't even be having an issue with it.


----------



## theeldest (May 1, 2012)

OK, that's stranger. I'm running 4x WD 6400AAKS (Caviar Blue) drives in RAID 10 on the Intel chipset (Z68).

These drives are only about three years old. If they don't have problems, I doubt it's specific to the drives; it's most likely a driver problem.


Sorry, really not sure what else you could try.


----------



## Easy Rhino (May 1, 2012)

Aquinus said:


> I seriously think that I'm going to resurrect my Phenom II and turn it into a server and rely on nVidia fake-raid since it worked really well with these drives.



If you are going to install Linux then don't bother with the BIOS RAID. Ubuntu, CentOS, etc. all provide very good software RAID solutions without having to muck with the BIOS.


----------



## Aquinus (May 1, 2012)

Easy Rhino said:


> If you are going to install Linux then don't bother with the BIOS RAID. Ubuntu, CentOS, etc. all provide very good software RAID solutions without having to muck with the BIOS.



That doesn't help me with my problem now. I'm saying that I'll move my RAID back to my Phenom II hardware if I can't get X79 to handle my drives correctly. I also don't trust software RAID: it leaves your /boot vulnerable, whereas fakeraid using dmraid can actually boot a RAID device as /boot. I've done the software route in the past, but honestly, I don't like having copies of /boot all over the place just to make sure every drive can still boot if any one of them dies.
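For anyone curious, the usual software-RAID workaround is a small mdadm RAID-1 for /boot with the bootloader written to every member disk; roughly like this (device names are examples):

```
# /boot lives on a RAID-1 across all members, so any single drive can boot
sudo mdadm --create /dev/md0 --level=1 --raid-devices=4 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

# Install GRUB to the MBR of every member disk
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    sudo grub-install "$disk"
done
```

It works, it's just exactly the kind of juggling I'd rather avoid.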


----------



## Easy Rhino (May 1, 2012)

Well, you mentioned maybe running Ubuntu Server with a Samba setup. If that's the case, Ubuntu has really strong software RAID that is faster and more efficient than the older onboard RAID controllers.
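Getting the equivalent of your current array going with mdadm is only a couple of commands; a rough sketch (device names depend on your setup):

```
# Create a 3-disk RAID-5 array from whole disks
sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 \
    /dev/sdb /dev/sdc /dev/sdd

# Watch the initial sync, then format and mount as usual
cat /proc/mdstat
sudo mkfs.ext4 /dev/md0

# Persist the array definition so it assembles at boot
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
```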


----------



## Steevo (May 3, 2012)

As long as your CPU doesn't become the bottleneck.


Windows allows for software RAID configuration as well.

http://www.howtogeek.com/howto/36504/how-to-create-a-software-raid-array-in-windows-7/


Simple, plus Windows manages it.
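The diskpart route does the same thing as that guide without the GUI; a sketch from an elevated prompt (disk numbers are examples, so double-check them with list disk first):

```
rem Inside diskpart, make each member disk dynamic
list disk
select disk 1
convert dynamic
select disk 2
convert dynamic

rem Striped (RAID 0 style) volume across both dynamic disks
create volume stripe disk=1,2
```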


----------



## Aquinus (May 3, 2012)

Steevo said:


> As long as your CPU doesn't become the bottleneck.
> 
> 
> Windows allows for software RAID configuration as well.
> ...



Then if my RAID-0 of SSDs fails, my RAID-5 dies with it, since Windows is gone. Windows RAID is a terrible idea; I like the idea of Linux software RAID better. I think these drives are going back into the Phenom II system. Next time I refresh my RAID, I'll just get a bundle of WD Blues. Now if only 3x 1TB drives could cost less than 200 USD.


----------



## Aquinus (May 6, 2012)

So I ended up turning off an option on the RAID driver called "Patrol Read", where the RAID controller basically uses idle time to scan the drives for errors. I disabled it and so far so good.



Intel said:

> "Patrol Read" is a user definable option available in the Intel® RAID Web Console 2 that performs drive reads in the background and maps out any bad areas of the drive.
> 
> Patrol read checks for physical disk errors that could lead to drive failure. These checks usually include an attempt at corrective action. Patrol read can be enabled or disabled with automatic or manual activation.
> 
> ...


----------

