Sunday, February 18th 2007
Google Claims Hard Drives Don’t Fail Because of High Temperature or Usage
One common argument in the world of computing is that high temperatures make hard drives more likely to fail, and the same is said for high usage levels. However, internal research by search giant Google suggests that this is only true for hard drives in their first months of operation or once they are over five years old. According to Google, there are so many other variables that the biggest factor in the lifetime of a hard drive is actually the model itself rather than the operating conditions. In fact, Google saw a trend suggesting that drives are more likely to fail at low temperatures or at extremely high temperatures, and generally speaking, hard drives failed less as temperature increased (until those extremes were reached). As for usage, the research showed that hard drives only seem to be affected by heavy use in their first months or after they are five years old; otherwise, the failure rate was the same as for drives in low-usage environments.
Source:
TG Daily
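For a concrete sense of what a "failure rate versus temperature" analysis looks like in practice, here is a small, purely illustrative Python sketch (invented data, field names, and bucket widths; not Google's code or dataset) that groups drives by average temperature and computes an annualized failure rate per bucket:

# Purely illustrative sketch: bucket drives by average temperature and compute
# an annualized failure rate (AFR) per bucket. Data and names are invented.
from collections import defaultdict

# Each record: (average_temp_C, drive_years_observed, failed)
drive_records = [
    (24, 2.0, False), (26, 1.5, True),  (31, 3.0, False),
    (33, 2.5, False), (36, 1.0, False), (41, 2.0, True),
    (44, 0.5, False), (47, 1.5, True),  (23, 2.0, True),
]

def temp_bucket(temp_c, width=5):
    """Group temperatures into 5-degree-wide buckets, e.g. '30-34 C'."""
    low = (temp_c // width) * width
    return f"{low}-{low + width - 1} C"

exposure = defaultdict(float)   # total drive-years observed per bucket
failures = defaultdict(int)     # failures per bucket

for temp, years, failed in drive_records:
    bucket = temp_bucket(temp)
    exposure[bucket] += years
    failures[bucket] += int(failed)

for bucket in sorted(exposure):
    afr = failures[bucket] / exposure[bucket]   # failures per drive-year
    print(f"{bucket}: {failures[bucket]} failures over "
          f"{exposure[bucket]:.1f} drive-years -> AFR ~{afr:.1%}")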
40 Comments on Google Claims Hard Drives Don’t Fail Because of High Temperature or Usage
(this is based on my ~10 years spent crying over every GB of data I lost ;) )
What is the warranty on new drives?
I see this all the time at the company I work for... my company would rather not spend the little extra for the correct amount of RAM, but they don't seem to mind at all replacing hard drives at an alarming rate.
* Moving the pagefile.sys to ANOTHER DISK can help!
(Also, following your logic & train of thought: fragmentation will also contribute to excessive head movement, adding unneeded "wear & tear"!)
:)
APK
I've had my current rig for like 2-3 years or more... I lost count... but it's running the original drives and everything is nice and strong. I've seen plenty of drives die before their prime due to ESD, dust, and overall lack of use.
APK
1./ For long life, keep disks BELOW 45 degrees C
2./ The optimal temperature is 30-35 degrees C
3./ Cooling below 20 degrees C is not a good idea... although there is no clear understanding why
4./ The failure rate increases quickly with temperature... so a drive at 60+ C would be in big trouble
5./ There were no comments on temperature variability (which IMO is the biggest problem... an HDD going from cold to hot and back to cold again, which might explain point 3, and is the typical desktop/workstation scenario compared to a server... i.e. switched on and off in regular cycles)
6./ The failure rate does not seem to increase year on year according to the stats shown... y2 and y3 are in fact higher risk than y4 and y5
7./ If an HDD has survived to 5 years, there is no indication that the failure rate gets worse... (a surprising result to me)
8./ The failure rate at 1 year and under is much lower than at years 2-5, BUT this is probably due to NEW standards in drive manufacturing, NEW brands or model numbers being used compared to older systems, or NEW drives having bigger caches... which reduces wear and tear on the heads/arms
***
Summary
1./ High temperatures cause exponentially increasing errors when temps go above 45 C. Keep your HDD temps down. Run SMART and check that you are not going over these temps (see the quick sketch at the end of this post). If you are, MOVE YOUR DRIVES so they are not sandwiched, and increase your HDD cooling, e.g. make sure your system temp is low... and that there is airflow to the HDD
2./ Newer drives are more reliable than older drives, BUT AGING does not seem to increase the failure rate. (Assuming no knocks or other issues that accumulate over time)
NOTE THAT THE INFO here contradicts the OP's summary. IMO we have a bad summary at the start of this thread...
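For anyone who wants to automate point 1 of the summary, here's a rough sketch of how you might poll a drive's SMART temperature and compare it against those ballpark thresholds. It assumes a Linux box with smartmontools installed and enough privileges to run smartctl; the device path, column parsing, and threshold values are illustrative, not gospel:

# Rough sketch: read a drive's SMART temperature via smartmontools and flag it
# against the ballpark thresholds discussed above (illustrative values only).
import subprocess

DEVICE = "/dev/sda"  # illustrative device path; adjust for your system

# Illustrative thresholds taken from the discussion above (degrees C).
TOO_COLD = 20
SWEET_SPOT = (30, 35)
TOO_HOT = 45

def read_smart_temperature(device):
    """Return the drive temperature in C from 'smartctl -A', or None if not found."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    for line in result.stdout.splitlines():
        if "Temperature_Celsius" in line:
            fields = line.split()
            # On most ATA drives the raw value is the 10th column; vendors
            # vary, so treat this parse as best-effort.
            try:
                return int(fields[9])
            except (IndexError, ValueError):
                return None
    return None

def main():
    temp = read_smart_temperature(DEVICE)
    if temp is None:
        print(f"Could not read a temperature attribute from {DEVICE}")
    elif temp >= TOO_HOT:
        print(f"{DEVICE}: {temp} C -- above ~{TOO_HOT} C, improve airflow or move the drive")
    elif temp < TOO_COLD:
        print(f"{DEVICE}: {temp} C -- unusually cold; the study saw higher failure rates here too")
    elif SWEET_SPOT[0] <= temp <= SWEET_SPOT[1]:
        print(f"{DEVICE}: {temp} C -- in the {SWEET_SPOT[0]}-{SWEET_SPOT[1]} C sweet spot")
    else:
        print(f"{DEVICE}: {temp} C -- acceptable")

if __name__ == "__main__":
    main()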
I do think that heat & usage DO contribute to an HDD's end of life, absolutely, because it's common sense, but what I THINK Google's trying to say is that they matter, just NOT AS MUCH AS YOU'D THINK, & IT'S MORE ABOUT THE QUALITY OF THE DRIVE ITSELF & ITS INTERNALS (in other words, how well it's made from the get-go).
Good enough analysis for me... a decent 'scientific method' & statistical sampling set, under the RIGHT type of conditions, on heavily used disk drives.
APK
P.S.=> Is this the "definitive work" on this? No, probably not possible, but VERY close imo... Admittedly, I did NOT look @ the .pdf they supplied, but upon scrutiny of the topic here & @ SLASHDOT earlier today, it is a pretty damn good step in the RIGHT direction... especially when you compare it to other items that are mechanical, like cars for instance, as others mentioned above... apk
I've mainly used second-hand/old drives that have been around the block a good few (million?) times, lol, and at most I've had the odd slow/noisy drive, which isn't too bad unless you're constantly moving files about or downloading.
I know HDDs are supposed to survive up to 300 G (as long as they are OFF at the time of the bump). But what about the cumulative effect of vibration while ON?
I've had TERRIBLE experiences with second-hand HDDs... and never buy them now. All the second-hand HDDs I bought arrived DOA or went dead within a couple of weeks/months. The issues with second-hand HDDs are: you can't see drop or vibration damage... you don't know whether the person is selling because of problems... and they just do a reformat and then sell it on.
Also... second-hand systems (ex-corporate desktops) tend to be handled like sh1t during pickup, storage, and repackaging. Hence, while the PC is fine... the HDD usually has a short life.
In my experience their results are pretty true; some drives are MADE TO TAKE THE HEAT, like 15k Seagate SCSI drives. They all run pretty warm to HOT AS HELL, but the failure rate is so low that I have only seen maybe 5 of them die since the 15k drives came out, and all of those died in the burn-in period. Seagate replaced them FAST (1 week turnaround or less) and they had a 10-year warranty on them as well. When I asked if we needed to cool them better, the tech I talked to said they were specifically designed to run at those high temps and it wasn't anything to worry about, just not to touch them after they had been on and under heavy use for a while (already learned that, blistered fingers suck).
I've also got an old UW SCSI HDD here that's a Quantum, 9.4 GB, 7500 RPM (one of the first 7k RPM drives; it's about 2x as thick as today's drives). That sucker has warnings all over it, it's YEARS old (like late 90s, maybe 2001 at the latest), and it's still running FLAWLESSLY in my old P3 box, and it doesn't have any extra cooling other than case airflow (which isn't that good, it's an old HP Pavilion case).
In my experience this article is VERY true: if a drive is going to fail, it's normally in the first 6 months to 1 year depending on the level of use (burn-in time). If it lasts that year and doesn't get dropped or otherwise damaged, it's going to last 5+ years.
Now, WD drives do have some quirks in my experience, mostly with VIA chipset IDE controllers: if they stop being recognized on the controller, you will never get them recognized on that board/chipset again, but they will work on nForce, SiS, Intel, ALi/ULi, and ServerWorks chipsets just fine. I have only seen this once with SATA WD drives, and it may have been a fluke; the drive worked on an add-in card or on another chipset's board but wouldn't even detect on VIA chipset-based controllers. I just swapped it into a computer I was setting up and gave the person the Hitachi I had been using in the new build. The WD was less than a month old, so no harm, since the warranty for the system was on the shop for a year either way, and the drive had a 3-year warranty (the owner only buys drives with at least a 3-year warranty).
Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?
www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html
:)
================================
KEY POINTS SUMMARY:
================================
Infant mortality? - "Interestingly, we observe little difference in replacement rates between SCSI, FC and SATA drives, potentially an indication that disk-independent factors, such as operating conditions, affect replacement rates more than component specific factors."
-----------------------------------------
Vendor MTBF reliability?. . failure rate is not constant with age, and that, rather than a significant infant mortality effect, we see a significant early onset of wear-out degradation.
-----------------------------------------
Vendor MTBF reliability? While the datasheet AFRs are between 0.58% and 0.88%, the observed ARRs range from 0.5% to as high as 13.5%. That is, the observed ARRs by dataset and type, are by up to a factor of 15 higher than datasheet AFRs. Most commonly, the observed ARR values are in the 3% range.
------------------------------------------
Actual MTBFs? The weighted average ARR was 3.4 times larger than 0.88%, corresponding to a datasheet MTTF of 1,000,000 hours.
------------------------------------------
Drive reliability after burn-in? Contrary to common and proposed models, hard drive replacement rates do not enter steady state after the first year of operation. Instead replacement rates seem to steadily increase over time.
------------------------------------------
Data safety under RAID 5?. . . a key application of the exponential assumption is in estimating the time until data loss in a RAID system. This time depends on the probability of a second disk failure during reconstruction, a process which typically lasts on the order of a few hours. The . . . exponential distribution greatly underestimates the probability of a second failure . . . . the probability of seeing two drives in the cluster fail within one hour is four times larger under the real data . . . .
-------------------------------------------
Independence of drive failures in an array? The distribution of time between disk replacements exhibits decreasing hazard rates, that is, the expected remaining time until the next disk was replaced grows with the time it has been since the last disk replacement.
-------------------------------------------
:)
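Two of the numbers quoted above (the 0.88% datasheet AFR that goes with a 1,000,000-hour MTTF, and the chance of a second failure during a RAID-5 rebuild) are easy to sanity-check with a back-of-the-envelope calculation. The sketch below is my own, not from either paper; the 8-disk array size, the 6-hour rebuild window, and the function names are made-up illustration values, and it deliberately uses the naive independent/exponential failure model that the Schroeder paper says understates the real, correlated risk:

# Back-of-the-envelope sketch (my own numbers/model, not from the papers):
# 1) turn a datasheet MTTF into an annualized failure rate (AFR), and
# 2) estimate the chance of a second failure during a RAID-5 rebuild under a
#    naive independent/exponential failure model.
import math

HOURS_PER_YEAR = 8760

def afr_from_mttf(mttf_hours):
    """Approximate annualized failure rate implied by an MTTF, as a fraction."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)

def second_failure_prob(n_remaining_disks, rebuild_hours, mttf_hours):
    """Chance that at least one surviving disk fails during the rebuild window,
    assuming independent exponential lifetimes (the optimistic model)."""
    per_disk = 1.0 - math.exp(-rebuild_hours / mttf_hours)
    return 1.0 - (1.0 - per_disk) ** n_remaining_disks

mttf = 1_000_000  # datasheet MTTF in hours, as quoted above

afr = afr_from_mttf(mttf)
print(f"Datasheet MTTF {mttf:,} h -> AFR ~{afr:.2%}")          # roughly 0.87-0.88%
print(f"Observed average ARR in the study was ~3.4x that: ~{3.4 * afr:.2%}")

# Hypothetical 8-disk RAID-5 array, ~6 hour rebuild after the first failure.
p = second_failure_prob(n_remaining_disks=7, rebuild_hours=6, mttf_hours=mttf)
print(f"Naive chance of a 2nd failure during the rebuild: ~{p:.4%}")
print("...and the paper's point is that real, correlated failures make this "
      "noticeably more likely than the exponential model predicts.")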
* You guys may wish to check that out... I found it @ SLASHDOT today, & it continues to expand on this topic, albeit from another set of researchers' findings.
APK
P.S.=> This is what I meant above in one of my posts here about GOOGLE's findings being "the definitive work" on this topic... although well done & drawn from a heck of a sample set (their doubtless CONSTANTLY pounded-on disks for the servers in their search engine), this one seems to have been rated HIGHER as a good analysis of this @ SLASHDOT, per this quote:
hardware.slashdot.org/hardware/07/02/21/004233.shtml
---------------------------------------
"Google's wasn't the best storage paper at FAST '07. Another, more provocative paper looking at real-world results from 100,000 disk drives got the 'Best Paper' award. Bianca Schroeder, of CMU's Parallel Data Lab"
---------------------------------------
Are SLASHDOT readers "the final word"? No, nobody really is, & no one knows it all in this field or is the "God of Computing", etc., but they are another point of reference for you all on this topic!
(As far as slashdotters go? Well, imo, many are TOO "Pro-Linux/Anti-Microsoft", but there are guys there that REALLY know their stuff as well... take a read, & enjoy if this topic's "YOU"... )
That said, if the actual paper's "TOO MUCH" & it does get that way @ times...? You can always skim thru what the readers @ slashdot stated... they offer much of what it says, in more "human language terms/laymen's terms", & often 'from the trenches'... apk
BrandMostReliable:
faq.storagereview.com/tiki-index.php?page=BrandMostReliable
:)
* They also seem to contradict GOOGLE's findings, as well as the other paper, lol!
(In that, yes, heat/wear & tear do matter, as do use patterns & such, but handling during shipping & conditions of use as far as shock goes (CompletelyBonkers notes this above) matter more, which is what I drew from THIS article @ least, vs. the GOOGLE & USENIX.ORG articles on the subject @ hand here!)
APK
P.S.=> Personally, I think it's a GOOD combination of ALL three articles' points: how the disk is handled during shipping & manufacture, the conditions it's used in as far as the 'shock/bonk factor' goes, heat & usage patterns during operation, & lastly, how well it's made from the 'get-go' in terms of parts quality & engineering... apk