thinking about replacing my internal hard drives with an external RAID 1 array of 20 TB enterprise drives for some extra data retention in case of failure
Nope, don't do it. Get an internal HDD, plug it into your most reliable 24/7 accessible server and let it go BRRRRRRRR.
Nearline/offline backups are important, but the moment you start climbing into xxTB of content, you're no longer storing exclusively "important" non-replaceable data.
As data hoarders, we know better. I will shill Toshiba all decade long, love these drives... But as a hobbyist or pro, you've got a LOT of thinking to do before any migration.
I'm leaning towards the Toshiba, as that's the cheapest of the bunch, but I don't want to regret my decision later.
Get one, verify it's CLEAN, performance test it, then consider how to maximize your experience.
I keep a 16TB in the eMachines. Yes, there's a SATA II speed cap and yes, they do get warm from extended writes.
Make sure you get a fan and don't pair these up with hot disks or any that are on the way out.
Incredible drives.
Performance isn't extremely important, but reliability is (two of such drives aren't cheap).
The performance isn't important until you need it.
You're likely going to see similar speeds so be prepared for it.
An HDD is not just a magic box to drop files. Let it thrash.
Okay, time to get a little autistic. It makes far more sense to strategize your data provisioning than to go into this like it's some extension of your current storage.
A new disk isn't a tumor but a new lease on life. One of the reasons we're talking about HDDs in the first place is their long-standing reputation in storage and how they behave.
The disks that currently hold your data have TIME on them, which translates to WEAR. We don't know how much but if you're the type to hold onto disks for a while, it adds up.
There are performance characteristics of new drives that obsolete your older ones. Not by a lot but just enough for you to consider them and then some.
Maybe you double up on storage every other year or wait it out to quadruple. Maybe Toshiba's half gigabyte cache appeals to you too. You've got options and they're great.
If your storage strategy is good, you're going to shuffle a bit of data around.
Avoid extreme jumps like I did: 300GB...750GB...2TB...4.9TB...16TB.
Roughly 2.2x your current largest is often safe and cost effective.
If something happens and you suddenly need to recover a ton of data, the time and money involved scale with these capacity jumps.
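To put numbers on those jumps, here's a minimal Python sketch. The capacity list is my own upgrade history from above; the function names and the default factor of 2.2 are just illustration of the rule of thumb, not a hard rule.

```python
# Capacities from the upgrade history quoted above, in GB.
history_gb = [300, 750, 2000, 4900, 16000]

def jump_ratios(sizes):
    """Growth factor between each consecutive pair of disks."""
    return [new / old for old, new in zip(sizes, sizes[1:])]

def suggested_next(largest_gb, factor=2.2):
    """Apply the rough 2.2x rule of thumb to the current largest disk."""
    return largest_gb * factor

# Print each historical jump and how far it strays from ~2.2x.
for old, new, ratio in zip(history_gb, history_gb[1:], jump_ratios(history_gb)):
    print(f"{old} GB -> {new} GB: {ratio:.2f}x")

print(f"2.2x of a 4900 GB disk is about {suggested_next(4900):.0f} GB")
```

Swap in your own sizes. Any jump well above the factor you're comfortable with means a bigger, slower and more expensive recovery if that disk ever has to be refilled.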
Write a chart.
A chart of your disks. Their sizes, speeds, features you care about, their behaviors and how you want to arrange them for use.
Try something like most frequent access (online) to less frequent to cold spare (nearline/offline).
If you're already set up well, this will change a little bit. If it's a really big deal, expect a very big change.
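A spreadsheet or a napkin works fine for this, but if you'd rather keep the chart as a script, here's one possible sketch in Python. Every disk entry below is a hypothetical placeholder (only the warm-running 16TB on a SATA II port resembles my own setup), and the `Disk` fields and role names are my assumptions about what's worth tracking.

```python
from dataclasses import dataclass

@dataclass
class Disk:
    name: str
    size_tb: float
    interface: str
    role: str        # "online", "nearline", or "offline"
    notes: str = ""

# Most frequently accessed role first, cold spares last.
ROLE_ORDER = {"online": 0, "nearline": 1, "offline": 2}

# Hypothetical entries for illustration only; replace with your own hardware.
disks = [
    Disk("old-2tb", 2.0, "SATA II", "offline", "cold spare"),
    Disk("toshiba-16tb", 16.0, "SATA II", "online", "runs warm on long writes, needs a fan"),
    Disk("mid-4.9tb", 4.9, "USB 3.0", "nearline", "manual mirror target"),
]

def chart(disks):
    """Return the chart as text lines, ordered by role and then by size."""
    ordered = sorted(disks, key=lambda d: (ROLE_ORDER[d.role], -d.size_tb))
    lines = [f"{'name':<14}{'TB':>6}  {'interface':<10}{'role':<10}notes"]
    for d in ordered:
        lines.append(
            f"{d.name:<14}{d.size_tb:>6.1f}  {d.interface:<10}{d.role:<10}{d.notes}"
        )
    return lines

print("\n".join(chart(disks)))
```

The point isn't the code, it's that writing the roles down forces you to decide which disk is online, which is nearline and which is just a spare.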
Write a chart of your data.
I'm not sure how to put this into human language but consider serious uses like game libraries, music, movies, various video backups, photos, memes, whatever.
You'll start to notice other things that stand out as you plot these details.
Some data takes up more space than others. Some data takes WAAAAY more space.
Some kinds of data fit way better on a volume arranged in 16K clusters than 64K. Some data is just slow no matter what. Be wary of the behaviors.
You may need to be aware of variations in volume types like NTFS and ReFS, if you ever messed with that. They behave a little differently.
More importantly, be aware of the data you want to keep and the data that can be dumped. I'll get into that in a bit.
Be aware of bad provisions: a library is likely to balloon seriously once you finally have room for the things you wanted or needed but couldn't fit before.
My Steam library went from 544GB (561GB on disk) to 2619GB of a 3TB iSCSI chunk sitting at the start of the data partition of my 16TB Toshiba.
I had a lot of stuff to grab. I also know my own purchase habits and it will take ~2 years to fill another 500GB.
By then I will either chunk it out again or have another strategy for the volume.
I then made a 2.75TB iSCSI provision for Epic and various other game libraries that finally get a drive letter.
Having both volumes on one disk might be a bad idea if there are lots of updates from both stores at the same time, so if you do something like this, be warned.
At this point, nearly half the disk is decided and good.
About dumping data... You have data that can be sorted into categories like constantly used, frequently accessed and rarely used, and they can sit in nets like LAN only or Internet distribution.
The importance rank is entirely up to you but you need to make a choice when determining the value of those different pieces of data and how they all fit together.
They're all types of data that behave differently from one another, with varying priorities, securities and growth patterns.
Most of us are just single users of each device but if you're some kind of NetAdmin, have quotas or just a really restrictive bandwidth policy, it's something to think about.
Figure out which data sets are:
nonreplaceable personal/isolate
nonreplaceable historic/archive
nonreplaceable work
replaceable non-burden
replaceable fair burden
replaceable extreme burden
A noteworthy example: A Steam Library can always be replaced, a Steam Archive probably not.
It's also troublesome to replace and validate tons of files, or to lose too much space to cluster slack.
A million files with ~9GB lost to clusters.
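If you want to see where a number like that comes from, here's a rough Python sketch of the cluster math. It assumes every file is rounded up to a whole number of clusters, which is a simplification (NTFS, for example, can keep very small files inside the MFT), and the file sizes are synthetic, not my actual library.

```python
def allocated_bytes(file_size, cluster_size):
    """Bytes a file occupies when space is handed out in whole clusters."""
    # Ceiling division without floats: round up to the next cluster boundary.
    return -(-file_size // cluster_size) * cluster_size

def total_slack(file_sizes, cluster_size):
    """Total bytes lost to partially filled last clusters."""
    return sum(allocated_bytes(s, cluster_size) - s for s in file_sizes)

# Synthetic example: one million files spread between 1 byte and 200 KB.
file_sizes = [(i * 7919) % 200_000 + 1 for i in range(1_000_000)]

for cluster_kib in (4, 16, 64):
    slack = total_slack(file_sizes, cluster_kib * 1024)
    print(f"{cluster_kib:>2}K clusters: {slack / 1e9:.1f} GB lost to slack")
```

As a rule of thumb, each file wastes about half a cluster on average, so the loss grows with both the file count and the cluster size. Feed it real sizes from a directory walk to compare cluster sizes before you format.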
It also matters way more to certain people. If you're just an ordinary gamer but fickle, you tend to game hop a lot. The biggest 8 titles in my Steam add up to 60% of my monthly bandwidth quota and that's without updates, which thankfully have calmed down lately. If you're some kind of retro gamer, of course you're going to have your own repo for playing and streaming but if you're a variety streamer that bounces from title to title every day or every hour, that's a lot of data to retain in your library to just be on call like that. It can all be replaced but it's a pretty heavy burden to do so.
Emulation kits, ISOs, YouTube archives, Gallery-DL archives, 3D Models, Virtual Machine stores, Unreal and Unity backups, system driver packages, keys, functional IIS website backups, language material archives, other study kits, patches for various kinds of games/software, phone backups, photo archives, photostock, entire user profile backups, various open toolkits, torrent backups, archives of antique websites that no longer exist and can't be wayback'd, music storage pools and their archives...
All of these fit somewhere in the chart and I'm sure you're going to figure out more than I can put here.
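Here's one way the data chart could look as a small Python sketch, using the six categories from the list above. The ranking order is my assumption (the importance rank is entirely up to you), and apart from the Steam library size every data set and number below is a made-up placeholder.

```python
from enum import Enum

class Category(Enum):
    # Lower value = higher priority. This ordering is an assumption.
    NONREPLACEABLE_PERSONAL = 1
    NONREPLACEABLE_ARCHIVE = 2
    NONREPLACEABLE_WORK = 3
    REPLACEABLE_NON_BURDEN = 4
    REPLACEABLE_FAIR_BURDEN = 5
    REPLACEABLE_EXTREME_BURDEN = 6

# (name, category, size in GB). Hypothetical except the Steam library size.
datasets = [
    ("Steam library", Category.REPLACEABLE_EXTREME_BURDEN, 2619),
    ("Steam archive", Category.NONREPLACEABLE_ARCHIVE, 400),
    ("Photo archive", Category.NONREPLACEABLE_PERSONAL, 250),
    ("Driver packages", Category.REPLACEABLE_NON_BURDEN, 20),
]

def is_replaceable(category):
    return category.name.startswith("REPLACEABLE")

def must_mirror(datasets):
    """Names of data sets that can't be re-downloaded, highest priority first."""
    keep = [d for d in datasets if not is_replaceable(d[1])]
    return [name for name, _, _ in sorted(keep, key=lambda d: d[1].value)]

def size_by_category(datasets):
    """Total GB per category, for deciding how much each disk has to hold."""
    totals = {}
    for _, category, size_gb in datasets:
        totals[category] = totals.get(category, 0) + size_gb
    return totals

print("Mirror first:", must_mirror(datasets))
for category, size_gb in size_by_category(datasets).items():
    print(f"{category.name:<28}{size_gb:>6} GB")
```

Once the totals are in front of you, it's a lot easier to see what actually needs a mirror and what can just be re-downloaded when a disk dies.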
If you really want to keep a RAID for everything, I'm sure it's fine.
I avoid that setup unless it's on M.2, but for HDDs, manual mirroring has worked fine.