Backup media, strategies and pitfalls

unwind-protect · Dec 7, 2024

W1zzard said:
Always use incremental backups

That doesn't help against corruption from memory errors and the like if the drive is connected locally. A memory error can destroy data structures of the older snapshots as well as current data. It also doesn't help against ransomware because the local computer is allowed to destroy snapshots.

That is where a separate machine comes in. Now the workstation can corrupt all it likes, it won't nuke older snapshots.

Calenhad said:
Is it vendor lock-in with fully open sourced software though?

The server side is not open source. The client is open source so that you can verify that your data is not leaving your premises unencrypted and to review the quality of that encryption (if you can). The person who made the service is the former security officer of FreeBSD, so he's likely ahead of most of us, but anybody can fatfinger something. That's why there's also a bug bounty.

Solaris17 · Dec 7, 2024

W1zzard said:
Check my software suggestions

I read it; I know of all of them.

TechBuyingHavoc · Dec 7, 2024

DirtyDingusMcgee said:
I wonder who is gonna care about my mp3 collection in 100 years?

1 hot backup, 1 cold, 1 offsite is plenty for all but the most extreme cases... missile launch codes or the KFC chicken recipe.

Imagine if everyone thought like this throughout the millennia. We would have NO historical records to speak of. :wtf:

W1zzard · Dec 7, 2024

unwind-protect said:
It also doesn't help against ransomware because the local computer is allowed to destroy snapshots.

Cloud storage services can be configured so that "delete" isn't actually delete and data is kept forever, or until a retention timeout expires. Just make sure to use 2FA at the hosting company, so attackers can't harm you through that vector.

unwind-protect said:
corruption

I'd assume that all cloud providers have sufficient redundancy to avoid that, but you can also tell any reasonably mature software to verify after upload and to scrub (re-verify all data) at regular intervals.

unwind-protect said:
separate machine

Same building? Fire, flood, theft, vandalism, disgruntled employee, natural disaster

local has very good restore speeds though

dragontamer5788 · Dec 7, 2024

NAS Nas Nas Nas Nas.

Network Attached Storage is a computer built for hard drives. You should use software like XigmaNAS (free download, open source, BSD), which uses the ZFS filesystem.

One of the most important software benefits is scrubbing. Scrubbing rereads all your data and corrects all errors that can be corrected.

Another important benefit is error correction. All data in a NAS should be configured to keep error correction on different hard drives, so a total loss of a HDD can still be recovered.

For example, a RAID-Z2 setup with 6 hard drives keeps 4 hard drives' worth of data and 2 hard drives' worth of error correction. This means you can lose 2 hard drives and all the data stays safe.
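
As a rough illustration of that arithmetic, here is a minimal Python sketch. It is not how ZFS actually allocates space (real pools lose some capacity to metadata and padding), and the 8 TB drive size and the `raidz_layout` helper are assumptions made up for the example.

```python
def raidz_layout(total_drives: int, parity_drives: int, drive_tb: float) -> dict:
    """Rough capacity split for one parity-protected group of equal-sized drives."""
    if parity_drives >= total_drives:
        raise ValueError("need at least one data drive")
    data_drives = total_drives - parity_drives
    return {
        "data_drives": data_drives,
        "parity_drives": parity_drives,
        "raw_tb": total_drives * drive_tb,
        # Idealised figure: ignores filesystem overhead.
        "usable_tb": data_drives * drive_tb,
        # With N drives' worth of parity, any N drives can fail without data loss.
        "tolerated_failures": parity_drives,
    }


# The 6-drive example from the post, with a hypothetical 8 TB per drive.
layout = raidz_layout(total_drives=6, parity_drives=2, drive_tb=8.0)
print(layout)
```

The trade-off is visible in the numbers: a third of the raw capacity goes to parity in exchange for surviving any two drive failures.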

Furthermore, with regular scrubs you will fix partial errors (like lost sectors, when just a piece of a hard drive loses data but otherwise stays functional). Rereading the data and fixing the data regularly is the key to a NAS and storage solutions.

Always buy more hard drives. Redundancy is how you gain safety. IMO aim at 4 physical hard drives at a minimum.

--------

You can use ZFS snapshots to help protect vs situations like Ransomware. If all your data gets ransomed, you can just roll back to your last snapshot (!!!!). Of course, snapshots take up space.

The only downside is that ZFS takes study and practice to learn how to use. It's implemented in Linux or FreeBSD (across NAS software like UnRAID, XigmaNAS, FreeNAS and many competitors).

I'm pretty sure NAS is the ultimate storage solution for everyone today and highly recommend enthusiasts to build one.

Waldorf · Dec 7, 2024

not really viable cost wise for most end users, that are just browsing/gaming etc.

qxp · Dec 7, 2024

Dr. Dro said:
For long-term storage, I purchase cheap HDDs, and keep them stored in my drawer alongside some dry packs and silica gel to keep the humidity out. None of my data is particularly sensitive, though some of the things I've gathered over the years have become quite difficult to find on the internet. On the cloud, I've had a OneDrive with 1TB of storage thanks to MS365, but no plans to renew that.

I am not entirely sure about having 0% humidity. I don't think there is much to rust in an HDD - the case is aluminum - while normal humidity (30-50%) might prolong the life of electrolytic capacitors, if the controller has any. Normal humidity also prevents charge build-up, but this might not be a big issue in a drawer. Also, plastic parts, such as the connector, might become brittle in low humidity.

Dr. Dro · Dec 7, 2024

qxp said:
I am not entirely sure about having 0% humidity. I don't think there is much to rust in an HDD - the case is aluminum - while normal humidity (30-50%) might prolong the life of electrolytic capacitors, if the controller has any. Normal humidity also prevents charge build-up, but this might not be a big issue in a drawer. Also, plastic parts, such as the connector, might become brittle in low humidity.

Ah it's just some anti-mold packs, replace them once they're filled up. Nothing too controlled or that makes that much of a difference, really.

frenchfry · Dec 26, 2024

I always wondered if ZFS is necessary to protect against data corruption.

qxp · Dec 26, 2024

frenchfry said:
I always wondered if ZFS is necessary to protect against data corruption.

ZFS has many nice features and can scale to large volumes. On the cluster I use the sysadmins swear by it. However, I don't think it is necessary, in fact, personally I prefer ext4, especially for backups. This is because if there is corruption it is easier for me to understand ext4 layout and try to fix things manually than ZFS.

ext4 has journals as well, and in many years of using it I never saw any problems, cold restarts included. ext4 journals work. So I never really had to apply my knowledge of ext4 except in one case where a laptop was dropped and the spinning disk had errors. I got most of the files out except a few, probably where the scratches were. If you are ever in this situation, use ddrescue to copy the image to a good drive, make another copy of the image and use fsck on that copy to try to recover. If fsck asks a few questions you can read up on ext4, or just guess and then try again with a new copy of the image.
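
The recovery sequence described above can be sketched as a small Python helper that only builds the commands and prints them, so nothing touches a disk until you run them yourself. The device and file names are hypothetical placeholders, and the sketch assumes the image is of a single ext4 partition (a whole-disk image would need the partition mapped first). `-d` (direct disc access) and `-r 3` (three retry passes) are GNU ddrescue options chosen for illustration, not something the post specified.

```python
import shlex


def rescue_plan(device: str, image: str, mapfile: str, work_copy: str) -> list[list[str]]:
    """Return the commands for a ddrescue-then-fsck recovery; nothing is executed."""
    return [
        # 1. Copy the failing partition to an image on a good drive.
        #    The mapfile records progress so an interrupted run can resume.
        ["ddrescue", "-d", "-r", "3", device, image, mapfile],
        # 2. Keep the rescued image pristine and experiment on a copy.
        ["cp", image, work_copy],
        # 3. Force a full check of the ext4 filesystem inside the copy.
        ["e2fsck", "-f", work_copy],
    ]


# Hypothetical names; replace with your own device and paths.
plan = rescue_plan("/dev/sdX1", "rescued.img", "rescued.map", "work.img")
for cmd in plan:
    print(shlex.join(cmd))
```

If the fsck run goes badly, discard the working copy, make a fresh one from the rescued image and try again with different answers.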

dragontamer5788 · Dec 26, 2024

qxp said:
ZFS has many nice features and can scale to large volumes. On the cluster I use the sysadmins swear by it. However, I don't think it is necessary, in fact, personally I prefer ext4, especially for backups. This is because if there is corruption it is easier for me to understand ext4 layout and try to fix things manually than ZFS.

ext4 has journals as well, and in many years of using it I never saw any problems, cold restarts included. ext4 journals work. So I never really had to apply my knowledge of ext4 except in one case where a laptop was dropped and the spinning disk had errors. I got most of the files out except a few, probably where the scratches were. If you are ever in this situation, use ddrescue to copy the image to a good drive, make another copy of the image and use fsck on that copy to try to recover. If fsck asks a few questions you can read up on ext4, or just guess and then try again with a new copy of the image.

Ext4 has no protection against bitrot, and requires a complex LVM (Logical Volume Manager) setup to get parity drives, striping, or other forms of redundancy.

ZFS has RAID-Z2 parity (aka: 2 drives of backup parity, so a 6-hard drive dataset will only lose data if 3 drives fail), it has the 'Scrub' command to check for bitrot (run it at least a few times per year) and more.
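
To put a rough number on "only lose data if 3 drives fail", here is a minimal Python sketch of the binomial arithmetic. It assumes every drive fails independently with the same probability over some window, which is optimistic (drives from one batch or on one power supply tend to fail together). The 5% figure and the `loss_probability` helper are made up for illustration.

```python
from math import comb


def loss_probability(drives: int, parity: int, p_fail: float) -> float:
    """Chance that more than `parity` of `drives` fail in the same window.

    Assumes independent failures with identical probability `p_fail`.
    """
    survive = sum(
        comb(drives, k) * p_fail**k * (1 - p_fail) ** (drives - k)
        for k in range(parity + 1)  # 0..parity failed drives is survivable
    )
    return 1.0 - survive


p = 0.05  # hypothetical per-drive failure probability for the window
for parity in (0, 1, 2):
    print(f"6 drives, {parity} parity: P(data loss) = {loss_probability(6, parity, p):.6f}")
```

Each extra parity drive lowers the loss probability considerably under these assumptions, which is the argument for double parity on larger arrays.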

It's not even clear how to check for bitrot with ext4. I'm pretty sure your data just dies if it ever gets bitrotted. But ZFS will take the extra parity backup bits and reconstruct the data on the next scrub.

That's the key btw. Scrubbing. You need to constantly check every few months that the data is still there and reads fine. That's the main game, otherwise bits disappear over time.

qxp · Dec 26, 2024

dragontamer5788 said:
Ext4 has no protection against bitrot, and requires a complex LVM (Logical Volume Manager) setup to get parity drives, striping, or other forms of redundancy.

ZFS has RAID-Z2 parity (aka: 2 drives of backup parity, so a 6-hard drive dataset will only lose data if 3 drives fail), it has the 'Scrub' command to check for bitrot (run it at least a few times per year) and more.

It's not even clear how to check for bitrot with ext4. I'm pretty sure your data just dies if it ever gets bitrotted. But ZFS will take the extra parity backup bits and reconstruct the data on the next scrub.

That's the key btw. Scrubbing. You need to constantly check every few months that the data is still there and reads fine. That's the main game, otherwise bits disappear over time.

Here is how I solve these with ext4:

I use software or hardware RAID if I need it, and prefer to have my filesystem separate. For backup I use independent media, and I don't consider a filesystem snapshot a reliable long-term backup, because if the primary storage fails for some reason you lose everything.
Both SSDs and HDDs have error correction built in, so you don't need to worry about old-fashioned bit-rot. You could have occasional errors where a page is corrupted or sent to the wrong address, in which case you can try to recover it with various tools.
When I back up I compute checksums of all the files so I can tell if anything went wrong or missing. The checksum is usually md5sum, which is fast and is a tool independent of the filesystem.
Having a relatively simple filesystem like ext4 (in data layout), with superblock redundancy on today's massive drives, coupled with md5sums and rescue tools, increases the chance that if anything goes wrong I will get my data back. In contrast, with ZFS and scrubbing I would be concerned about unnoticed errors accumulating over time. Btw, md5sums are small enough that you can save them on any kind of media, including CD-R. Independent media - higher reliability.
I actually view modern SSDs as a more sophisticated RAID of the flash memory that they use, and I don't think they need to be RAIDed unless you need more space.
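
The checksum workflow described above can be sketched in Python roughly as follows. This is an illustrative stand-in for running md5sum over a tree, not the poster's actual script; the function names and the manifest format (relative path mapped to hex digest) are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algo: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.new(algo)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, algo: str = "md5") -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p, algo)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str], algo: str = "md5") -> dict[str, list[str]]:
    """Compare the tree against a saved manifest and report differences."""
    current = build_manifest(root, algo)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),
        "extra": sorted(set(current) - set(manifest)),
    }
```

Build the manifest at backup time, store it on separate media, and run `verify` against the backup later. MD5 is adequate for spotting accidental corruption; pass `algo="sha256"` if deliberate tampering is a concern.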

dragontamer5788 · Dec 26, 2024

qxp said:
Here is how I solve these with ext4:

I use software or hardware RAID if I need it, and prefer to have my filesystem separate. For backup I use independent media, and I don't consider a filesystem snapshot a reliable long-term backup, because if the primary storage fails for some reason you lose everything.

Both SSDs and HDDs have error correction built in, so you don't need to worry about old-fashioned bit-rot. You could have occasional errors where a page is corrupted or sent to the wrong address, in which case you can try to recover it with various tools.

When I back up I compute checksums of all the files so I can tell if anything went wrong or missing. The checksum is usually md5sum, which is fast and is a tool independent of the filesystem.

Having a relatively simple filesystem like ext4 (in data layout), with superblock redundancy on today's massive drives, coupled with md5sums and rescue tools, increases the chance that if anything goes wrong I will get my data back. In contrast, with ZFS and scrubbing I would be concerned about unnoticed errors accumulating over time. Btw, md5sums are small enough that you can save them on any kind of media, including CD-R. Independent media - higher reliability.

I actually view modern SSDs as a more sophisticated RAID of the flash memory that they use, and I don't think they need to be RAIDed unless you need more space.

So you don't have a solution to bitrot and instead rely upon the built in error correction of flash.

Which has been proven to go bad if turned off for ~year long periods.

Hint: you cannot beat physics. Flash NAND works by storing electrons behind transistor gates. And those electrons escape given enough time. The only solution is to turn on the device, check for bitrot and (if necessary) rebuild / rewrite the data every few months. As it turns out, ZFS solves this with the simple 'Scrub' command. Microsoft's ReFS solves this with an automatic timer (but still requires the admin to turn on the computer so that it can check the time and see if the timers have expired and if new scrubs are necessary).

Yes. Modern error correction codes are great. But they can't beat physics. The only thing that solves bitrot (be it HDD bitrot or NAND flash bitrot) is turning on the device, reading the data and checking+rewriting it. If this doesn't happen with enough regularity, even the error-correction bits bitrot away (aka: error correction only works if the bits in question STILL EXIST when you check for it).

-------

You've also confused error correction with error detection. Md5sum detects errors. It cannot fix them.

The built in error correction of Flash doesn't always read/rewrite btw. It's better to have additional error correction at higher levels (like the ZFS or ReFS filesystem level) that ensures that these error correction steps occur. And also the admin must keep these physics in mind and remember to turn on the damn computer regularly to check for bitrot. That's it.
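
The difference between detecting and correcting an error can be shown with a toy Python example. It uses MD5 for detection and a single XOR parity block for repair. This is a deliberately simplified stand-in: real filesystems such as ZFS use their own checksums and parity schemes, and the block contents here are invented.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, their checksums, and one parity block computed up front.
data = [b"AAAA", b"BBBB", b"CCCC"]
checksums = [hashlib.md5(block).hexdigest() for block in data]
parity = xor_blocks(data)

# Simulate bitrot in the second block.
damaged = list(data)
damaged[1] = b"BBXB"

# Detection: the checksum says WHICH block is bad, but not what it used to be.
bad = [i for i, block in enumerate(damaged)
       if hashlib.md5(block).hexdigest() != checksums[i]]

# Correction: XOR of the surviving blocks and the parity rebuilds the bad one.
index = bad[0]
survivors = [block for j, block in enumerate(damaged) if j != index]
repaired = xor_blocks(survivors + [parity])

print("bad blocks:", bad)
print("repaired matches original:", repaired == data[index])
```

Without the parity block, the checksum mismatch is all you get, and the repair has to come from another copy of the data.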

------

And fuck softraid. I'm not learning another useless technology. SoftRAID falls apart the minute you switch cables or move drives around. Linux LVMs are highly annoying. Real Filesystems can be reconstructed even if I move my drives over to a new motherboard / new Sata-slots. Both SoftRAID and hardraids fail at this.

Microsoft's Storage Spaces and ReFS tech is passable but I do prefer ZFS. All of these software packages are competing with each other, you really should use the easiest code (and the easiest code is either ZFS if you're in Linux/Unix land... Or ReFS if you're stuck in Windows). No other solutions come anywhere close to the ease of use and/or reliability.

_roman_ · Dec 26, 2024

qxp said:
I use software or hardware RAID if I need it,

I do not like RAID. Fake raids especially. I use lvm2.
Especially when you mention ext4.

dragontamer5788 said:
The only solution is to turn on the device, check for bitrot and (if necessary) rebuild / rewrite the data every few months.

I had 5 or 6 different SATA 120 or 128GB SSDs in use for my Gentoo Linux install. One was in use and the others held older states of that Gentoo installation. I regularly swapped the drive and duplicated the current drive. Some SATA SSDs were around 1.5 years old when I used them again for /. I nuked the partition table and set it up as the root drive again.

--

I make my full system backups on a USB/NVMe bridge. Filesystem in question -> gpt -> lvm2 -> luks -> ext4
My root is on gpt -> lvm2 -> luks -> btrfs with compression
Trash files are on an HDD with read errors, with ext4 (software downloads which do not matter - they have a checksum - can be downloaded again)
A lot of temporary junk files are in tmpfs.

I do trash SSDs or HDDs after 2 years of usage. I then consider these drives no longer suitable for /
Some get sold. One M.2 NVMe is my backup drive, which is older than 2 years.

I also see the risk of a corrupted file system driver. I had issues with a kernel-based file system driver this year. So it's better to have different file systems in use. The backup should use a different file system.

dragontamer5788 said:
Linux LVMs are highly annoying.

I setup those regularly. I had to read some guides 10 years ago to understand the principles.

~900GB of LVM2

LVM2 gives me the chance to always make a new "crypto luks container" with 110% of my root system size. When I need more space I delete another older container which I think can be removed, as it works on small parts of those 900GB. Those containers can be made with different sizes.

LVM2 has decent features for a long term linux user.

qxp · Dec 26, 2024

dragontamer5788 said:
So you don't have a solution to bitrot and instead rely upon the built in error correction of flash.

Which has been proven to go bad if turned off for ~year long periods.

Hint: you cannot beat physics. Flash NAND works by storing electrons behind transistor gates. And those electrons escape given enough time. The only solution is to turn on the device, check for bitrot and (if necessary) rebuild / rewrite the data every few months. As it turns out, ZFS solves this with the simple 'Scrub' command. Microsoft's ReFS solves this with an automatic timer (but still requires the admin to turn on the computer so that it can check the time and see if the timers have expired and if new scrubs are necessary).

Yes. Modern error correction codes are great. But they can't beat physics. The only thing that solves bitrot (be it HDD bitrot or NAND flash bitrot) is turning on the device, reading the data and checking+rewriting it. If this doesn't happen with enough regularity, even the error-correction bits bitrot away (aka: error correction only works if the bits in question STILL EXIST when you check for it).

I looked at this, and the problem is not that much of an issue. The ~year-long periods are only for QLC flash, which I don't use. It does not apply to M-DISC, which was designed for long-term storage. Btw, a pretty standard setup is to have the HDD/SSD run periodic self-tests using smartctl. A long self-test typically reads all the media, so there is your scrub.

dragontamer5788 said:
-------

You've also confused error correction with error detection. Md5sum detects errors. It cannot fix them.

I did not

Md5sum detects errors, if you found that some file is corrupted you fall back to the previous backup. You do the correction *manually* which is a good thing for rare events. Otherwise you might be relying on some autocorrection feature but it might not work for one reason or the other.
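
The manual fallback described above amounts to: take the checksum recorded when the file was known to be good, then walk back through the backups until one matches. Here is a minimal Python sketch, using in-memory bytes instead of real files to keep it short; the labels, contents and the `pick_good_copy` helper are hypothetical.

```python
import hashlib
from typing import Optional


def pick_good_copy(expected_md5: str, candidates: dict[str, bytes]) -> Optional[str]:
    """Return the label of the first candidate whose MD5 matches, else None.

    Candidates are checked in insertion order, so list the newest backup first.
    """
    for label, content in candidates.items():
        if hashlib.md5(content).hexdigest() == expected_md5:
            return label
    return None


# Checksum recorded when the file was known to be good.
expected = hashlib.md5(b"family photo bytes").hexdigest()

copies = {
    "current_backup": b"family photo bytez",   # silently corrupted
    "previous_backup": b"family photo bytes",  # still intact
}

source = pick_good_copy(expected, copies)
print("restore from:", source)
```

If no copy matches, the function returns `None`, which is the case where checksums alone cannot help and redundancy or parity would have been needed.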

dragontamer5788 said:
The built in error correction of Flash doesn't always read/rewrite btw. It's better to have additional error correction at higher levels (like the ZFS or ReFS filesystem level) that ensures that these error correction steps occur. And also the admin must keep these physics in mind and remember to turn on the damn computer regularly to check for bitrot. That's it.

Look, if you like ZFS, sure, go use it! Also, my comments above were for my personal setup. For a new startup I would still use ext4 because of its simplicity and not worry about a few errors - the risk of a couple of bits flipping is nothing compared to business issues that can derail the startup. If your startup grows then you can get a sysadmin and then it would depend on what they are comfortable with - there aren't that many people that know ZFS inside and out. What's more, if you need a lot of compute in a startup you are probably getting it in the cloud, and there storage is virtualized anyway.

dragontamer5788 said:
------

And fuck softraid. I'm not learning another useless technology. SoftRAID falls apart the minute you switch cables or move drives around. Linux LVMs are highly annoying. Real Filesystems can be reconstructed even if I move my drives over to a new motherboard / new Sata-slots. Both SoftRAID and hardraids fail at this.

Never had any problem with software RAID - you assign an ID and then it does not matter how the drives are connected. You might be remembering the situation a couple of decades ago when the addressing was by port.

dragontamer5788 said:
Microsoft's Storage Spaces and ReFS tech is passable but I do prefer ZFS. All of these software packages are competing with each other, you really should use the easiest code (and the easiest code is either ZFS if you're in Linux/Unix land... Or ReFS if you're stuck in Windows). No other solutions come anywhere close to the ease of use and/or reliability.

I am glad that you like ZFS - and it is another nudge for me to try it when I have some spare time to read the source code and understand disk layout.

Waldorf · Dec 27, 2024

@qxp
except personal experience is nothing more than allowing us to make educated guesses.
to be statistically relevant, you would have to compare more than ~2500 units with identical hw, to be able to tell if its working/or not, or if its any better (vs other ways of doing things).

ignoring that's not even taking into account that about 25% of defective nand drives are caused by failing controllers, where you will not be able to recover crap,
no matter what you try (short of replacing controller in clean room).

one reason i tell ppl to have 2 drives with identical data, so even if one stops working you still have another copy, better 3, with one in a different location,
one in a fireproof safe, as i copy data from one drive to another faster than any recovery will take (outside of things like mbr missing).

similar to soft raid.
during the time i worked in shops/did IT a handful of ppl did it for perf or cloning, not a single person of those is still using it.

qxp · Dec 27, 2024

Waldorf said:
@qxp
except personal experience is nothing more than allowing us to make educated guesses.

It's a starting point - it worked for me, you can also ask other people and then decide what to do. It's nice to start with something that worked for someone else and then improve on it.

Waldorf said:
to be statistically relevant, you would have to compare more than ~2500 units with identical hw, to be able to tell if its working/or not, or if its any better (vs other ways of doing things).

Even that probably would not do it, because the failures are very dependent on the model and manufacturer. I think a way to make a thread go on forever is to ask whether one should use Seagate or Western Digital.

Waldorf said:
ignoring that's not even taking into account that about 25% of defective nand drives are caused by failing controllers, where you will not be able to recover crap,
no matter what you try (short of replacing controller in clean room).

Actually, I think you can replace the controller on an HDD without a clean room - it is just a board on the outside of the box that connects to the internals via a feedthrough. There are probably YouTube videos of people doing it, but I have never had to do it myself. I also don't know whether modern HDDs might have some configuration flash onboard that makes the board drive-specific. Hopefully not.

Waldorf said:
one reason i tell ppl to have 2 drives with identical data, so even if one stops working you still have another copy, better 3, with one in a different location,
one in a fireproof safe, as i copy data from one drive to another faster than any recovery will take (outside of things like mbr missing).

Indeed, but I would also recommend diversifying the storage medium - some stored on HDD, some on SSD, some in the cloud, some on M-DISC.

Waldorf said:
similar to soft raid.
during the time i worked in shops/did IT a handful of ppl did it for perf or cloning, not a single person of those is still using it.

There is a big difference between doing it for yourself and in a company. In a company you would expect people who keep current with technology, have a bigger budget and you have shorter time horizon - a few years for the critical stuff you are doing now.

At home, you get something to work and do not see problems for several years, and you might not know that some devices have started to show high failure rate, or that some filesystem you used is not well supported anymore. And you might not remember what the settings were for the device or filesystem you configured a decade ago. So I would argue this is harder.

