this post was submitted on 30 Mar 2024
90 points (96.9% liked)

Selfhosted

How should I do backups? (sh.itjust.works)
submitted 7 months ago* (last edited 7 months ago) by [email protected] to c/[email protected]
 

I have a server running Debian with 24 TB of storage. Ideally I would like to back up all of it, though much of it is torrents, so only the ones with few seeders really need to be backed up. I know about the 3-2-1 rule, but it sounds like it would be expensive. What do you do for backups? Also, if anyone uses tape drives for backups, I'm kinda curious about that, potentially for offsite backups in a safe deposit box or something.

TLDR: title.

Edit: You have mentioned borg and rsync, and while borg looks good, I want to go with rsync, as it seems to be more actively maintained. I would also like my backups encrypted, but rsync doesn't seem to have that built in. Does anyone know what to do for encrypted backups?

top 50 comments
[–] [email protected] 3 points 7 months ago (1 children)

To your edit: rsync is a tool to copy/move files; borg is a backup utility. There are scripts that use rsync to create proper backups, but if you want to go by "more actively maintained" you should look at how those scripts are maintained, not rsync itself.
On the other hand, borg is actively maintained; there were even releases in the last two days, one stable and one beta. It also fulfills your encrypted-backup requirement and has versioned backups built in.
tl;dr: comparing borg backup and rsync is comparing apples and oranges.
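
As a rough sketch of that workflow (repository path, source paths, and retention values are illustrative, not from the comment):

```
# Create an encrypted repo; repokey keeps the key inside the repo config,
# protected by a passphrase you choose at init time.
borg init --encryption=repokey-blake2 /mnt/backup/borg-repo

# Create a compressed, deduplicated archive named after host and date.
borg create --stats --compression zstd \
    /mnt/backup/borg-repo::'{hostname}-{now:%Y-%m-%d}' \
    /home /etc /srv

# Thin out old archives according to a retention policy.
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/borg-repo
```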

[–] [email protected] 1 points 7 months ago

You're right, I'm sold.

[–] [email protected] 2 points 7 months ago

Important stuff (about 150G) is synced to all my machines and a Backblaze B2 bucket.

I have a rented seed box for those low seeder torrents.

The stuff I can download again is only on a mirrored LVM pool with an lvmcache. I don't have any redundancy for my monerod data, which is on an NVMe drive.

I'm moving towards an immutable OS with 30 days of snapshots. While not the main reason, it does push one to practice better sync habits.

[–] [email protected] 3 points 7 months ago* (last edited 7 months ago)

My use case is basically the same as yours.

I do restic to Wasabi.

I've been on restic for a few years now and have never had an issue. I started out using Google Drive as the backend, but that was through my college and eventually went away, so I swapped over to Wasabi, though I'm considering B2.

It's actively maintained and encrypted.

It supports a handful of backends natively, and can be extended to many more by writing to an rclone backend.
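
As a sketch of that setup (bucket name and paths are placeholders; restic reads the S3 credentials and the repository password from the environment):

```
# Credentials for the S3-compatible backend and the repo encryption password.
export AWS_ACCESS_KEY_ID='wasabi-key-id'
export AWS_SECRET_ACCESS_KEY='wasabi-secret'
export RESTIC_PASSWORD='repo-passphrase'

# One-time repository setup.
restic -r s3:https://s3.wasabisys.com/my-backup-bucket init

# Back up, then verify the repository.
restic -r s3:https://s3.wasabisys.com/my-backup-bucket backup /home /etc
restic -r s3:https://s3.wasabisys.com/my-backup-bucket check
```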

[–] [email protected] 2 points 7 months ago (1 children)

I use rclone, which is essentially rsync for cloud services. It supports encryption out of the box.

[–] [email protected] 2 points 7 months ago

I like the versatility of rclone.

It can copy to a cloud service directly.

I can chain an encryption process to that, so it encrypts then backs up.

I can then mount the encrypted remote files so that I can easily get to them locally (e.g. I could run diff or md5 on select files as naturally as if they were local).

And it supports an rsync-style --backup-dir option, so it can move locally deleted files elsewhere on the backup instead of deleting them there. I can set up a dir structure such as Oldfiles/20240301, Oldfiles/20240308, etc. that preserves deletions (see the sketch below).
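
A sketch of that chained setup, assuming an rclone crypt remote named b2crypt: has already been created with rclone config (names and paths are illustrative):

```
# Sync to the encrypted remote; anything deleted or overwritten locally is
# moved into a dated Oldfiles/ directory on the remote instead of being erased.
DATE=$(date +%Y%m%d)
rclone sync /srv/data b2crypt:data --backup-dir "b2crypt:Oldfiles/$DATE"
```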

[–] [email protected] 5 points 7 months ago (1 children)

I have a machine at my parents' house that has a single 20TB drive in it. I'll log in once in a while and initiate an rsync to bring it up to date with my RAID at home. The specific reason I do it manually is in case there's a ransomware attack: I won't copy bad data. That's also the reason I start it from the backup machine. The main machine doesn't connect to the backup machine; the backup machine connects to it, so ransomware wouldn't cross that boundary.
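
A minimal sketch of that pull-style sync, run from the backup machine (hostname and paths are made up):

```
# Run ON the backup box: it pulls from the main server over ssh, so the main
# server never holds credentials for the backup target.
rsync -aH --info=progress2 user@mainserver.example.com:/srv/raid/ /srv/backup/
```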

[–] [email protected] 1 points 7 months ago (1 children)

I would like to replicate your setup in the future. How do you connect between the two machines, using tailscale or something like that?

[–] [email protected] 1 points 7 months ago

It’s just over ssh. They’ve both got their own subdomains.

[–] [email protected] 2 points 7 months ago (2 children)

Anything I can download again doesn't get backed up, but it sits on a RAID-1. I'm OK with losing it through carelessness, but not because of a broken disk. I try to be careful when messing with it, and that's enough; I can always download it again.

Anything like photos, notes, and personal files gets backed up via restic to a disk mounted on the other side of the house. I've been thinking about offsite backups but haven't really got to it yet. Been lucky all this time.

Out of 10 TB of stuff, the total I back up amounts to 700 GB. Since 90% of it is photos, the backup size is about 700 GB too. The part of that 700 GB that actually changes (text files, documents...) is negligible. The photos never change; at most the collection grows a bit over time.

[–] [email protected] 1 points 7 months ago

For offsite I back up to AWS Glacier. Cheap to store, expensive to retrieve. If the house burns down I'll still have the photos somewhere, and at that point the cost is negligible compared to losing them, since it really is the worst-case scenario.


[–] [email protected] 7 points 7 months ago

Short answer: figure out how much of that is actually irreplaceable and then find a friend or friends who'd be willing to set aside some of their storage space for your backups in exchange for you doing the same.

Tailscale makes the networking logistics incredibly simple and then you can do the actual backups however you see fit.
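
For illustration, a minimal sketch assuming both machines are on the same tailnet with MagicDNS and SSH access (hostnames and paths are hypothetical):

```
# One-time on each machine: join the shared tailnet.
sudo tailscale up

# Then back up over the tailnet using the friend's MagicDNS hostname.
rsync -a ~/important/ user@friendbox:/backups/me/
```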

[–] [email protected] 1 points 7 months ago

As of today I'm actually in a lucky position: I can now set up a secondary NAS at my brother-in-law's and use it as a backup server that I can back up to essentially in real time.

All it'll cost me is the hardware and the electricity.

[–] [email protected] 5 points 7 months ago (5 children)

I am a simple man, so I use rsync.

I set up a mergerfs drive pool of about 60 TiB and rsync to it weekly (see the sketch below).

Rsync seems daunting at first, but then you realize how powerful and, most importantly, how reliable it is.

It's important that you try to restore your backups from time to time.

One of the main reasons I avoid software such as Kopia, Borg, Restic, or whatever is in fashion:

  • it goes unmaintained
  • it is not simple: many of my friends struggled to restore backups because you are no longer dealing with files, but with encrypted or compressed blobs
  • rsync has an easy mental model and extremely good defaults
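
A sketch of that weekly mirror as a cron script; the pool and backup paths are examples, not from the comment:

```
#!/bin/sh
# /etc/cron.weekly/backup-pool: mirror the mergerfs pool to the backup disks.
rsync -a --partial /srv/pool/ /mnt/backup-pool/ >> /var/log/backup-pool.log 2>&1
```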
[–] [email protected] 1 points 7 months ago (1 children)

One of the main reasons I avoid software such as Kopia, Borg, Restic, or whatever is in fashion:

  • it goes unmaintained
  • it is not simple: many of my friends struggled to restore backups because you are no longer dealing with files, but with encrypted or compressed blobs
  • rsync has an easy mental model and extremely good defaults

Going unmaintained is a non-issue, since you can still restore from your backup. It is not like a subscription or proprietary software, which stops being usable when you stop paying for it or when the company that owns it goes down.

The design of restic is quite simple and easy to understand. The original dev gave multiple talks about it, quite interesting.

Imho the additional features of dedup, encryption and versioning outweigh the points you mentioned by far.

[–] [email protected] 0 points 7 months ago (1 children)

Going unmaintained is a non-issue, since you can still restore from your backup. It is not like a subscription or proprietary software, which stops being usable when you stop paying for it or when the company that owns it goes down.

Until they hit a hard bug, or don't support newer transport formats or scenarios. Also, the community dries up eventually.

[–] [email protected] 0 points 7 months ago (1 children)

Until they hit a hard bug, or don't support newer transport formats or scenarios. Also, the community dries up eventually.

That is why you test your backup. It is unrealistic that a stable software release suddenly has a hard bug that prevents recovery after you have tested your backup.

Yes, unmaintained software will not support new features.

I think you misunderstood me. You should not adopt unmaintained software as your backup tool, but IMO it is no problem if your tool suddenly goes unmaintained; your backups will most likely still work. Same as with any other software that goes unmaintained: look for an alternative.

[–] [email protected] 1 points 7 months ago (1 children)

It is unrealistic that a stable software release suddenly has a hard bug that prevents recovery after you have tested your backup.

How is it unrealistic? Think of this:

  • day 1: you back up your files, test the backup, and everything is fine
  • day 2: you store a new file that triggers a bug in the compression/encryption algorithm of whatever software you use; now the backup is corrupted, at least for this file

Unless you test every backup you make (and consequently can't back up fast enough), I don't see how you can predict that future files and situations won't trigger bugs in the software.
[–] [email protected] 1 points 7 months ago

We're talking about software that is considered stable, that has verification checks for the backup, and that is used by thousands of people. It is unrealistic.

[–] [email protected] 1 points 7 months ago* (last edited 7 months ago) (2 children)

I was heavily considering borg but I just looked up rsync and it looks like everything I need. Thank you.

Edit: Actually, encryption would also be nice. Is there any way to do that with rsync?

[–] [email protected] 1 points 7 months ago

What other people are saying is that you rsync onto an encrypted file system or other encrypted storage. What are your backup targets? In my case I own the disks, so I use LUKS partition -> ext4 -> mergerfs to end up with a single volume I can mount on a folder.
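
A rough sketch of that stack for one disk (device names and mount points are examples):

```
# One-time per disk: encrypt, open, and create the filesystem.
sudo cryptsetup luksFormat /dev/sdb1
sudo cryptsetup open /dev/sdb1 disk1
sudo mkfs.ext4 /dev/mapper/disk1

# Mount the opened disks, then pool them into a single mergerfs volume.
sudo mount /dev/mapper/disk1 /mnt/disk1
sudo mergerfs -o defaults,allow_other /mnt/disk1:/mnt/disk2 /mnt/pool
```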

[–] [email protected] 1 points 7 months ago

Yes. You layer an encrypted vault over your storage vault. I pay about $1/month for a Backblaze B2 bucket; around 150 GB last I checked.

[–] [email protected] 5 points 7 months ago (2 children)

As long as you understand that simply syncing files does not protect against accidental or malicious data loss the way incremental backups do.

I also hope you're not using --delete because I've heard plenty of horror stories about the source dir becoming unmounted and rsync happily erasing everything on the target.

I used rsync for years, thinking just like you that having plain old files beats having them in fancy obscure formats. I'm switching to Borg nowadays, btw, but that's my choice; you've got to make yours.

rsync can work incrementally, it just takes a bit more fiddling. Here's what I did. First of all, no automatic --delete. I did run it every once in a while, but only manually. The sync setup was as follows (a rough sketch follows the list):

  • Nightly sync source into nightly dir.
  • Weekly sync nightly dir into weekly dir.
  • Monthly tarball the weekly dir into monthly dir.
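
```
# Nightly: mirror the source into the nightly dir (paths are illustrative).
rsync -a /srv/data/ /backup/nightly/
# Weekly: roll the nightly dir into the weekly dir.
rsync -a /backup/nightly/ /backup/weekly/
# Monthly: freeze the weekly dir into a dated tarball.
tar -czf "/backup/monthly/$(date +%Y%m).tar.gz" -C /backup weekly
```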

It's not bad, but it's limited in certain ways, and of course you need lots of space for backups, or you have to pick and choose what you back up.

Borg can't really get around the space requirement either, but it's always incremental, and between compression and deduplication it can save you a ton of space.

Borg also has built-in backup checking and recovery parity, which rsync doesn't; you'd have to figure out your own manual solution, like par2 checksums (and those take up space too).

[–] [email protected] 1 points 7 months ago (1 children)

As long as you understand that simply syncing files does not protect against accidental or malicious data loss the way incremental backups do.

Can you show me a scenario? I don't understand how incremental backups cover malicious data loss cases

[–] [email protected] 1 points 7 months ago

Let's say you're syncing your personal files into another location once a day.

On Monday you delete files. On Tuesday you edit a file. On Wednesday you maybe get some malware that (unknown to you) encrypts some files (or all of them).

A week later you realize that things went wrong and you want the deleted files back, or the old versions of the file you edited, and of course you'd want back the files that the ransomware has encrypted.

If you simply sync files, you have no way to get back deleted files: every day it synced whatever was there, overwriting what was there before. If you also sync deletions, then the sync deletes the files. If you don't sync deletions, then files keep piling up whenever you delete them or move them around.

An incremental backup system like borg looks at small file chunks, not at files. Whenever a file changes, it makes a copy of only the chunks in it that changed. That way it can give you the latest version of the file but also all the versions before, and it doesn't store the same file over and over, only the chunks that really changed, and only one of each chunk. If you move a file to another folder it still has the same chunks so borg stores that it moved but it doesn't store the chunks twice. Also if several files have identical chunks, those chunks are only stored once each. And of course it never deletes files unless you explicitly tell it to.

Borg can give you perfect recall of all past versions of every file, and can do it in a way that saves tremendous amounts of space (between avoiding the duplication of chunks and compression).
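
For illustration, recalling old versions with borg looks roughly like this (repo path and archive name are hypothetical):

```
# List the archives in the repository.
borg list /mnt/backup/borg-repo

# Extract one file from an old archive (archive paths have no leading slash).
borg extract /mnt/backup/borg-repo::myhost-2024-03-01 home/user/notes.txt

# Or browse every archive read-only as a filesystem (needs FUSE support).
borg mount /mnt/backup/borg-repo /mnt/borg
```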

[–] [email protected] 2 points 7 months ago (1 children)

Re needing lots of space: you can use --link-dest to make a new directory with hard links to unchanged files in a previous backup, so you end up with deduplicated incremental backups. Borg handles all that transparently; with rsync you need to carefully plan relative target directory paths to get it to work correctly.
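
A minimal sketch of the --link-dest pattern (dates and paths are made up):

```
# Each run creates a new dated snapshot; files unchanged since the previous
# snapshot become hard links into it, so they consume no extra space.
PREV=/backup/2024-03-29
NEW=/backup/2024-03-30
rsync -a --link-dest="$PREV" /srv/data/ "$NEW/"
```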

[–] [email protected] 2 points 7 months ago

Yeah, Borg will see the duplicate chunks even if you move files around.

[–] [email protected] 1 points 7 months ago* (last edited 7 months ago) (2 children)

Two questions, and please don't take this as criticism; I am just curious about rsync, and about one point you make.

"They go unmaintained": seeing as Borg has been in use for quite some time, how is rsync safer in that respect? To me the risk looks similar, but I might not know the development background of these tools.

The second question is more something I am asking myself. A lot of people seem to use rsync for backing up, but it is not incremental backup, or is it? I saw some mention of a "time machine"-like implementation of rsync, but then we are back at your argument that it might go unmaintained, since it's a separate niche implementation. Or does mainline rsync support incremental backups? If not, aren't you missing that? How do you deal with it when just one file changes: is a new copy of it transferred, or something else?

[–] [email protected] 2 points 7 months ago (1 children)

how is rsync safer in that respect? To me the risk looks similar, but I might not know the development background of these tools.

Rsync is available out of the box in most Linux distros and is used widely, not only for backups but for a lot of other things, such as repository updates and transfers from file hosts. This means a lot more people are interested in it. Also, looking at the source code, the implementation is cleaner and easier to understand.

how do you deal with it when just a file changes?

I think you should consider that not all files are equal. Rsync is great for me because I end up with a bunch of disks that contain an exact copy of the files I have on my own server. Those files don't change frequently; they are movies, pictures, songs, and so on.

Other files, such as code, configuration, and files on my smartphone, are backed up differently. I use git for most stuff that fits its model, and syncthing for my temporary folders and my mobile phone.

Not every file suits the same backup model. I trust that files that get corrupted or lost are in my weekly rsync backup. A configuration file I messed up two minutes ago is in git.

[–] [email protected] 2 points 7 months ago

Thanks for elaborating, the part about the pictures and movies not changing makes a lot of sense actually. Thanks for sharing!

[–] [email protected] 1 points 7 months ago

One method depends on your storage provider. Rsync may have incremental snapshots, but I haven't looked, because my storage provider handles that.

Sometimes a separate tool like rsnapshot (though probably not rsnapshot itself, as I don't think its hard links interact well with rsync) might be used to manage snapshots locally, which are then rsynced.

On to storage providers and back ends: I use Backblaze B2, configured to never delete. When a file changes, it uploads the new version, renames the old version with a timestamp, and hides it. There are tools to recover the old file versions or wipe the history. Again, it only uploads the changed files, so it's not full snapshots.
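
The comment doesn't name the client, but as one example rclone can surface B2's hidden versions via its --b2-versions flag (remote, bucket, and file names here are hypothetical):

```
# List old, hidden versions alongside the current files.
rclone ls --b2-versions b2remote:my-bucket/photos

# Restore a specific version; rclone exposes versions with a -vYYYY-MM-DD...
# timestamp inserted before the extension.
rclone copy --b2-versions b2remote:my-bucket/photos/img-v2024-03-01-120000-000.jpg /restore/
```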

[–] [email protected] 3 points 7 months ago

FWIW, the restic repository format already has two independent implementations, restic (in Go) and rustic (in Rust), so the chance of both going unmaintained is hopefully pretty low.

[–] [email protected] 1 points 7 months ago* (last edited 7 months ago)

I have a storage VPS with HostHatch: 10TB for $10/month. That pricing was from a Black Friday sale a few years ago. They may not offer it that cheap again, but it's worth keeping an eye out for their sales. They had something similar last year, but at double the price, which is still a good deal.

I use Borgbackup to back up the data to the HostHatch VPS. The most important data has a second copy stored with pCloud; I've got a lifetime 2TB storage plan with them. I know lifetime accounts are kinda sketchy, which is why it's just a secondary backup and not the primary one.

I don't have any "disposable" files like torrents though. All the stuff I back up are things like servers that run my websites and email, family photos, CDs I've ripped myself, etc. I've only got a few TB total.

[–] [email protected] 1 points 7 months ago

What I use is Borg. I use it to back up the server to a local NAS. Then I have a NAS at my grandparents' house, which I use to store the backups of the NAS itself.
