this post was submitted on 17 Feb 2024
95 points (98.0% liked)


What storage software could I run to have an archive of my personal files (a couple TB of photos) that doesn't require I keep a full local copy of all the data? I like the idea of a simple and focused tool like Syncthing, but it seems to be geared towards replication.

Is the simple choice to run some S3-like backend and use a CLI or other client to append and browse files? I'd love something with fault tolerance that someone can gradually add disks to. If Ceph were either less complicated or used fewer resources, I'd want to do that.

[–] [email protected] 1 points 9 months ago

Restic with Backrest: https://forum.restic.net/t/backrest-a-cross-platform-backup-orchestrator-and-webui-for-restic/7069

Although I use resticprofile atm, with rclone to sync to Backblaze B2.

[–] [email protected] 2 points 9 months ago

You could run a WebDAV server, like Nextcloud.

On Windows it supports thin sync (meaning that it keeps a reference to the file instead of the whole file). On Linux not yet, as it is still in alpha (but you can just connect it as a remote disk and be done with it; that's how I do it with mine).

If you don't want the whole Nextcloud, there are standalone cli WebDAV servers.

[–] [email protected] 4 points 9 months ago* (last edited 9 months ago)

Sounds like something like "git annex" is what you're looking for?

I use this to manage all my photos. It lets you add binaries and synchronize them to a backend server (can be local, can be S3, Backblaze, etc).

You can then "drop" files, and it ensures a remote copy exists first. When you drop a file you still see a symlink to it locally (it's broken) so that you know it exists.

My workflow is to add my files, sync them to both a local server and B2, then drop and fetch folders as I need (need disk space? "git annex drop 2022*"; want to edit some photos? "git annex get 2022_10_01").
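
To make the "drop only if a remote copy exists" idea concrete, here is a rough Python toy model. It is not git-annex's code or API; the `CopyTracker` class, its method names, and the `min_remote_copies` parameter are all made up for illustration.

```python
# Toy model of "drop is refused unless another copy exists".
# Hypothetical names throughout; this is not how git-annex is implemented.
class CopyTracker:
    def __init__(self):
        # Maps a file name to the set of locations known to hold its content.
        self.locations = {}

    def add(self, name, location="local"):
        """Record that `location` holds a copy of `name`."""
        self.locations.setdefault(name, set()).add(location)

    def copy_to(self, name, remote):
        """Record a successful upload of `name` to `remote`."""
        if not self.locations.get(name):
            raise KeyError(f"no known copy of {name} to copy from")
        self.locations[name].add(remote)

    def drop_local(self, name, min_remote_copies=1):
        """Forget the local copy, but only if enough remote copies exist."""
        remotes = self.locations.get(name, set()) - {"local"}
        if len(remotes) < min_remote_copies:
            return False  # refuse: this could delete the last copy
        self.locations[name].discard("local")
        return True

    def get(self, name):
        """Fetch the content back from any remote that has it."""
        remotes = self.locations.get(name, set()) - {"local"}
        if not remotes:
            raise FileNotFoundError(f"no remote holds {name}")
        self.locations[name].add("local")


tracker = CopyTracker()
tracker.add("2022_10_01/img.jpg")
print(tracker.drop_local("2022_10_01/img.jpg"))  # refused, only copy is local
tracker.copy_to("2022_10_01/img.jpg", "b2")
print(tracker.drop_local("2022_10_01/img.jpg"))  # allowed, b2 has a copy
```

The point is just that the bookkeeping of "who has which file" is what makes freeing local disk space safe.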

[–] [email protected] 3 points 9 months ago* (last edited 9 months ago)

What platform?

Another user said it: what you're asking for isn't a backup, it's just data transfer.

It sounds like you're looking for a storage backend that hosts all your data and can download data to the client side on the fly.

If your use case is Windows, Nextcloud Desktop may be what you're looking for. I have a similar setup with the game clips folder. It detects changes and auto-uploads them, while deleting less recently used local data that's safely stored server-side. This feature might be on Mac too, but I haven't tested it.
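
The eviction behaviour described above can be sketched roughly like this in Python. This is only my mental model, not Nextcloud's actual logic; `pick_evictions`, the dict fields, and the byte budget are hypothetical.

```python
# Sketch: free local space by evicting the least recently used files,
# but never evict a file that has not been uploaded to the server yet.
# All names here are illustrative assumptions, not a real client's API.
def pick_evictions(files, budget_bytes):
    """Return names of files to remove locally so the rest fits the budget.

    `files` is a list of dicts with keys: name, size, last_access, on_server.
    """
    local_total = sum(f["size"] for f in files)
    evict = []
    # Oldest access time first, i.e. least recently used first.
    for f in sorted(files, key=lambda f: f["last_access"]):
        if local_total <= budget_bytes:
            break
        if not f["on_server"]:
            continue  # the local copy is the only copy, so keep it
        evict.append(f["name"])
        local_total -= f["size"]
    return evict


clips = [
    {"name": "old.mp4", "size": 100, "last_access": 1, "on_server": True},
    {"name": "pending.mp4", "size": 100, "last_access": 2, "on_server": False},
    {"name": "new.mp4", "size": 100, "last_access": 3, "on_server": True},
]
print(pick_evictions(clips, budget_bytes=250))
```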

Backup wise, I capture an rsync of the nextcloud database and filesystem server-side and store it on a different chassis. That then gets backed up again to a USB drive I can grab and run.

Nextcloud also supports external storage, which the server directly connects to: https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/external_storage_configuration_gui.html

[–] [email protected] 15 points 9 months ago (1 children)

that doesn't require I keep a full local copy of all the data

If you don't do that, the place that you call "backup" is the only place where it is stored - that is not a Backup. A backup is an additional place where it is stored, for the case when your primary storage gets destroyed.

[–] [email protected] 1 points 9 months ago (1 children)

"Local" as in the machine I am using to work on, which has a 256 GB SSD. Not as in "on-site" and "off-site."

[–] [email protected] 8 points 9 months ago* (last edited 9 months ago) (1 children)

In the IT world, we just call that a server. The usual golden rule for backups is 3-2-1:

  • 3 copies of the data total, of which
  • 2 are backups (not the primary access), and
  • 1 of the backups is off-site.

So, if the data is only server side, it's just data. If the data is only client side, it's just data. But if the data is fully replicated on both sides, now you have a backup.
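
As a quick sanity check, the rule as stated above can be written down as a tiny Python function. The `check_321` name and the `role` / `offsite` fields are just assumptions for illustration.

```python
# Sketch of the 3-2-1 rule as worded above: 3 copies total,
# 2 of them backups (not the primary), 1 backup off-site.
def check_321(copies):
    """`copies` is a list of dicts with 'role' ('primary' or 'backup')
    and 'offsite' (bool). Returns which parts of the rule are satisfied."""
    backups = [c for c in copies if c["role"] == "backup"]
    return {
        "three_copies": len(copies) >= 3,
        "two_backups": len(backups) >= 2,
        "one_offsite": any(c["offsite"] for c in backups),
    }


# Data that lives only on the server: one copy, no backups.
print(check_321([{"role": "primary", "offsite": False}]))

# Primary plus a second on-site disk plus an off-site copy.
print(check_321([
    {"role": "primary", "offsite": False},
    {"role": "backup", "offsite": False},
    {"role": "backup", "offsite": True},
]))
```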

There's a related adage regarding backups: "if there's two copies of the data, you effectively have one. If there's only one copy of the data, you can never guarantee it's there". Basically, it means you should always assume one copy somewhere will fail and you will be left with n-1 copies. In your example, if your server failed or got ransomwared, you wouldn't have a complete dataset since the local computer doesn't have a full replica.

I recently had a backup drive fail on me, and all I had to do was just buy a new one. No data loss, I just regenerated the backup as soon as the drive was spun up. I've also had to restore entire servers that have failed. Minimal data loss since the last backup, but nothing I couldn't rebuild.

Edit: I'm not saying what you're asking for is wrong or bad, I'm just saying "backup" isn't the right word to ask about. It'll muddy some of the answers as to what you're really looking for.

[–] [email protected] 2 points 9 months ago (1 children)

Yes, I do see that. I'm definitely getting answers to a question I didn't intend. I was hoping for something more like rsync, but which also provides viewing and incremental backups to an offsite location. I don't know how to phrase that, and perhaps for what I want it makes more sense to have rsync/rclone to copy files around and something else to view them.

[–] [email protected] 1 points 9 months ago

It is not that easy to understand what you want. To me it reads like you want something like Nextcloud, i.e. your own little cloud, where you can put all your stuff, view it through the web browser or the Nextcloud apps, and also keep selected parts of your stuff in sync with your devices (or automatically upload photos taken with your smartphone, for example).

Backup of Nextcloud (or whatever you want to use) is a separate topic. Any incremental backup tool would apply though, so there's much to choose from. I personally use btrbk, which uses Btrfs send/receive to push incremental snapshots to an offsite server.

[–] [email protected] 3 points 9 months ago
[–] [email protected] 4 points 9 months ago (2 children)

All of my machines back up to my home server’s RAID over WebDAV with Nephele.

Then every few days I’ll manually sync them to a server at my parents’ house with a single huge HDD using rsync. I do this manually so that if anything happens to my home server (like ransomware) it doesn’t mirror destroyed data.

Since the Nephele share is just WebDAV, I can mount it locally and move things into it that I don’t want local anymore.

I created Nephele, and I just finished writing an encryption plugin. I wrote it because I’m also going to write an S3 adapter. That way, you can store things in S3, but they’ll be encrypted, so Amazon can’t see them.

[–] [email protected] 2 points 9 months ago

This is really cool. I ended up trying something similar: serving from a ZFS pool with SeaweedFS. TBD if that's going to work for me long term.

I would definitely be able to manually sync the SeaweedFS files with rsync to another location but from what I see it requires me to use their software to make sense of any structure. I might be able to mount it and sync that way, hopefully performance for that is not too bad.

Syncing like that and having more control over where the files are placed on the RAID is very cool.

[–] [email protected] 2 points 9 months ago (2 children)

Wouldn't syncing automatically every few days give you the same protection though?

[–] [email protected] 1 points 9 months ago

I’m assuming I would notice, because none of my services on the machine would work anymore.

[–] [email protected] 1 points 9 months ago

It protects against the case where it happens and they have not noticed within those few days. Probably especially important if they leave the system running while on vacation.

[–] [email protected] 3 points 9 months ago
[–] [email protected] 9 points 9 months ago (1 children)
[–] [email protected] 5 points 9 months ago

Rclone.org is poetry then ;)

[–] [email protected] 4 points 9 months ago

rsync, for sure. That's what I used when I had to migrate a 10TB datastore to a new machine.

[–] [email protected] 7 points 9 months ago

rsync and another hard drive

[–] [email protected] 3 points 9 months ago* (last edited 9 months ago)

Hetzner storage box, rsync and a bash script

[–] [email protected] 1 points 9 months ago (1 children)

Borg. With rsync.net if you want to keep an off-site.

[–] [email protected] 2 points 9 months ago (3 children)

Is there a decent UI for borg, or is it all CLI?

[–] [email protected] 1 points 9 months ago

On Linux and Mac there's also https://vorta.borgbase.com/ which is pretty good.

[–] [email protected] 1 points 9 months ago

There's https://borgwarehouse.com/ Haven't found the time to test it but looks interesting.

[–] [email protected] 2 points 9 months ago

Pika backup seems to be mentioned a lot.

[–] [email protected] 0 points 9 months ago* (last edited 9 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters   More Letters
Git             Popular version control system, primarily for code
NAS             Network-Attached Storage
RAID            Redundant Array of Independent Disks for mass storage
SSD             Solid State Drive mass storage
ZFS             Solaris/Linux filesystem focusing on data integrity

5 acronyms in this thread; the most compressed thread commented on today has 2 acronyms.

[Thread #523 for this sub, first seen 17th Feb 2024, 23:55]

[–] [email protected] 23 points 9 months ago (1 children)

Punch cards. Is it the best? No, but no one is going to bother to steal my data. Encryption through inconvenience.

[–] [email protected] 19 points 9 months ago

Do a riffle shuffle to make them even more secure!

[–] [email protected] 4 points 9 months ago* (last edited 9 months ago) (1 children)

So I understood you just want some local storage system with some fault tolerance.
ZFS will do that. Nothing fancy, just volumes as either block devices or ZFS filesystems.

If you want something fancier, maybe even distributed, check out storage cluster systems with erasure coding. They waste less storage than pure replication, though that comes at a reconstruction cost if something goes wrong.
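
To put rough numbers on that trade-off, here is a back-of-the-envelope Python sketch. The 4+2 layout and the function names are example assumptions, not the defaults of any particular tool, and the rebuild figure assumes a Reed-Solomon-style code where any `data_shards` surviving shards are needed to reconstruct a lost one.

```python
# Raw bytes stored per byte of user data, and how many pieces must be
# read to rebuild one lost piece. Example values only.
def replication_overhead(copies):
    return float(copies)


def erasure_overhead(data_shards, parity_shards):
    return (data_shards + parity_shards) / data_shards


def pieces_read_to_rebuild(data_shards=None):
    # Replication: copy from one surviving replica.
    # Erasure coding: read `data_shards` surviving shards and recompute.
    return 1 if data_shards is None else data_shards


# Both layouts below survive the loss of any two disks/nodes.
print("3x replication:", replication_overhead(3), "x raw storage,",
      pieces_read_to_rebuild(), "piece read per rebuild")
print("4+2 erasure coding:", erasure_overhead(4, 2), "x raw storage,",
      pieces_read_to_rebuild(4), "pieces read per rebuild")
```

So for the same two-failure tolerance, the erasure-coded layout stores half as much raw data as triple replication, but a rebuild has to read four shards instead of one.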

MinIO comes to mind, though I've never used it. My requirements seem to be so rare that these tools only get close :/
Afaik you can add more disks and nodes more or less dynamically with it.

[–] [email protected] 1 points 9 months ago

Yeah it's hard to find something that perfectly fits just what you want. I think it's better if I do something simple like ZFS and maybe some kind of file server on top.

[–] [email protected] 6 points 9 months ago (2 children)

I use rclone, with encryption, to S3. I have close to 3TB of personal data backed up to S3 this way - photos, videos, paperless-ngx (files and database).

Only readable if you have the passwords configured on my singular backup host (a RasPi), or stored in Bitwarden.

[–] [email protected] 1 points 9 months ago* (last edited 9 months ago)

tarsnap makes use of S3. It does a decent deduplication job as well.

[–] [email protected] 2 points 9 months ago (1 children)

This alongside using Backblaze is what I would suggest assuming you are thinking online. Cheap and reliable, also relatively easy via a cron job. https://help.backblaze.com/hc/en-us/articles/1260804565710-Quickstart-Guide-for-Rclone-and-B2-Cloud-Storage

[–] [email protected] 3 points 9 months ago

Backblaze don't have a POP in my country, unfortunately.

[–] [email protected] 24 points 9 months ago

Borg Backup. It can work locally or over network. Takes snapshots of the files you give it. Performs deduplication, compression and optionally encryption. You can check the integrity of the backups and repair them. There's a very simple to use GUI for it called Pika Backup to get you started.
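
For anyone wondering what deduplication buys you, here is a minimal Python sketch of a content-addressed chunk store. It is a simplification and not Borg's implementation: Borg uses content-defined chunking, while this toy uses fixed-size chunks, and `ChunkStore` and its methods are hypothetical names.

```python
import hashlib


# Toy deduplicating store: identical chunks are kept only once,
# keyed by their SHA-256 digest. Fixed-size chunking is an assumption
# made to keep the example short.
class ChunkStore:
    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data):
        """Split data into chunks, store new ones, return the chunk ids."""
        ids = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # skip if already stored
            ids.append(digest)
        return ids

    def get(self, ids):
        """Reassemble the original data from its chunk ids."""
        return b"".join(self.chunks[i] for i in ids)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore(chunk_size=4)
monday = store.put(b"AAAABBBBCCCC")
tuesday = store.put(b"AAAABBBBDDDD")  # only the last chunk is new
print(store.stored_bytes(), "bytes stored for 24 bytes of snapshots")
```

Each snapshot is just a list of chunk ids, so a second snapshot of mostly unchanged files costs very little extra space.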

[–] [email protected] 4 points 9 months ago

Where will the target be? Online or local? Rsync is really easy to use and the target files are browse-able. I could be too dense but I find online buckets aren't easily browse-able. Even a homemade NAS might be a good choice and it's easily scalable.
