this post was submitted on 15 Jun 2024

Nephele WebDAV Server now supports deduplicated file storage. (lemmy.world)

submitted 5 months ago* (last edited 5 months ago) by [email protected] to c/[email protected]

https://hub.docker.com/r/sciactive/nephele

In the latest version of Nephele, you can now create a WebDAV server that deduplicates files that you add to it.

I created this feature because every night at midnight, my Minecraft world that my friends and I play on gets backed up. Our world has grown to about 5 GB, but every night, the same files get backed up over and over. It's a waste of space to store the same files again and again, but I want the ability to roll back our world to any day in the past.

So with this new feature of Nephele, I can upload the Minecraft backup and only the files that have changed will take up additional space. It's like having infinite incremental backups that never need a full backup after the first time, and can be accessed instantly.

Nephele will only delete a file from the file storage once all copies that share the same file contents have been deleted, so unlike with most incremental backup solutions, you can delete previous backups easily and regain space.
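For anyone curious how this kind of "delete only when the last copy is gone" behavior can work, here is a minimal Python sketch. It is not Nephele's actual code: the `DedupStore` class, its method names, and the example paths are all made up for illustration, and it keeps everything in memory rather than on disk. The idea is to store each distinct content once, keyed by its hash, and count how many paths point at it.

```python
import hashlib


class DedupStore:
    """Toy in-memory deduplicating store (hypothetical, not Nephele's code)."""

    def __init__(self):
        self.blobs = {}      # content hash -> bytes, stored once
        self.refcount = {}   # content hash -> number of paths using it
        self.paths = {}      # logical path -> content hash

    def put(self, path, data):
        if path in self.paths:
            self.delete(path)  # overwriting releases the old content first
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data  # first time this content is seen
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.paths[path] = digest

    def get(self, path):
        return self.blobs[self.paths[path]]

    def delete(self, path):
        digest = self.paths.pop(path)
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            # last reference is gone, so the content itself can be freed
            del self.refcount[digest]
            del self.blobs[digest]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


# Two nightly backups that share one unchanged file.
store = DedupStore()
store.put("backups/monday/region.dat", b"unchanged chunk data")
store.put("backups/tuesday/region.dat", b"unchanged chunk data")
store.put("backups/tuesday/level.dat", b"changed on tuesday")
```

In this sketch, deleting Monday's backup leaves Tuesday's copy of `region.dat` readable, and the shared content is only freed once Tuesday's copy is deleted too.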

Edit: So, I think my post is causing some confusion. I should make it clear that my use case is specific for me. This is a general purpose deduplicating file server. It will take any files you give it and deduplicate them in its storage. It's not a backup system, and it's not a versioning system. My use case is only one of many you can use a deduplicating file server for.

[–] [email protected] 7 points 5 months ago (2 children)

Cool, but you basically reinvented btrfs snapshots.

[–] [email protected] 3 points 5 months ago* (last edited 5 months ago) (1 children)

No, please research what deduplication is before commenting.

https://en.wikipedia.org/wiki/Data_deduplication

You might be thinking of incremental backups, which also save space but are not the same thing.

If, for example, you ran deduplication on a file server and a bunch of users uploaded the same files in multiple different directories, deduplication would remove all duplicate copies and just link them together. This has nothing to do with snapshots. btrfs might support deduplication, but now this software does too. Your comment was completely unnecessary, since not everything in the world can or should run btrfs.
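As a rough illustration of the first half of that (finding the duplicates), here is a small Python sketch. The function name `find_duplicates`, the user names, and the file names are invented for the example, and it only reports identical files; it does not remove or link anything.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def find_duplicates(root):
    """Group files under root by the SHA-256 of their contents."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            groups[digest].append(os.path.relpath(full, root))
    # keep only content that appears in more than one place
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}


# Demo with made-up users and files in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for user, name, data in [
        ("alice", "manual.pdf", b"same bytes"),
        ("bob", "copy-of-manual.pdf", b"same bytes"),
        ("bob", "notes.txt", b"different bytes"),
    ]:
        os.makedirs(os.path.join(root, user), exist_ok=True)
        with open(os.path.join(root, user, name), "wb") as fh:
            fh.write(data)
    duplicates = find_duplicates(root)
```

Here `duplicates` ends up with a single group containing Alice's and Bob's copies of the same file, even though they live in different directories under different names.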

[–] [email protected] 1 points 5 months ago (1 children)

I am aware of the difference, but if you read the OP they are using it mainly for something that could also be done with btrfs.

[–] [email protected] 1 points 5 months ago (1 children)

Sure but maybe they don't want to use btrfs. Ever thought about that?

[–] [email protected] 1 points 5 months ago

Their response further below clearly shows that they didn't bother researching btrfs or a similar filesystem that can do this.

Now of course they can do with their time what they want, but I am also free to point out that there are other ways (that are maybe more established) to reach the same goal.

[–] [email protected] 6 points 5 months ago* (last edited 5 months ago) (2 children)

Not at all. Btrfs snapshots:

aren't accessible unless you revert to them
only happen when you manually trigger them
don't deduplicate files in the file system, just across snapshots
are handled at the file-system level (meaning you'd have to create a separate file system, or at least a separate subvolume if you're already using btrfs, to make them with an exclusive set of files)
don't have access controls beyond Linux' basic file controls (so sharing a server will be complicated)
aren't served across the network (you can serve a btrfs file system, but then you can't access a previous snapshot)
aren't portable (you can't just copy a set of files to a new server, you have to image the partition)

They serve a very different purpose than a deduplicating file server. Now, there are other deduplicating file servers, but I don't know of any that are open source and run on Linux.

[–] [email protected] 4 points 5 months ago* (last edited 5 months ago) (1 children)

Actually you seem to have reinvented Syncthing's versioning feature... or this.

Still great work.

[–] [email protected] 3 points 5 months ago* (last edited 5 months ago)

So, to be clear, this is not a versioning system. I'm just kind of using it for that with my Minecraft backups. This is a deduplicating file server. It takes the files you give it and deduplicates them. Then, later, you can pull them out again. I am using it for backups, but it is also not a backup system.

I think I made it seem in my post like what I’m using it for is what it should be used for, or the only thing it can be used for. My use case is just one of many that you can use a deduplicating file server for.

[–] [email protected] 12 points 5 months ago (1 children)

Uhm, I think you need to do better research as most of the above isn't true.

[–] [email protected] 5 points 5 months ago (2 children)

Can you tell me which is wrong?

[–] [email protected] 1 points 5 months ago (1 children)

Start with this to learn how snapshots work

https://fedoramagazine.org/working-with-btrfs-snapshots/

Then here to learn how to make automatic snapshots with retention

https://ounapuu.ee/posts/2022/04/05/btrfs-snapshots/

I do something very similar with zfs snapshots and deduplication on. I take one every 5 mins and keep 1 hr worth, then keep 24 hourlies every day, 1 daily for a month, etc.
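A retention scheme like that can be sketched in a few lines of Python. This is only one reading of the schedule described here, and the function name `snapshots_to_keep` and the exact cutoffs are assumptions: keep everything from the last hour, one snapshot per hour for the last 24 hours, and one per day for the last 30 days.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now):
    """Return the set of snapshot timestamps a tiered policy would keep."""
    keep = set()
    seen_hours = set()
    seen_days = set()
    for ts in sorted(snapshots, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(hours=1):
            keep.add(ts)  # keep every snapshot from the last hour
        elif age <= timedelta(hours=24):
            bucket = ts.replace(minute=0, second=0, microsecond=0)
            if bucket not in seen_hours:  # newest snapshot in each hour
                seen_hours.add(bucket)
                keep.add(ts)
        elif age <= timedelta(days=30):
            if ts.date() not in seen_days:  # newest snapshot in each day
                seen_days.add(ts.date())
                keep.add(ts)
    return keep


# Three days of snapshots taken every 5 minutes (made-up data).
now = datetime(2024, 6, 15, 12, 0)
snapshots = [now - timedelta(minutes=5 * i) for i in range(3 * 24 * 12)]
kept = snapshots_to_keep(snapshots, now)
```

Anything not in `kept` would be a candidate for deletion by whatever tool actually manages the snapshots.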

For backup to remote locations you can send a snapshot offsite

[–] [email protected] 4 points 5 months ago* (last edited 5 months ago) (1 children)

Having a separate tool do the work of making a snapshot doesn’t mean what I said is wrong. Snapshots are not automatic, with regard to btrfs. You can have a tool automatically make a snapshot, but btrfs won’t do it for you.

My overall point is that a deduplicating file server has very little in common with btrfs snapshots. The original commenter looked at my use case for my own deduplicating file server and assumed that the server was the same thing as my use case.

I think if they took the time to look at the server and see what it is actually doing, they would see that it is very different from btrfs.

[–] [email protected] 2 points 5 months ago* (last edited 5 months ago) (1 children)

I use zfs so not sure about others but I thought all cow file systems have deduplication already? Zfs has it turned on by default. Why make your own file deduplication system instead of just using a zfs filesystem and letting that do the work for you?

Snapshots are also extremely efficient on cow filesystems like zfs as they only store the diff between the previous state and the current one so taking a snapshot every 5 mins is not a big deal for my homelab.

I can easily explore any of the snapshots and pull any file from any of the snapshots.

I'm not trying to shit on your project, just trying to understand its usecase since it seems to me ZFS provides all the benefits already

[–] [email protected] 3 points 5 months ago (1 children)

Btrfs does not have its own built in deduplication like zfs does. I’m surprised zfs has it turned on by default, considering file system level deduplication is fairly CPU and RAM intensive. But yeah, if you can use a deduplicated file system, go for it.

In my use case, I’m not willing to move away from ext4 (on my home server, which is where this is running), and I don’t need all files on my file system to be deduplicated, just a set of files that I add to every day. I made this because it fits my use cases better than any other solution (this current use case, and some more I’m planning to implement in the future).

As far as using snapshots to implement my current use case, it’s not possible. My Minecraft server runs on a different system than where I put my backups, and I want it that way. They are meant to be backups, not versions, and backups shouldn’t be stored on the same system. That server has also been migrated several times since I first started running it in 2019. I have backups that go that far back too. So I need a system that I can put years’ worth of existing backups into, not just start taking backups now.

[–] [email protected] 1 points 5 months ago* (last edited 5 months ago) (1 children)

Thanks! Makes sense if you can't change file systems.

For what it's worth, zfs lets you dedup on a per dataset basis so you can easily choose to have some files deduped and not others. Same with compression.

For example, without building anything new the setup could have been to copy the data from the actual Minecraft server to the backup server that has ZFS using rsync or some other tool. Then the backup server just runs a snapshot every 5 mins or whatever. You now have a backup on another system that has snapshots with whatever frequency you want, with dedup.

Restoring an old backup just means you rsync from a snapshot back to the Minecraft server.

Rsync is only needed if the servers don't both have ZFS. If they both have ZFS, the send and receive commands are built into zfs and are designed for exactly this use case. You can easily send a snapshot to another server if they both have ZFS.

Zfs also has samba and NFS export built in if you want to share the filesystem to another server.

[–] [email protected] 2 points 5 months ago* (last edited 5 months ago)

Yeah, that could work if I could switch to zfs. I’m also using the built in backup feature on Crafty to do backups, and it just makes zip files in a directory. I like it because I can run commands inside the Minecraft server before the backup to tell anyone who’s on the server that a backup is happening, but I’m sure there’s a way to do that from a shell script too. It’s the need for putting in years worth of old backups that makes my use case need something very specific though.

In the future I’m planning on making this work with S3 as the blob storage rather than the file system, so that’s something else that would make this stand out compared to FS based deduplication strategies (but that’s not built yet, so I can’t say that’s a differentiating feature yet). My ultimate goal is to have all my Minecraft backups deduplicated and stored in something like Backblaze, so I’m not taking up any space on my home server.

[–] [email protected] 1 points 5 months ago (1 children)

Points 1,2,6,7 are wrong, and the others are partially wrong and/or can be easily solved with other existing tools.

[–] [email protected] 3 points 5 months ago* (last edited 5 months ago) (1 children)

Can you explain to me then:

How do you access the files in a previous snapshot without reverting to it?
How does btrfs automatically make its own snapshots?
How does btrfs serve the contents of previous snapshots across the network?
How can I copy the contents of all previous snapshots at once without imaging the partition?

If you’re using other tools on top of btrfs to implement a deduplicating file server, then you can’t say I reinvented btrfs snapshots, can you?

I don’t know how much clearer I can make the distinction between a copy on write file system and a deduplicating file server. They are completely different things for completely different purposes. The only thing they have in common is that they will deduplicate data, but a COW FS only deduplicates data under certain conditions. My server will deduplicate every file across its entire file store.

I get that people on Lemmy love to shit on other people’s accomplishments. I’ve never posted anything on here without it being criticized, but saying I “reinvented btrfs snapshots” is quite possibly the worst, most inaccurate take anyone has ever had on any of my posts.

[–] [email protected] 2 points 5 months ago (1 children)

Snapshots are accessible in read-only mode without reverting to them, snapshots can easily be configured to be taken automatically with a simple cron job, btrfs allows full control of snapshots over SSH, and you can easily copy a snapshot to another btrfs filesystem on the same or a remote server.

Also btrfs follows the Unix philosophy, so of course you will be using additional tools with it, but btrbk for example makes all of the above really easy with no additional tools needed.

Obviously there are differences, but serving WebDAV on top of a btrfs filesystem is very similar to what you have made.

[–] [email protected] 4 points 5 months ago* (last edited 5 months ago)

It very much is not. Again, btrfs will only deduplicate data under certain circumstances, like if you copy a file to a new location. If I take a USB stick with an 8 GB movie file on it and copy that to btrfs twice, it will take up 16 GB on disk. If I copy it to btrfs once, then copy it from there to a new location, it will take up 8 GB on disk. Btrfs does not deduplicate files, it deduplicates copies. I want something that deduplicates files.

If you run WebDAV on top of btrfs and try what I’m using it for, it literally will not deduplicate anything, because you’re always writing new files to it, not copying existing files.
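The distinction can be shown with a toy Python model. This models the behavior as described in this comment, not btrfs or Nephele internals; the class names `CowModel` and `ContentModel` are invented, and 8 bytes stand in for the 8 GB movie.

```python
import hashlib


class CowModel:
    """Toy model: data is shared only when a file is copied inside the store."""

    def __init__(self):
        self.extents = []   # every new write allocates a new extent
        self.files = {}     # path -> index into self.extents

    def write(self, path, data):
        self.extents.append(data)           # new space, even if bytes are identical
        self.files[path] = len(self.extents) - 1

    def copy(self, src, dst):
        self.files[dst] = self.files[src]   # share the existing extent

    def used(self):
        return sum(len(extent) for extent in self.extents)


class ContentModel:
    """Toy model: data is shared whenever the bytes are identical."""

    def __init__(self):
        self.blobs = {}     # content hash -> data
        self.files = {}     # path -> content hash

    def write(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # stored only the first time
        self.files[path] = digest

    def used(self):
        return sum(len(blob) for blob in self.blobs.values())


movie = b"x" * 8  # stand-in for the 8 GB movie file

cow = CowModel()
cow.write("movie-a.mkv", movie)          # uploaded from the USB stick
cow.write("movie-b.mkv", movie)          # uploaded again: new extent
cow.copy("movie-a.mkv", "movie-c.mkv")   # copied inside the store: shared

dedup = ContentModel()
dedup.write("movie-a.mkv", movie)
dedup.write("movie-b.mkv", movie)        # same bytes: no extra space
```

In the first model, the two uploads take twice the space and only the internal copy is free; in the second, identical uploads cost nothing extra regardless of how they arrived.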

Triggering a snapshot with a cron job doesn’t mean it’s automatic to btrfs. The action still happens only when triggered. Btrfs doesn’t take snapshots for you.

What good is management through SSH? I want a deduplicating file server, not a versioning file system I have to manage over SSH. If I wanted versioning like that, I would just use git.

And again, adding tools on top of btrfs to recreate something similar to what I’ve made here does not mean I reinvented btrfs. Btrfs is a COW FS. I wrote a deduplicating file server. I honestly can’t believe you don’t see the difference here. Like, are you trolling?

I feel like you misinterpreted my post to mean that my use case is the only thing you could use my server for, and you’re just running with it, even though I’ve told you multiple times, I wrote a deduplicating file server, not an incremental backup system, and not a versioning system. The fact that I’m using it for incremental backups is inconsequential to what it actually does. It deduplicates files and serves them from WebDAV. AFAIK, there’s no other open source server that does that.