this post was submitted on 15 Sep 2024
83 points (97.7% liked)


Hi,

I’m not sure if this is the right community for my question, but as my daily driver is Linux, it feels somewhat relevant.

I have a lot of data on my backup drives, and recently added 50GB to my already 300GB of storage (I can already hear the comments about how low/high/boring that is). It's mostly family pictures, videos, and documents since 2004, much of which has already been compressed using self-made bash scripts (so it’s Linux-related ^^).

I have a lot of data that I don’t need regular access to and won’t be changing anymore. I'm looking for a way to archive it securely, separate from my backup but still safe.

My initial thought was to burn it onto DVDs, but that's quite outdated and DVDs don't hold much data. Blu-ray discs can store more, but I'm unsure about their longevity. Is there a better option? I'm looking for something immutable, safe, easy to use, and that will stand the test of time.

I read about data crystals, but they seem to be still in the research phase and not available for consumers. What about using old hard drives? Don’t they need to be powered on every few months/years to maintain the magnetic charges?

What do you think? How do you archive data that won’t change and doesn’t need to be very accessible?

Cheers

[–] [email protected] 4 points 3 months ago* (last edited 3 months ago)

I am using https://duplicati.com/ and https://www.backblaze.com/ (their B2 cloud storage is priced by usage: $6 a month for 1 TB, or less depending on how much you use) to run a scheduled backup of my photos every night. It's compressed and encrypted. I save a config file to Google, so if my house and server burn down, I just pull my config from Google, reinstall Duplicati, and boom, pull my backup down. The whole setup backs up incrementally, so once you've done the first backup, only changes are uploaded. I love the whole setup.

Edit: You can also pull just the files you need, not the whole backup.

[–] [email protected] 3 points 3 months ago (2 children)

There isn’t anything that meets your criteria.

Optical discs suffer from layer separation, hard drives break down, SSDs lose their charge, and tape is fantastic but has a high cost of entry.

There are a lot of replies here, but if I were you I’d get an LTO drive from the last generation or two at some surplus auction and use that.

People hate being told to use magnetic tape, but it’s very reliable, long lived, pretty cost effective once you have a machine and surprisingly repairable.

What few replies are talking about is the storage conditions. If your archive can be relatively small and disconnected then you can easily meet some easy requirements for long term storage like temperature and humidity stability with a cardboard box, styrofoam cut to shape and desiccant packs (remember to rotate these!). An antifungal/antimicrobial agent on some level would be good too.

[–] [email protected] 1 points 3 months ago (1 children)

Do unplugged SSDs eventually lose the data?

[–] [email protected] 4 points 3 months ago (1 children)

Yes. They also slowly take longer to access their data with every read.

[–] [email protected] 2 points 3 months ago (1 children)

Wow, I didn't know reads deteriorate SSDs. What's the reason? Is the rate significant?

[–] [email protected] 3 points 3 months ago

The data is stored in tiny flash memory cells that each hold a charge. It’s recorded as an analog voltage. There is no difference between analog voltages and digital voltages, I’m just using the word analog to establish that the potential is a domain that can vary continuously.

When you read the data, the levels of the voltages are checked and translated to the digital information they represent.

To determine the level of a voltage, a small amount of current is allowed to flow between the two points being measured. It’s a very small amount. Microamps and less.

When you draw current from a charge carrying device the charge, as represented by the potential between its negative and positive terminals, the voltage, decreases.

When the controller in the SSD responsible for reading voltages and assembling them into porno.mov doesn’t get a clear read, it asks again. As the SSD ages, parts of it can be re-queried hundreds of times just to get commonly read information, like system files, into memory.

So the ssd degrades on read, and the user experiences this as “slowness”.

Would rewriting the data fix this problem? Yes. Using either badblocks -n, dd or a program called spinrite, rewriting the data fixes that problem.

Why doesn’t the SSD just do it? Because the SSD only has so many write cycles before it’s toast. Better to rely on the user, or more accurately the host OS, to dictate those writes than to take on that responsibility.

[–] [email protected] 4 points 3 months ago* (last edited 3 months ago)

People hate being told to use magnetic tape

Because there are still horror stories of them falling apart and not lasting even in proper controlled conditions

[–] [email protected] 7 points 3 months ago

Assume anything you can buy has a shelf life and set a yearly reminder on your calendar to copy forward stuff more than five or so years old, if those files are of significant value to you. Or for the documents, print them out—paper has better longevity than any consumer-available electronic storage.

That being said, quality optical discs are probably the best option in terms of price to longevity ratio for the average person right now. Just keep in mind that they are not guaranteed to last forever and do need to be recopied from time to time.

(I have yet to have a DVD fail on me, but I keep them in hard plastic jewel cases in climate-controlled conditions, and I've probably just been lucky.)

[–] [email protected] 6 points 3 months ago

You might be interested in git-annex (see the Bob use case).

It has file tracking so you can - for example - "ask" a repository at drive A where some file is, and git-annex can tell you it's on drives C and D.

git-annex can also enforce rules like: "always have at least 3 copies of file X, and any drive will do"; "have one copy of every file at the drives in my house, and have another at the drives in my parents' house"; or "if a file is really big, don't store it on certain drives".

[–] [email protected] 3 points 3 months ago

I used to write to DVDs, but the failure rate was astronomical - like 50% after 5 years, some with physical separation of the silvering. Plus today they're so relatively small they're not worth using.

I've gone through many iterations and currently my home setup is this:

  • I have several systems that make daily backups from various computers and save them onto a hard drive inside one of my servers.
  • That server has an external hard drive attached to it, powered through a Wi-Fi plug controlled by Home Assistant.
  • Once a month, a scheduled task wakes up that external HDD and copies the contents of the online backup directory onto it. It then turns it off again and emails me "Oi, minion. Backups complete, swap them out". That takes five minutes.
  • Then I take the USB disk and put it in my safe, removing the oldest of 3 (the classic grandfather-father-son rotation) from there and putting that back on the server for next time.
  • Once a year, I turn the oldest HDD into an "Annual backup", replacing it with a new one. That stops the disks expiring from old age at the same time, and annual backups aren't usually that valuable.

Having the HDDs in the safe means that total failure/ransomware takes, at most, a month's worth. I can survive that. The safe is also fireproof and in a different building from the server.

This sort of thing doesn't need high-capacity HDDs either - USB drives and microSD cards are very capable now. If you're limited on physical space and don't mind slower write times (which is generally OK when automating), microSD cards with clear labelling are just as good. You're not going to kill them through excessive writes for decades.

I also have a bunch of other stuff that is not critical - media files, music. None of that is unique and it can all be replaced. All of that is backed up to a secondary "live" directory on the same PC - mostly in case of my incompetence in deleting something I actually wanted. But none of that is essential - I think it's important to be clear about what you "must save" and what is "nice to save".

The key thing is to sit back and work out a system that is right for you. And it always, ALWAYS should be as automated as you can make it - humans are lazy sods and easily justify not doing stuff. Computers are great at remembering to do repetitive tasks, so use that.

Include checks to ensure the backed up data is both what you expected it to be, and recoverable - so include a calendar reminder to actually /read/ from a backup drive once or twice a year.

[–] [email protected] 7 points 3 months ago (1 children)

I use LTO magnetic tape for archiving data, but unfortunately the tape drives are VERY expensive. The tape itself is relatively cheap, though (this is a 5-pack at 12 TB uncompressed, 30 TB compressed per cartridge, totaling 60 TB uncompressed, 150 TB compressed). That is a lot cheaper than hard drives and lasts much longer; it has a large storage capacity and 30+ years of shelf life. Yes, I know, LTO 9 has come out, but I won't be upgrading, because LTO 8 works just fine for me and is much cheaper. The drives are backwards compatible by one generation though, e.g. you can use LTO 8 tape in an LTO 9 drive.

[–] [email protected] 2 points 3 months ago (1 children)

5 k€? No wonder no one uses tape for home usage. You can come up with a lot of cheaper alternatives for that price.

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago)

I got a used unit for much cheaper from the place I previously worked at. If I had to spend 5K on one, I probably wouldn't use it either.

[–] [email protected] 11 points 3 months ago (1 children)
[–] [email protected] 1 points 3 months ago

Ugh, sounds icky. Thanks for the advice:)

[–] [email protected] 4 points 3 months ago (1 children)

Use a RAID array, and replace drives as they fail. Ideally they wouldn't fail behind your back, like an optical disc would.

[–] [email protected] 1 points 3 months ago (1 children)

Is that an always-on approach, for example with a NAS? While that is a very safe approach, it does not fit the idea of having something "on the shelf". Thank you for the advice though :)

[–] [email protected] 1 points 3 months ago

You could turn it off and turn it back on every X period of time, but that doesn't guarantee something doesn't go wrong in between. It sounds like you don't have a lot of data, relatively speaking. Is there a reason not to keep it on your present machine and do the above? Cost? IIRC you can get a 1 TB M.2 drive for under $150.

[–] [email protected] 5 points 3 months ago (1 children)

The local-plus-remote strategy is fine for any real-world scenario. Make sure that at least one of the replicas is a one-way backup (i.e., no possibility of mirroring a deletion). That way you can increment it with zero risk.
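A minimal sketch of what "one-way" can mean in practice, assuming both the source and the replica are ordinary directories. The function name `one_way_copy` is made up for illustration. It only ever adds files to the replica: nothing is deleted and nothing that already exists is overwritten.

```python
import shutil
from pathlib import Path


def one_way_copy(source: Path, replica: Path):
    """Add files that are missing from the replica; never delete or overwrite.

    Returns (copied, kept): relative paths that were newly copied, and
    relative paths that already existed in the replica and were left alone.
    """
    copied, kept = [], []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = replica / rel
        if dst.exists():
            # Existing archive files are treated as immutable.
            kept.append(rel.as_posix())
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied, kept
```

Because a file deleted from the source is simply never visited, the deletion cannot propagate to the replica. The trade-off is that changed files are not updated either, which suits an archive of data that is not supposed to change.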

And now for some philosophy. Your files are important, sure, but ask yourself how many times you have actually looked at them in the last year or decade. There's a good chance it's zero. Everything in the world will disappear and be forgotten, including your files and indeed you. If the worst happens and you lose it all, you will likely get over it just fine and move on. Personally, this rather obvious realization has helped me to stress less about backup strategy.

[–] [email protected] 3 points 3 months ago (1 children)

So you would suggest getting bigger and bigger storage?

I really like the philosophical part and can embrace it. I do delete data rigorously. At the same time, I once lost data because I was young and stupid and tried to install SUSE without a backup. I am still sad that I can't look at the pictures of me and my family from that time. I do look at those pictures/videos/recordings from time to time. It gives me a nice feeling of nostalgia. It also grounds me and shows me how much has changed.

[–] [email protected] 2 points 3 months ago (1 children)

Fair enough!

So you would suggest getting bigger and bigger storage?

Personally I would suggest never recording video. We did fine without it for aeons and photos are plenty good enough. If you can stick to this rule you will never have a single problem of bandwidth or storage ever again. Of course I understand that this is an outrageous and unthinkable idea for many people these days, but that is my suggestion.

[–] [email protected] 2 points 3 months ago

Never recording videos... That is outrageous ;) Interesting train of thought, though. Video is the main data hog on my drives. It's easy to mess up the compression. At the same time, it combines audio, image and time in one easy-to-consume file. Personally, I would miss it.

[–] [email protected] 5 points 3 months ago* (last edited 3 months ago) (1 children)

3-2-1 rule with restic. Check it out.

[–] [email protected] 1 points 3 months ago (1 children)

Checked it out, thanks. I have to figure out how it compares to my rsync script.

[–] [email protected] 2 points 3 months ago

Waaaaay better.

Restic allows you to make deduplicated snapshots of your data. Everything is there and it’s damn hard to lose anything. I use Backblaze B2 as my long-term endpoint / offsite… some will use AWS Glacier. But you don’t have to use any cloud services. You can just have a restic repository on some external drives. That’s what I use for my second copy of things. I also do an annual backup to a hard disk that I leave with a friend for a second offsite copy.
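To illustrate why deduplicated snapshots stay small, here is a toy content-addressed store. It is only a sketch of the general idea and not how restic is implemented: the `ChunkStore` class is hypothetical, and it cuts data into fixed-size chunks for brevity, whereas restic uses content-defined chunking.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}     # SHA-256 hex digest -> chunk bytes
        self.snapshots = {}  # snapshot name -> {file name -> [digests]}

    def snapshot(self, name, files):
        """Record a snapshot of `files` (a dict of file name -> bytes)."""
        index = {}
        for file_name, data in files.items():
            digests = []
            for start in range(0, len(data), self.chunk_size):
                chunk = data[start:start + self.chunk_size]
                digest = hashlib.sha256(chunk).hexdigest()
                # A chunk seen in an earlier snapshot is not stored again.
                self.chunks.setdefault(digest, chunk)
                digests.append(digest)
            index[file_name] = digests
        self.snapshots[name] = index

    def restore(self, name, file_name):
        """Reassemble one file from the chunks referenced by a snapshot."""
        digests = self.snapshots[name][file_name]
        return b"".join(self.chunks[d] for d in digests)
```

Two snapshots of a file that differ in only one chunk share all the other chunks, so each snapshot can still be restored in full while the store grows only by what changed.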

I’ve been backing up all of my stuff like this for years now. I used to use BORG which is another great tool. But restic is more flexible with allowing multiple systems to use a single repository and has native support for things like B2 that BORG doesn’t.

We also use restic to back up control nodes for some of the supercomputing clusters I manage. It’s that rock solid imho.

[–] [email protected] 5 points 3 months ago (1 children)

Don’t overcomplicate it. 3 copies: backup, main, and offsite; 2 different media: HDD and data center; 1 offsite. I like Backblaze, but anything from Google to Amazon will work.

[–] [email protected] 1 points 3 months ago

Good advice. My off-site is my brother's place.

[–] [email protected] 3 points 3 months ago (1 children)

As a start, follow the 3-2-1 rule:

  • At least 3 copies of the data.

  • On at least 2 different devices / media.

  • At least 1 offsite backup.
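The three conditions above are easy to check mechanically. A small sketch, assuming each copy is described only by its medium and whether it is stored offsite; the `BackupCopy` class and the medium labels are illustrative, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    medium: str    # free-form label, e.g. "hdd", "cloud", "optical"
    offsite: bool  # True if the copy is stored away from home


def satisfies_3_2_1(copies) -> bool:
    """Return True if the copies meet all three parts of the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite
```

For example, a main copy on an HDD, a second HDD in the safe and an encrypted cloud copy pass the check, while three HDDs in the same house do not.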

I would add one more thing: invest in a process for verifying that your backups are working. Like a test system that is occasionally restored to from backups.

Let's say what you care about most is photos. You will want to store them locally on a computer somewhere (one copy) and offsite somewhere (second copy). So all you need to do is figure out one more local or offsite location for your third copy. Offsite is probably best but is more expensive. I would encrypt the data and then store on the cloud for my main offsite backup. This way your data is private so it doesn't matter that it is stored in someone else's server.

I am personally a fan of Borg backup because you can do incremental backups with a retention policy (like Macs' Time Machine), the archive is deduped, and the archive can be encrypted.

Consider this option:

  1. Your data raw on a server/computer in your home.

  2. An encrypted, deduped archive on that same computer.

  3. That archive regularly copied to a second device (ideally another medium) and synchronized to a cloud file storage system.

  4. A backup restoration test process that takes the backups and shows that they restore the important files, with the right number, size, etc.

If disaster strikes and all your local copies are toast, this strategy ensures you don't lose important data. Regular restore testing ensures the remote copy is valid. If you have two cloud copies, you are protected against one of the providers screwing up and removing data without you noticing and fixing it.
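The restore test in step 4 can start as something very simple. The sketch below assumes the backup has already been restored into a scratch directory by whatever tool you use, and only compares file names and sizes against the original tree. The function names are made up for illustration.

```python
from pathlib import Path


def tree_sizes(root: Path) -> dict:
    """Map each file's path (relative to root) to its size in bytes."""
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_restore(original: Path, restored: Path) -> dict:
    """Report files that are missing, unexpected, or differ in size."""
    want, got = tree_sizes(original), tree_sizes(restored)
    common = set(want) & set(got)
    return {
        "missing": sorted(set(want) - set(got)),
        "unexpected": sorted(set(got) - set(want)),
        "size_mismatch": sorted(p for p in common if want[p] != got[p]),
    }
```

An empty report means the restored tree has the right number of files with the right sizes. It does not prove the contents are identical; comparing checksums is the stricter follow-up.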

[–] [email protected] 2 points 3 months ago

Interesting take on the test process. I never really thought of that. I just trusted rsync's error messages. Maybe I'll write a script to automate those checks. Thanks

[–] [email protected] 5 points 3 months ago* (last edited 3 months ago) (1 children)

Someone else has mentioned M-Disc and I want to second that. The benefit of using a storage format like this is that the actual storage media is designed to last a long time, and it is separate from the drive mechanism. This is a very important feature - the data is safe from mechanical, electrical and electronic failure because the storage is independent of the drive. If your drive dies, you can replace it with no risk to the data. Every serious form of archival data storage is the same - the storage media is separate from the reading device.

An M-Disc drive is required to write data, but any DVD or BD drive can read the data. It should be possible to acquire a replacement DVD drive to recover the data from secondary markets (eBay) for a very long time if necessary, even after they're no longer manufactured.

[–] [email protected] 3 points 3 months ago

M-Disc, never heard of that. I did some quick research and it seems to be exactly what I was looking for. Thank you!

[–] [email protected] 3 points 3 months ago (2 children)

I would use maybe a Raspberry Pi or old laptop with two drives (preferably different brands/age, HDD or SSD doesn't really matter) in it using a checksumming filesystem like btrfs or ZFS so that you can do regular scrubs to verify data integrity.

Then, from that device, pull the data from your main system as needed (that way, the main system has no way of breaking into the backup device so won't be affected by ransomware), and once it's done, shut it off or even unplug it completely and store it securely, preferably in a metal box to avoid any magnetic fields from interfering with the drives. Plug it in and boot it up every now and then to perform a scrub to validate that the data is all still intact and repair the data as necessary and resilver a drive if one of them fails.

The unfortunate reality is most storage mediums will eventually fade out, so the best way to deal with that is an active system that can check data integrity and correct the files, and rewrite all the data once in a while to make sure the data is fresh and strong.

If you're really serious about that data, I would opt for both an HDD and an SSD, and have two of those systems at different locations. That way, if something shakes up the HDD and damages the platter, the SSD is probably fine, and if it's forgotten for a while, maybe the SSD's memory cells will have faded but not the HDD. The strength is in the diversity of the mediums. Maybe burn a Blu-ray as well just in case; it'll fade too, but hopefully differently than an SSD or an HDD. The more copies, even partial copies, the more likely you can recover the entirety of the data, and you have the checksums to validate which blocks from which medium are correct. (Fun fact: people have been archiving LaserDiscs and repairing them by ripping the same movie from multiple identical discs, as they're unlikely to fade at exactly the same spots at the same time, so you can merge them all together, cross-reference them and usually get a near-perfect rip.)
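A rough sketch of how per-block checksums let you stitch one good copy together from several partly damaged ones. It assumes the block checksums were recorded while the data was still intact; btrfs and ZFS do this internally and far more thoroughly. The function names and the tiny block size are only for illustration.

```python
import hashlib


def block_digests(data: bytes, block_size: int):
    """SHA-256 digest of every fixed-size block, recorded while data is good."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def recover(copies, digests, block_size: int) -> bytes:
    """Rebuild the data, taking each block from any copy that still verifies."""
    recovered = bytearray()
    for index, expected in enumerate(digests):
        start = index * block_size
        for copy in copies:
            block = copy[start:start + block_size]
            if hashlib.sha256(block).hexdigest() == expected:
                recovered += block
                break
        else:
            # No copy holds an intact version of this block.
            raise ValueError(f"block {index} is damaged in every copy")
    return bytes(recovered)
```

As long as the copies fade in different places, every block can be taken from some copy that still matches its checksum. Only a block that is damaged in every copy is lost.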

[–] [email protected] 1 points 3 months ago

"The strength is in the diversity of the mediums" I like that. Should be part of the book of Zen for Backups. Thank you for your insights.

[–] [email protected] 2 points 3 months ago

with two drives (preferably different brands/age, HDD or SSD doesn't really matter) in it using a checksumming filesystem like btrfs or ZFS so that you can do regular scrubs to verify data integrity.

An important detail here is to add the two disks to the filesystem so that the second one does not extend the capacity but adds redundancy. On ZFS, this can be done with a mirror vdev (simplest for this case) or a raidz1 vdev.

[–] [email protected] 48 points 3 months ago (2 children)

This is my day job, so I'd like to weigh in.

First of all, there's a whole community of GLAM institutions involved in what is called Digital Preservation (try googling that specifically). Here in Germany, a lot of them have founded the Nestor Group (www.langzeitarchivierung.de) to further the cause and share knowledge. Recently, Nestor had a discussion group on Personal Digital Archiving, addressing just your use case. They have set up a website at https://meindigitalesarchiv.de/ with the results. Nestor publishes mostly in German, but online translators are a thing, so I think you will be fine.

Some things that I want to address from your original post:

  • Keep in mind that file formats, just like hardware and software, become obsolete over time. Think about a migration strategy for moving your files to a more recent format if your current format falls out of style and isn't as widely supported anymore. I assume your photos are JPGs, which are widely considered unsafe for preservation, as they use lossy compression and degrade with each re-encoding. A suitable replacement might be PNG, though I wouldn't go ahead and convert my JPGs right away. For born-digital photo material, uncompressed TIFF is the preferred format.
  • Compression in general is considered a risk, because a damaged bit will potentially impact a larger block of compressed data. Saving a few bytes on your storage isn't worth losing your precious memories.
  • Storage media have different retention times. It's true that magnetic tape storage has the best chances of survival, and it's what we use for long-term cold storage, but it's prohibitively expensive for home use. Also, it's VERY slow on random access, because the tape has to be wound to the specific location of your file before reading. If you insist on using it, format your tapes using LTFS to eliminate the need for a storage management system like IBM Spectrum Protect. The next best choice of storage media is NAS-grade HDDs, which will last you upwards of five years. Using redundancy and a self-correcting file system like ZFS (compression & dedup OFF!) will increase your chances of survival. Keep your hands off optical storage media; according to studies on the subject, they tend to start decaying after as little as a year. Flash storage isn't much better either; avoid thumb drives at all costs. Quality SSD storage might last you a little longer. If you use ZFS or a comparable file system that provides snapshots, you can use that to implement immutability.
  • Kudos for using Linux standard tooling; it will help other people understand your stack if anything happens to you. Digital Preservation is all about removing dependencies on specific formats, technologies and (importantly) people.
  • Backup is not Digital Preservation, though I will admit that these two tend to get mixed into one another in personal contexts. Backups save the state of a system at a specific point in time; DigiPres tries to preserve only data that isn't specific to a system and tends to change very little. Also, and that is important, DigiPres tries to save context along with the actual payload, so you might want to at least save some metadata along with your photos and store them all in a structure that is made for preservation. I recommend BagIt; there's a lot of existing tooling for creating it, it's self-contained, secured by strong checksums and it's an RFC (RFC 8493).
  • Keep complexity as low as possible!
  • Last of all, good on you for doing SOMETHING. You don't have to be perfect to improve your posture, and you're on the right track, asking the right questions. Keep on going, you're doing great.
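To give an idea of how little is needed for the BagIt structure mentioned above, here is a minimal sketch that writes a bag by hand: a `bagit.txt` declaration, the payload under `data/`, a SHA-256 manifest and an optional `bag-info.txt` for context metadata. The function name and its arguments are made up, it only handles flat file names, and existing BagIt tooling does considerably more (validation, tag manifests, and so on).

```python
import hashlib
from pathlib import Path


def make_minimal_bag(bag_dir: Path, payload: dict, info: dict = None) -> None:
    """Write payload files (name -> bytes) into a new, minimal BagIt bag."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)  # fails if the bag exists, so nothing is overwritten

    # Bag declaration: marks the directory as a bag and names the version.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    lines = []
    for name in sorted(payload):
        (data_dir / name).write_bytes(payload[name])
        digest = hashlib.sha256(payload[name]).hexdigest()
        lines.append(f"{digest}  data/{name}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )

    # Optional context metadata as "Label: value" lines.
    if info:
        text = "".join(f"{label}: {value}\n" for label, value in info.items())
        (bag_dir / "bag-info.txt").write_text(text, encoding="utf-8")
```

The result is a plain directory that can be copied to any medium, and anyone can later verify the payload against `manifest-sha256.txt` with nothing more than a SHA-256 tool.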

Come back to me if you have any further questions.

[–] [email protected] 2 points 3 months ago (1 children)

And have multiple copies in at least two locations of anything truly important to guard against disaster (such as a fire or regionally appropriate natural disaster). I got a spare drive to copy all the music that I've made and sent it to my father in a different part of the country. I could lose everything and be pretty bummed, but not that (without severe depression). I also endorse use of a safe deposit box at a bank if you don't have someone who can hold data in a different city.

[–] [email protected] 2 points 3 months ago

Yeah, you can always go crazy with (off site) copies. There's a DigiPres software system literally called LOCKSS (Lots Of Copies Keep Stuff Safe).

The German Federal Office for Information Security recommends a distance of at least 200 km between (professional) sites that keep georedundant copies of the same data/service, so depending on your upload capacity and your familiarity with encryption (ALWAYS back up your keys!), some cloud storage provider might even be a viable option to create a second site.

Spare drives do absolutely work as well, but remember that, depending on the distance, data there will get more or less outdated and you might not remember to refresh the hardware in a timely manner.

A safe deposit box is something that I hadn't considered for my personal preservation needs yet, but sounds like a good idea as well.

Whatever you use, also remember to read back data from all copies regularly and recalculate checksums for fixity checks to make sure your data doesn't get corrupted over time. Physical objects (like books) decay slowly over time, digital objects break more spontaneously and often catastrophically.
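A fixity check like this can be a short script. The sketch below assumes you store the recorded checksums somewhere safe next to each copy (how you persist the dictionary is up to you); the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(root: Path) -> dict:
    """Compute a checksum for every file below root (relative path -> digest)."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(root: Path, recorded: dict) -> dict:
    """Read every recorded file back and report the ones that no longer match."""
    problems = {}
    for rel, expected in recorded.items():
        path = root / rel
        if not path.is_file():
            problems[rel] = "missing"
        elif sha256_of(path) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

Run `record_fixity` once when the archive is written, then run `check_fixity` against each copy on a schedule. An empty result means every file could still be read and matches its recorded checksum.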

[–] [email protected] 6 points 3 months ago (1 children)

Wow, that was a rabbit hole of information, links and ideas. Thanks a lot. I reached a point of what I would call "satisfaction": https://cdn.nationalarchives.gov.uk/documents/selecting-storage-media.pdf was linked from Nestor, and it seems to give me an idea of what I'll do next. Thanks again 👍

[–] [email protected] 2 points 3 months ago

Good to hear! When you go with the National Archives UK, you can't fail. They have some very, VERY competent people on staff over there, who are also quite active in the DigiPres community. They are also the inventors of DROID and the maintainers of the widely used PRONOM database of file formats: https://www.nationalarchives.gov.uk/PRONOM/Default.aspx Absolute heroes of Digital Preservation.
