this post was submitted on 18 Jun 2024

Hello,

I am going to upgrade my server. Since the new one will have room for more hard disks, I want to take the opportunity to give my data a little more protection against loss.

Currently I have 2 ext4 hard drives with data on them, and I want to buy a third (all three the same capacity) and put them in RAID5, so that in the future I can add more hard drives and increase the capacity.

For budget reasons, right now I can only buy what would be the third disk, so it is impossible for me to back up the data I currently have.

The data itself is not valuable: if any file gets corrupted, I could download it again. However, there are enough terabytes (20) that downloading everything again would be madness.

My plan is to run DietPi (a trimmed-down Debian) on this server (a PC) and maybe build the RAID with mdadm. I have seen tutorials on how to do it (this one, for example: https://ruan.dev/blog/2022/06/29/create-a-raid5-array-with-mdadm-on-linux ).

The question is: is there any way to do this without having to format the hard drives that already hold data?

Thank you, and sorry for any mistakes I may make; English is not my native language.

EDIT:

Thanks for your answers!! I have several paths to investigate.

all 48 comments
[–] [email protected] 2 points 4 months ago

Just use Snapraid & MergerFS. No special Hardware required and you don't need to change what is on your disks.

From a quick search: https://perfectmediaserver.com/02-tech-stack/snapraid/

[–] [email protected] 2 points 4 months ago

If you used ZFS this would be easier to fix. I would recommend switching to it.

It sounds like you need another disk. I know that isn't always possible, and if it isn't, delete enough data to copy the rest over to a single disk. Without backups you are destined to lose your data anyway.

For a three-disk ZFS pool I would go with raidz1, as that will give you one drive of redundancy.

[–] [email protected] 3 points 4 months ago* (last edited 4 months ago) (1 children)

The question is, is there any way without having to format the hard drives with data?

MergerFS would let you pool drives without needing to set up RAID and format them.

Then add SnapRAID on top of that for parity.

[–] [email protected] 1 points 4 months ago

This is how I do it. No striping, normal partitions, different hard drive sizes, pretty easy. This way makes upgrades super easy too. Currently running a 76TB mergerfs pool with two 14TB SnapRAID parity drives.
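
For sizing a SnapRAID setup like this, the usual rule is that each parity disk has to be at least as large as the largest data disk. A minimal Python sketch of that check follows; the helper name `snapraid_layout_ok` is made up for illustration, sizes are in TB, and it ignores filesystem overhead.

```python
def snapraid_layout_ok(data_disks_tb, parity_disks_tb):
    """Return True if every parity disk is at least as large as the largest data disk."""
    if not data_disks_tb or not parity_disks_tb:
        return False
    largest_data = max(data_disks_tb)
    return all(parity >= largest_data for parity in parity_disks_tb)


# OP's case (assumed layout): the two existing 12 TB disks stay untouched as data
# disks and the new 12 TB disk becomes the parity disk.
print(snapraid_layout_ok([12, 12], [12]))

# A parity disk smaller than the largest data disk fails the check.
print(snapraid_layout_ok([12, 12], [8]))
```

Under that assumption OP's three equal disks would pass, and the existing ext4 data would not need to be moved.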

[–] [email protected] 3 points 4 months ago* (last edited 4 months ago)

My recommendation would be to utilize LVM. Set up a PV on the new drive and create an LV filling the drive (with an FS), then move all the data off of one drive onto this new drive, reformat the first old drive as a second PV in the volume group, and expand the size of the LV. Repeat the process for the second old drive. Then, instead of extending the LV, set the parity option on the LV to 1. You can add further disks, increasing the LV size or adding parity or mirroring in the future, as needed. This also gives you the advantage that you can (once you have some free space) create another LV that has different mirroring or parity requirements.

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters   More Letters
LVM             (Linux) Logical Volume Manager for filesystem mapping
RAID            Redundant Array of Independent Disks for mass storage
SATA            Serial AT Attachment interface for mass storage
SBC             Single-Board Computer
ZFS             Solaris/Linux filesystem focusing on data integrity

5 acronyms in this thread; the most compressed thread commented on today has 15 acronyms.

[Thread #815 for this sub, first seen 18th Jun 2024, 18:45]

[–] [email protected] 5 points 4 months ago (1 children)

Traditional RAID isn’t very flexible and is meant/easiest for fresh disks without data. Since you’ve already got data in place, look into something like SnapRAID.

[–] [email protected] 2 points 4 months ago

And mergerFS

[–] [email protected] 3 points 4 months ago (1 children)

I'd suggest you move toward a backup approach ("RAID is not a backup") first. Assuming you have 2x10TB, get a 3rd and copy half of your files to it, disconnect it, and now half your files are protected. Save, get another, copy the other half, now all your files are protected. If you're trying to do RAID over USB, don't; you are already done. Otherwise (using SATA or better), you can proceed to build your array in an orderly fashion.

[–] [email protected] 3 points 4 months ago (1 children)

I know it's not a backup, but for me it's the sweet spot between money and security. Not only for these 2 hard disks, but also for the ability to add more HDs later without needing full redundancy.

Thanks for your answer!!

[–] [email protected] 1 points 4 months ago

I will say it three times, Raid isn't a backup

Raid isn't a backup

Raid isn't a backup

Seriously though, it shouldn't give much peace of mind. All raid does is add a little resistance to hardware failures. If you mistakenly delete files you are hosed. If your hardware causes corruption you are hosed. If something happens to your computer, such as physical abuse, your drives are likely going to be damaged, which will also mean that you may be hosed. If one drive dies and then another drive dies before you move your data over, you are also hosed.

The big takeaway is that Raid only really buys time. It can prevent downtime but it will not save you.

[–] [email protected] 10 points 4 months ago (1 children)

Not really with mdadm raid5. But it sounds like you like to live dangerously. You could always go the BTRFS route. Yeah, I know BTRFS Raid56 "will eat your data", but you said it's nothing that important anyways. There are some things to keep in mind when running BTRFS in Raid5, e.g. scrub each disk individually, use Raid1c3 for metadata for example.

But basically, BTRFS is one of the only filesystems that allows you to add disks of any size or number, and you can convert the profile on the fly, while in use. So in this case, you could format the new disk with BTRFS as a single disk. Copy over stuff from one of your other disks, then once that disk is empty, add it as an additional device to your existing BTRFS volume. Then do the same with the last disk. Once that is done, you can run a balance convert to convert the single profile into a raid5 data profile.
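
Before starting, it is worth checking that each copy step actually fits. Here is a rough Python sketch of the capacity bookkeeping for the sequence above. It assumes identical disks, ignores metadata overhead (including any raid1c3 metadata), and the function name `btrfs_migration_plan` is only illustrative.

```python
def btrfs_migration_plan(disk_tb, used_tb):
    """Walk the copy/add steps and track pool usage; raise if a copy would not fit.

    disk_tb: capacity of each (identical) disk, in TB
    used_tb: data on each old disk, in the order the disks will be emptied
    """
    pool_capacity = disk_tb   # the new, empty disk formatted as a single-device volume
    pool_used = 0.0
    steps = []
    for used in used_tb:
        if pool_used + used > pool_capacity:
            raise ValueError("copy would not fit in the pool")
        pool_used += used            # copy the old disk's files into the pool
        pool_capacity += disk_tb     # wipe the old disk and add it to the pool
        steps.append((pool_used, pool_capacity))
    total_disks = len(used_tb) + 1
    raid5_usable = (total_disks - 1) * disk_tb   # one disk's worth goes to parity
    return steps, raid5_usable, pool_used <= raid5_usable


# Assumed numbers: three 12 TB disks, about 9.5 TB on each of the two old ones.
steps, raid5_usable, fits = btrfs_migration_plan(12, [9.5, 9.5])
print(steps)
print(raid5_usable, fits)
```

With those assumed numbers every copy fits, and the 19 TB of data also fits into the 24 TB a three-disk raid5 data profile would offer.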

That being said, there are quite a few caveats to be aware of. Even though it's improved a lot, BTRFS's Raid56 implementation is still not recommended for production use. https://lore.kernel.org/linux-btrfs/[email protected]/

Also, I would STRONGLY recommend against connecting disks via USB. USB HD adapters are notorious for causing all kinds of issues when used in any sort of advanced setup, apart from temporary single disk usage.

[–] [email protected] 1 points 4 months ago

Interesting, I think it could be a good fit for my use case. I'll check it out.

Thanks for your answer!!

[–] [email protected] 20 points 4 months ago (2 children)

This is madness, but since this is a hobby project and not a production server, there is a way:

  • Shrink the filesystems on the existing disks to free up as much space as possible, and shrink their partitions.
  • Add a new partition to each of the three disks, and make a RAID5 volume from those partitions.
  • Move as many files as possible to the new RAID5 volume to free up space in the old filesystems.
  • Shrink the old filesystems/partitions again.
  • Expand each RAID component partition one at a time by removing it from the array, resizing it into the empty space, and re-adding it to the array, giving plenty of time for the array to rebuild.
  • Move files, shrink the old partitions, and expand the new array partitions as many times as needed until all the files are moved.

This could take several days to accomplish, because of the RAID5 rebuild times. The less free space, the more iterations and the longer it will take.
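
To get a feel for how many shrink/copy/grow rounds this takes, the loop can be simulated. The Python sketch below assumes two equally full old disks plus one empty new disk of the same size, ignores filesystem and md metadata overhead, and uses a made-up helper name, `count_passes`.

```python
def count_passes(disk_tb, used_per_old_disk_tb):
    """Simulate the shrink/copy/grow loop for two old disks plus one new disk."""
    remaining = float(used_per_old_disk_tb)  # data still in the old filesystem, per old disk
    in_raid = 0.0                            # data already moved into the RAID5 volume
    passes = 0
    while remaining > 1e-9:
        raid_part = disk_tb - remaining      # space freed on each old disk after shrinking
        usable = 2 * raid_part               # 3-member RAID5 stores two members' worth of data
        free = usable - in_raid
        if free <= 0:
            raise ValueError("no free space to start from")
        moved = min(free, 2 * remaining)     # half is taken from each old disk
        in_raid += moved
        remaining -= moved / 2
        passes += 1
    return passes


# Assumed numbers: 12 TB disks with 9.5 TB used on each old disk.
print(count_passes(12, 9.5))
```

In this simplified model every round moves twice the per-disk free space, so the fuller the old disks are, the more rounds (and rebuilds) are needed.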

[–] [email protected] 7 points 4 months ago

That is madness. I love it

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago) (1 children)

He said the two drives are mostly full. It's not a partitioning issue at that point.

[–] [email protected] 7 points 4 months ago (2 children)

Even if you could free up only 1GB on each of the drives, you could start the process with a RAID5 of 1GB per disk, migrate 2GB of data into it, free up those 2GB on the old disks, expand the RAID, and rinse and repeat. It will take a very long time and carries a lot of risk due to the increased stress on the old drives, but it is certainly theoretically achievable.

[–] [email protected] 9 points 4 months ago* (last edited 4 months ago) (1 children)

Technically, he would have three drives and only two drives of data. So he could move 1/3 of the data off each of the two drives onto the third and then start off with RAID 5 across the remaining 1/3 of each drive.
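
As a back-of-the-envelope check of that idea, here is a small Python sketch. It assumes the data is first spread evenly over all three disks and ignores filesystem overhead; `initial_raid_partition_tb` is a hypothetical helper name.

```python
def initial_raid_partition_tb(disk_tb, total_data_tb, disks=3):
    """Free space per disk once the data is spread evenly over all disks."""
    per_disk_data = total_data_tb / disks
    return disk_tb - per_disk_data


# Assumed numbers: three 12 TB disks and 19 TB of data in total.
part = initial_raid_partition_tb(12, 19)
print(round(part, 2))       # RAID partition size per disk, in TB
print(round(2 * part, 2))   # usable space of the initial 3-member RAID5, in TB
```

For comparison, without the head start the first RAID partition could only be as large as the free space on an old disk (12 TB minus roughly 9.5 TB used).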

[–] [email protected] 4 points 4 months ago

This is smart! Should help reduce the number of loops they’d need to go through and could reduce the stress on the older drives.

[–] [email protected] -2 points 4 months ago* (last edited 4 months ago) (1 children)

Not at all possible whatsoever though. If he has two drives nearly full, he would never be able to fit all replicable data on a RAID 5 of any kind.

What you're describing as a solution is the "3 jugs of water" problem. The difference is you need only one coherent set of data in order to even start a RAID array. Juggling between disks in this case would never produce the solution OP is asking for if all the data can't fit on one single drive, due to the limitation of the smallest drive's capacity. You can't just swap things around and eventually come up with a viable array if ALL data can't be in one place at one time.

[–] [email protected] 2 points 4 months ago (2 children)

They’re going for RAID5, not 6, so with the third drive there’s no additional requirement.

Say, for example, they have 2x 12TB drives with 10TB used on each (they mentioned they’ve got 20TB of data currently). They can acquire a 3rd 12TB drive and create a RAID5 volume from 3x 1TB partitions, giving them 2TB of space on the RAID volume. They can then copy 2TB of data into the RAID volume (1TB from each of the existing drives), verify the copy worked as intended, delete the originals, shrink the old filesystem on each drive by 1TB, add the newly freed 1TB to the RAID, rebuild the array, and rinse and repeat.

At the very end, there’d be no data left outside and the RAID volume can be expanded to the full capacity available… assuming the older drives don’t fail during this high stress maneuver.

[–] [email protected] -2 points 4 months ago* (last edited 4 months ago) (2 children)

RAID 5 is a minimum of 3 drives...I'm not sure what you mean.

[–] [email protected] 3 points 4 months ago (1 children)

OP Currently has in their possession 2 drives.

OP has confirmed they're 12TB each, and in total there is 19TB of data across the two drives.

Assuming there is only one partition, each one might look something like this:

Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 12345678-9abc-def0-1234-56789abcdef0

Device         Start        End            Sectors        Size      Type
/dev/sda1      2048         23437499966    23437497919    12.0T     Linux filesystem

OP wants to buy a new drive (also 12TB) and make a RAID5 array without losing existing data. Kind of madness, but it is achievable. OP buys a new drive and sets it up as follows:

Device         Start        End            Sectors        Size      Type
/dev/sdc1      2048         3906252047     3906250000     2.0T      Linux RAID

Unallocated space:
3906252048      23437500000   19531247953    10.0T

Then, OP must shrink the existing partition to something smaller, say 10TB for example, and then use the rest of the space as part of their RAID5:

Device         Start        End            Sectors        Size      Type
/dev/sda1      2048         19531250000    19531247953    10.0T     Linux filesystem
/dev/sda2      19531250001  23437499999    3906250000     2.0T      Linux RAID

Now with the 3x 2TB partitions, they can create their RAID5 initially:

sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc1

Make an ext4 filesystem on md0, copy 4TB of data (2TB from sda1 and 2TB from sdb1) into it, and verify the RAID5 is working properly. Once OP is happy with the data on md0, they can delete the copied data from sda1 and sdb1, shrink the filesystems there (resize2fs), expand sda2 and sdb2, expand sdc1, and resize the RAID (mdadm --grow ...).

Rinse and repeat. At the end of the process, they'd end up having all their data in the newly created md0, which is a RAID5 volume spanning all three disks.
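
The sector figures in the listings above follow from simple arithmetic, which is handy when planning each resize step. A small Python sketch, assuming decimal terabytes and 512-byte logical sectors as in the listing; the helper names are illustrative only.

```python
SECTOR_BYTES = 512   # logical sector size, as shown in the listing
TB = 10**12          # decimal terabyte, as drive vendors count it


def tb_to_sectors(size_tb):
    """Number of 512-byte sectors in size_tb decimal terabytes."""
    return size_tb * TB // SECTOR_BYTES


def last_sector(start, sectors):
    """Last sector of a partition that begins at `start` and spans `sectors` sectors."""
    return start + sectors - 1


print(tb_to_sectors(12))                     # whole 12 TB disk
print(tb_to_sectors(2))                      # one 2 TB RAID partition
print(last_sector(2048, tb_to_sectors(2)))   # end of a 2 TB partition starting at sector 2048
```

Real partitioning tools also reserve space for the GPT structures and may align partition boundaries, so actual values can differ slightly.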

Hope this is clear enough and that there is no more disconnect.

[–] [email protected] -2 points 4 months ago (1 children)

The number of drives doesn't matter when you can't copy to another. There is no replication path here.

[–] [email protected] 0 points 4 months ago

You seem to be under the impression that the "buckets" in this case are all or nothing. They are talking about partitioning the drives and raiding the partitions. The way he describes slowly moving data to an ever increasing raid array would most certainly work, as it is not all or nothing. These buckets have fully separate independent chambers in them that are adjustable at will. Makes leveling them possible, just tedious and risky.

[–] [email protected] 2 points 4 months ago (1 children)

That is a clever approach, and it's exactly my use case: two 12TB drives, about 19TB used.

And it's for a personal project, so I'm not in any hurry.

Just to clarify: could "several days" mean 1 or 2 weeks, or are we talking about more time?

[–] [email protected] 2 points 4 months ago

I’m afraid I don’t have an answer for that.

It is heavily dependent on drive speed and the number of times you’d need to repeat. Each time you copy data into the RAID, the array would need to write the data plus compute the parity data; then, when you expand the array, the array would need to be rebuilt, which takes more time again.
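
For a very rough lower bound on a single rebuild, divide the size of one array member by the sustained throughput of the disks. The Python sketch below uses 150 MB/s purely as an assumed figure (not a measurement from this thread), and `resync_hours` is a made-up helper name.

```python
def resync_hours(member_tb, mb_per_s):
    """Hours needed to read or write one array member end to end at a steady rate."""
    seconds = member_tb * 1e12 / (mb_per_s * 1e6)
    return seconds / 3600


# Assumed: a full 12 TB member at a sustained 150 MB/s.
print(round(resync_hours(12, 150), 1))

# Assumed: a smaller 2.5 TB RAID partition early in the process.
print(round(resync_hours(2.5, 150), 1))
```

Real rebuilds and reshapes are usually slower than this, especially while other copies are running on the same disks, so treat the result as a floor rather than a prediction.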

My only tangentially relatable experience at a similar scale is a raid expansion of my RAID6 (so two parity disks here compared to one on yours) from 5x8TB, using 20 out of 24TB, to 8x8TB. These are shucked white label WD red equivalents, so 5k RPM 256MB cache SATA drives. Since it was a direct expansion, I didn’t need to do multiple passes of shrinking and expanding etc., but the expansion itself I think took my server a couple of days to rebuild.

Someone else mentioned you could potentially move some data into the third drive and start with a larger initial chunk… I think that could help reduce the number of passes you’d need to do as well, may be worth considering.

[–] [email protected] -2 points 4 months ago* (last edited 4 months ago) (1 children)

If all of your data won't fit on one single drive, you can't increase your reliability with RAID at this point. You need at least one drive of a size capable of holding all your data to replicate to at least one other drive for RAID 1 at a minimum. Increasing RAID levels from there with replication (not just striping) will only reduce the total amount of space available from the smallest drive capacity in the disk group until you hit a certain number of drives.

Honestly, if you're wanting to increase reliability for fear of data loss, take a run through your data and see if there's anything you can ditch (or easily replace later), see how small that data set can be. Revisit RAID combinations after that.

[–] [email protected] 1 points 4 months ago

I can recover all, but the time to redownload will be too long :)

[–] [email protected] 3 points 4 months ago* (last edited 4 months ago) (2 children)

So I see a few problems with what you want. For a raid5 setup you will need at least four drives, since your information is striped against 3 and then the fourth is a parity drive. With 3 drives you have an incredibly high likelihood of losing your parity drive.

To my knowledge, you will need to wipe the drives to put them in any kind of raid. Since striping is essentially making custom sections of blocks; I don't think mdadm is smart enough to also move data files as well.

I would really recommend holding off on your project till you can back up the information and get a fourth drive. I know there is a lot of debate about raid5 versus raid6, but for me I really prefer the peace of mind that raid6 gives.

Edit: seems like it is possible with at least raid 1: https://askubuntu.com/questions/1403691/how-can-i-create-mdadm-raid1-without-losing-data

[–] [email protected] 10 points 4 months ago (1 children)

You can do RAID 5 with three disks. It's fine. Not ideal, but fine.

My biggest concern is what OP is using as a server. If these disks are attached via USB, they are not going to have reliable connections, and it's going to trigger frequent RAID rescans and resyncs any time one of the three disks drops out. And the extra load from that might cause even more drops.

[–] [email protected] 4 points 4 months ago (1 children)

I reread this a few times after seeing your comment, but still missing where USB was mentioned. Am I blind?

[–] [email protected] 2 points 4 months ago (1 children)

They didn't say USB, but they did say dietpi. I've never played with a rpi, but I don't think they have SATA or SAS ports, only USB.

[–] [email protected] 2 points 4 months ago (1 children)

Ah, he said PC, so I just assumed he wanted the distribution on x86. I see where you're coming from though.

[–] [email protected] 1 points 4 months ago

Yes, DietPi is mainly for SBCs, but it also has an ISO for PCs. It's an old computer with 6 SATA ports.

[–] [email protected] 5 points 4 months ago* (last edited 4 months ago) (2 children)

Seconding this. For starters, when tempted to go for Raid5, go for Raid6 instead. I've had drives fail in Raid5, and then had a second failure during the increased I/O associated with replacing the failed drive.

And yes, setting up RAID wipes the drives. Is the data private? If not, a friendly datahoarder might help you out with temporary storage.

[–] [email protected] 2 points 4 months ago

It's possible to convert drives to RAID in-place... but strongly discouraged.

Since OP will have a blank drive, they could play musical chairs by setting up a new RAID on the new empty drive, copy data from one drive, wipe that drive, grow the array, copy data from the third drive, wipe, grow... But that's going to take a long time, and you'll have to keep notes about where you are in the process, lest you forget which drive is which over the multiple days this will take.

[–] [email protected] 5 points 4 months ago (3 children)

I run RAID5 on one device.... BUT only because it replicates data that's on 2 other local devices AND that data is backed up to a cloud storage.

And I still want it to be RAID 6.

[–] [email protected] 1 points 4 months ago

If I go with RAID5 I lose one disk of space; to go to RAID6 I would have to lose 2 disks.

It's a personal project, and the motherboard has only 6 SATA ports, one of them used by the OS disk, and I want to be able to upgrade it in the future...
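
The trade-off can be put in numbers. A minimal Python sketch, assuming identical 12 TB disks and at most five data ports (six SATA ports minus the OS disk); `usable_tb` is an illustrative helper name, not part of any tool.

```python
def usable_tb(level, disks, disk_tb):
    """Usable capacity of an array of identical disks."""
    parity_disks = {"raid5": 1, "raid6": 2}[level]
    min_disks = {"raid5": 3, "raid6": 4}[level]
    if disks < min_disks:
        raise ValueError(f"{level} needs at least {min_disks} disks")
    return (disks - parity_disks) * disk_tb


# Compare both levels for 3, 4 and 5 disks of 12 TB each.
for disks in (3, 4, 5):
    raid5 = usable_tb("raid5", disks, 12)
    raid6 = usable_tb("raid6", disks, 12) if disks >= 4 else None
    print(disks, raid5, raid6)
```

With five 12 TB disks this gives 48 TB for RAID5 against 36 TB for RAID6, which is the cost of the second parity disk.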

[–] [email protected] 7 points 4 months ago* (last edited 4 months ago) (2 children)

Story time!

In this one production cluster at work (1.2PB across four machines, 36 drives per machine) everything was Raid6, except ONE single volume on one of the machines that was incorrectly set up as Raid5. It wasn't that worrisome, as the data was also stored with redundancy across the machines in the storage cluster itself (a nice functionality of beegfs), but it annoyed the fuck out of me for the longest time.

There was some other minor deferred maintenance as well which necessitated a complete wipe, but there was no real opportunity to do this and rebuild that particular RAID volume properly until last spring before the system was shipped off to Singapore to be mobilized for a survey. I planned on getting it done before the system was shipped, so I backed up what little remained after almost clearing it all out, nuked the cluster, disassembled the raid5, and then started setting up everything from scratch. Piece of cake, right?

shit

That's when I learned how much time it actually takes to rebuild a volume of 12 disks, 10TB each. I let it run as long as I could before it had to be packed up. After half a year of slow shipping it finally arrived on the other side of the planet, so I booked my plane ticket and showed up a week before anyone else just so I could connect power and continue the reraiding before the rest of the crew showed up. Basically, pushing a few buttons, followed by a week of sitting at various cafes drinking beer. Once the reraid was done, reclustering was done in less than an hour, and restoring the folder structure backup was a few hours on top of that. Not the worst work trip I've had, apart from some unexpected and unrelated hardware failures, but that's a story for another day.

Fun fact: While preparing the system for shipment here in Europe, I lost one of my Jabra bluetooth buds. I searched fucking everywhere for hours, but gave up on finding it. I found it half a year later in Singapore, on top of the server rack, surprised it hadn't even rolled down. It really speaks to how little these huge container ships roll.

[–] [email protected] 2 points 4 months ago

Fun story but I’m most impressed with the earbud part of the story. WOW. Absolutely amazing and unexpected.

[–] [email protected] 2 points 4 months ago* (last edited 4 months ago) (1 children)

Haha, everything about that story is awesome, right down to the lost and found Jabra ear bud (does Jabra exist any more? At one time their ear pieces were the best).

Yes, re-silvering takes fucking forever. Even with my little setups (a few TB), it can take a day or two to rebuild one drive in an array. One.

I can only imagine how long a PB array would take.

[–] [email protected] 2 points 4 months ago* (last edited 4 months ago)

Jabra still exists yes. I'm still using Jabra, although I'm using a pair that I bought after I thought that one earbud was gone forever. I still use the older ones, which was Jabra Elite 4, but only with my PC, as its battery took a hit after those 6 months at sea. I currently main Jabra Active 7 or something like that, and I quite like them. I noticed that the cover doesn't stay very attached after a few proper cleans, but nothing a drop of glue doesn't fix. What I really like about the ones I currently use is that they're supposedly built to withstand sweat while training. I don't work out, but it would seem that those who do sweat A LOT, as I can wear mine while showering without any issues.

As for resilvering, the RAIDs are only a small fraction each of the complete storage cluster. I don't remember their exact sizes, but each raid volume is 12 drives of 10TB each. Each machine has three of these volumes. Four machines in total each contribute all of their raid volumes to the storage cluster for 1.2PB of redundant storage (although I'm tempted to drop the beegfs redundancy, as we could use the extra space, and it's usually fairly hassle free to swap in a new server and move the drives over).

EDIT: I just realized that I have this Jabra conference call speaker attached to the laptop on which I'm currently typing. I mostly use it for discord while playing project zomboid with my friends, though. I run audio output elsewhere, as the jabra is mono only.

[–] [email protected] 0 points 4 months ago (1 children)

Wut...

I think you're missing the point of RAID here, possibly. Where's the reliability in this?

[–] [email protected] 3 points 4 months ago (1 children)

Not to speak for the person above you. But I believe they are saying they have 1 computer with a raid5 array, that backs up to two different local servers, and then at least 1 of those 3 servers backs up to a cloud provider.

If that is true then they are doing it correctly. It is highly recommended to follow a 3-2-1 storage solution, where you back up to a local backup and a cloud backup for redundancy.

[–] [email protected] 2 points 4 months ago (1 children)

Ahhh, makes sense. That kind of wrecked my brain for a moment.

[–] [email protected] 2 points 4 months ago

Lol, sorry, I really tried to make it clear what I was doing, honest, I did! 😄

Yes, I have 3 local devices that replicate to each other, one is RAID5, (well, 2 are, but...not for long). And one of them also does backup to a cloud storage.

Not ideal, because 3 devices are colocated, but it's what I can do right now. I'm working on a backup solution to include friends and family locations (looking to replicate what Crashplan used to provide in their "backup to friends" solution).