this post was submitted on 18 Aug 2024
220 points (97.4% liked)

Linux


I'm writing a program that wraps dd to try and warn you if you're doing anything stupid, so I've been giving the man page a good read. While doing this, I noticed that dd supports size suffixes all the way up to quettabytes, a unit orders of magnitude larger than all the data on the entire internet.

This got me wondering: what's the largest storage operation you've done? I've taken a couple of images of hard drives that were a single terabyte, but I was wondering if the sysadmins among you have had to do something bigger, e.g. with a giant RAID 10 array.
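For reference, GNU dd's binary size suffixes scale by 1024 per step, from K all the way up through Q. A minimal sketch of the parsing involved (`parse_size` and the table are mine for illustration, not dd's actual code):

```python
# Sketch of dd-style size suffixes: each binary suffix is 1024x the
# previous one, so Q works out to 1024**10 (~1.27e30 bytes).
SUFFIXES = {s: 1024 ** i for i, s in enumerate(
    ["c", "K", "M", "G", "T", "P", "E", "Z", "Y", "R", "Q"])}

def parse_size(text: str) -> int:
    """Turn a dd-style size like '4M' or '1Q' into a byte count."""
    if text[-1].isdigit():
        return int(text)
    return int(text[:-1]) * SUFFIXES[text[-1]]

print(parse_size("1Q"))  # absurdly larger than the whole internet
```

Python's unbounded integers make the Q range trivial here; a C implementation like dd's has to worry about overflowing a 64-bit counter instead.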

(page 2) 50 comments
[–] [email protected] 4 points 2 months ago

Multiple TB when setting up a new server to mirror an existing one. (Did an initial copy with both machines together in the same room before moving the clone to a physically separate location. Otherwise, doing that initial copy over the network would have saturated the connection for a week or more.)

[–] [email protected] 8 points 2 months ago

I once moved ~5TB of research data over the internet. It took days and unfortunately it also turned out that the data was junk :/

[–] [email protected] 18 points 2 months ago (5 children)

I'm in the middle of one right now: 200 TB for my Plex server, going from a 12-bay system to a 36-bay LFF system. But I've also literally driven servers across the desert, because it was faster than trying to move the data from one datacenter to another.

[–] [email protected] 9 points 2 months ago (3 children)

That's some RFC 2549 logic, right there.

[–] [email protected] 7 points 2 months ago (1 children)

Upgraded a NAS for the office. It was reaching capacity, so we replaced it. Transfer was maybe 30 TB. Just used rsync. That local transfer was relatively fast. What took longer was for the NAS to replicate itself with its mirror located in a DC on the other side of the country.

[–] [email protected] 3 points 2 months ago (2 children)

Yeah, it's kind of wild how fast (and stable) rsync is, especially when you grew up with the extremely temperamental Windows copying thing, which I've seen fuck up a 50 MB transfer before.

The biggest one I've done in one shot with rsync was only about 1 TB, but I was braced for it to take half a day and cause all sorts of trouble. But no, it just sent it across perfectly the first time, way faster than I was expecting.
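For anyone wondering what a big one-shot rsync typically looks like: the flags below are standard rsync options (`-a` archive mode, `-H` preserve hardlinks, `--partial` keep interrupted files so a rerun resumes, `--info=progress2` overall progress). The little helper wrapping them into an argv list is just a sketch of mine:

```python
# Sketch: build the rsync invocation commonly used for large one-shot
# local or LAN copies. The flags are real rsync options; the helper
# function itself is hypothetical.
def rsync_cmd(src: str, dst: str) -> list[str]:
    return ["rsync", "-aH", "--partial", "--info=progress2", src, dst]

# e.g. subprocess.run(rsync_cmd("/mnt/old-nas/", "/mnt/new-nas/"), check=True)
```

The `--partial` flag is what makes reruns cheap: an interrupted multi-TB transfer picks up from the files it already finished instead of starting over.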

[–] [email protected] 3 points 2 months ago* (last edited 2 months ago)

Probably ~15TB through file-level syncing tools (rsync or similar; I forget exactly what I used), just copying up my internal RAID array to an external HDD. I've done this a few times, either for backup purposes or to prepare to reformat my array. I originally used ZFS on the array, but converted it to something with built-in kernel support a while back because it got troublesome when switching distros. Might switch it to bcachefs at some point.

With dd specifically, maybe 1TB? I've used it to temporarily back up my boot drive on occasion, on the assumption that restoring my entire system that way would be simpler in case whatever I was planning blew up in my face. Fortunately never needed to restore it that way.

[–] [email protected] 4 points 2 months ago

~340 GB, more than a million small files (~10 KB or less each). It took like a week to move, because the files were stored on a hard drive and it was struggling to read that many files.

[–] [email protected] 4 points 2 months ago* (last edited 2 months ago)

My cousin once stuffed an ISO through my mail server in '98. His connection up in Bella Bella restricted non-batched comms back then, so he jammed it through the server as email to get on the batched quota.

It took the data and passed it along without error, albeit with some constipation!

[–] [email protected] 1 points 2 months ago

Probably some video game that is ~150-200 GiB. Does that count?

[–] [email protected] 1 points 2 months ago (1 children)

While I haven't personally had to move a data center I imagine that would be a pretty big transfer. Probably not dd though.

[–] [email protected] 5 points 2 months ago

I've imaged an entire 128GB SSD to my NAS...

[–] [email protected] 3 points 2 months ago

80 GB; it was 8 hours of (supposedly) 4K content in MP4 format. https://www.youtube.com/watch?v=VF5JWdaJlvc Here's the link (hoping for the piped bot to appear).

[–] [email protected] 5 points 2 months ago (3 children)

When I was in high school we toured the local EPA office. They had the most data I've ever seen accessible in person. I'm going to guess how much.

It was a dome with a robot arm that spun around and grabbed tapes. It was 2000, so I'm guessing 100 GB per tape. But my memory of the shape of the tapes isn't good.

Looks like tapes were four inches tall. Let's round up to six inches for housing and easier math. The dome was taller than me. Let's go with 14 shelves.

Let's guess a six foot shelf diameter. So, like 20 feet circumference. Tapes were maybe .8 inches a pop. With space between for robot fingers and stuff, let's guess 240 tapes per shelf.

That comes out to about 300 terabytes. Oh. That isn't that much these days. I mean, it's a lot, but you could easily get that in spinning disks now, with no robot-arm seek time. The same library filled with modern tapes, though, would be more like 60 petabytes.

I'm not sure how you'd transfer it these days. A truck, presumably. But you'd probably want to transfer a copy rather than disassemble it. That sounds slow too.
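A quick sanity check of the estimate above, using the guesses in the comment plus an assumed ~18 TB per modern tape (roughly LTO-9):

```python
# Back-of-the-envelope check; every figure here is a guess from the
# comment above, not a measurement.
shelves, tapes_per_shelf = 14, 240
tapes = shelves * tapes_per_shelf      # 3,360 tapes in the dome
year_2000_bytes = tapes * 100e9        # ~100 GB per tape in 2000
modern_bytes = tapes * 18e12           # ~18 TB per modern tape

print(round(year_2000_bytes / 1e12))   # ~336 TB -- "about 300 terabytes"
print(round(modern_bytes / 1e15))      # ~60 PB  -- "60 petabytes"
```

So both numbers in the comment hold up: ~336 TB then, ~60 PB with modern tapes in the same footprint.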

[–] [email protected] 3 points 2 months ago

Tape robots are fun, but tape isn't as popular today.

Yes, it's a truck. It's always been a truck, as the bandwidth is insane.
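To put "insane" in numbers, a rough sketch with made-up but plausible figures (500 drives of 18 TB each, an 8-hour drive):

```python
# Sneakernet bandwidth, hypothetical numbers: a truck of hard drives
# compared against a network link.
drives = 500
payload_bytes = drives * 18e12          # 9 PB on board
drive_hours = 8

gbit_s = payload_bytes * 8 / (drive_hours * 3600) / 1e9
print(round(gbit_s))                    # prints 2500
```

That's an effective ~2.5 Tbit/s; the latency is terrible, but no affordable link comes close on throughput.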

[–] [email protected] 3 points 2 months ago* (last edited 2 months ago) (1 children)

This was your local EPA? Do you mean at the state level (often referred to as "DEP")? Or is this the federal EPA?

Because that seems like quite the expense in 2000, and I can't imagine my state's DEP ever shelling out that kind of cash for it. Even nowadays.

Sounds cool though.

[–] [email protected] 3 points 2 months ago (4 children)

Why would dd have a limit on the amount of data it can copy? AFAIK dd doesn't check nor do anything fancy; if it can copy one bit, it can copy an infinite number of them.

Even if it did any sort of validation, if it can handle anything larger than RAM it needs to be able to work in chunks.

[–] [email protected] 3 points 2 months ago

No, it can't copy infinite bits, because it has to store the current address somewhere. If they implement unbounded integers for this, they are still limited by your RAM, as that number can't infinitely grow without infinite memory.

[–] [email protected] 2 points 2 months ago

Not looking at the man page, but I expect you can limit how much it copies if you want, and the parser for that parameter knows about these suffix names. If it were me, it'd be one parser for byte-size values, used for chunk size, limit, sync interval, and whatever else dd does.

It's also probably limited by the size of the number tracking: I think dd reports the number of bytes copied at the end even in unlimited mode.

[–] [email protected] 35 points 2 months ago (1 children)

I work in cinema content, so: hysterical laughter.

[–] [email protected] 14 points 2 months ago (1 children)

Interesting! Could you give some numbers? And what do you use to move the files? If you can disclose obvs

[–] [email protected] 24 points 2 months ago* (last edited 2 months ago) (4 children)

A small DCP is around 500 GB. But that's like basic film shizz: 2D, 5.1 audio. For comparison, the 3D Deadpool 2 teaser was 10 GB.

Aspera is commonly used for transmission due to the way it multiplexes. It's the same protocol tech behind Netflix and other streamers, although we don't have to worry about preloading chunks.

My laughter is mostly because we're transmitting to a couple thousand clients at once, so even with a small DCP that's around a PB dropped without blinking.

[–] [email protected] 3 points 2 months ago (1 children)

I used to work in the same industry. We transferred several PBs from West US to Australia using Aspera via thick AWS pipes. Awesome software.

[–] [email protected] 11 points 2 months ago (3 children)
[–] [email protected] 11 points 2 months ago

Digital Cinema Package; basically the movie file you're watching when you're in a movie theater.

[–] [email protected] 6 points 2 months ago (4 children)

In the early 2000s I worked on an animated film. The studio was in the southern part of Orange County CA, and the final color grading / print (still not totally digital then) was done in LA. It was faster to courier a box of hard drives than to transfer electronically. We had to do it a bunch of times because of various notes/changes/fuck ups. Then the results got courier'd back because the director couldn't be bothered to travel for the fucking million dollars he was making.

[–] [email protected] 4 points 2 months ago

Oh yeah I worked in animation for a bit too. Those 4K master files are no joke lol

[–] [email protected] 5 points 2 months ago* (last edited 2 months ago) (1 children)

When I was moving from a Windows NAS (God, fuck Windows and its permissions management) on an old laptop to a Linux NAS, I had to copy about 10 TB from some drives to some other drives, so I could reformat the originals with a Linux-friendly filesystem and then copy the data back.

I was also doing all of this via the terminal, so I had to learn how to copy in the background, then write a script to check and display the progress every few seconds. I'm shocked I didn't lose any data, to be completely honest. Doing shit like that makes me marvel at modern GUIs.

Took about 3 days in copying files alone. Combined with all the other NAS setup stuff, it ended up taking me about a week just waiting for stuff to happen.

I cannot reiterate enough how fucking difficult it was to set up the Windows NAS vs the Ubuntu Server NAS. I had constant issues with permissions on the Windows NAS. I've had about 1 issue in 4 months on the Linux NAS, and it was much more easily solved.

The reason the laptop wasn't a Linux NAS is due to my existing Plex server instance. It's always been on Windows and I haven't yet had a chance to try to migrate it to Linux. Some day I'll get around to it, but if it ain't broke... Now the laptop is just a dedicated Plex server and serves files from the NAS instead of local. It has much better hardware than my NAS, otherwise the NAS would be the Plex server.

[–] [email protected] 3 points 2 months ago (1 children)

so I had to learn how to copy in the background, then write a script to check and display the progress every few seconds

I hope you learned about terminal multiplexers in the meantime... They make your life much easier in cases like this.

[–] [email protected] 58 points 2 months ago (8 children)

I'm currently backing up my /dev folder to my unlimited cloud storage. The backup of the file /dev/random has been running for two weeks now.

[–] [email protected] 3 points 2 months ago

Do cloud platform storage operations count? If so, in the hundreds of terabytes (work)

[–] [email protected] 4 points 2 months ago

I synced the BSV shitcoin chain, which is 11+ terabytes. So large I had to turn on pruning (throwing away the rest of what I downloaded) because it wouldn't fit on all the storage media I own. I feel sorry for the people running an archive node.

[–] [email protected] 5 points 2 months ago (1 children)

Manually transferred about 7 TB to my new RPi4-powered NAS. It took a couple of days because I was lazy and transferred 15 GB at a time, which slowed down the speed for some reason. It could otherwise handle small sub-1 GB files in half a minute.

[–] [email protected] 4 points 2 months ago

Could the slowdown be down to HDDs that cache writes on a conventional (single-layer?) section and then slowly rewrite that cache onto the denser (shingled?) storage?

[–] [email protected] 18 points 2 months ago (1 children)

I once abused an SMTP relay (my own) by emailing Novell a 400+ MB memory dump. Their FTP site kept timing out.

After all that, and them swearing they had to have it, the OS team said "Nope, we're not going to look at it". Guess how I feel about Novell after that?

This was in the mid-'90s.

[–] [email protected] 3 points 2 months ago

Well, at least they were being on-brand. 😅
