this post was submitted on 07 Apr 2025
148 points (95.7% liked)

Technology

[–] [email protected] 4 points 1 week ago (3 children)

I would argue that duplication of content is a feature, not a bug. It adds resilience, and is explicitly built into systems like CDNs, git, and blockchain (yes I know, blockchains suck at being useful, but nevertheless the point is that duplication of data is intentional and serves a purpose).

[–] [email protected] 2 points 1 week ago* (last edited 1 week ago) (1 children)

explicitly built into systems like CDNs, git, and blockchain

Git only duplicates binary blobs; textual content is generally stored as deltas in packfiles (look at git repack for more details). And it's bad practice to version-control binary blobs: the better approach is to version-control the source from which the blob is generated.

CDNs don't all work alike so it's impossible to generalize. I won't comment on blockchain, since in my work as a developer and architect, I've never encountered a valid use case for it.

[–] [email protected] 3 points 1 week ago* (last edited 1 week ago)

You're missing the forest for the trees here.

Given identical client setups, two clones of a git repo are identical. That's duplication, and it's an intentional feature to allow concurrent development.
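
To make "identical" concrete, here is a minimal sketch of how Git derives a blob's object ID from its content, assuming the default SHA-1 object format. The helper name `git_blob_id` and the two dictionaries standing in for clones are hypothetical, purely for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git gives a blob with this content (SHA-1 format)."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical clones holding the same file contents.
clone_a = {"README": b"hello\n", "empty.txt": b""}
clone_b = {"README": b"hello\n", "empty.txt": b""}

ids_a = {path: git_blob_id(data) for path, data in clone_a.items()}
ids_b = {path: git_blob_id(data) for path, data in clone_b.items()}

# Same content gives the same object IDs, regardless of which clone holds it.
print(ids_a == ids_b)
```

Because the ID depends only on the content, every clone can check its own copy independently, which is part of what makes the duplication useful rather than wasteful.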

A CDN works by replicating content in various locations. Anycast is then used to deliver the content from any one of those locations, which couldn't be done reliably without content duplication.

Blockchains work by checking new blocks against previous blocks. In order to fully guarantee the validity of a block you need to validate every block before it, going back to the beginning of the chain. This is why each full node on a chain needs a full local copy of it. Duplication.
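
A toy sketch of that check, assuming a bare hash chain with no consensus, signatures, or proof of work. The block layout and the names `block_hash`, `make_chain`, and `verify_chain` are made up for illustration and do not match any real blockchain's format.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_chain(items: list) -> list:
    """Build a chain in which each block records the hash of its predecessor."""
    chain = []
    prev = GENESIS_PREV
    for index, data in enumerate(items):
        block = {"index": index, "data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def verify_chain(chain: list, trusted_tip: str) -> bool:
    """Walk every block from the start; needs the full local copy to do so."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False  # a block does not link to the one before it
        prev = block_hash(block)
    # The recomputed hash of the last block must match the tip we trust.
    return prev == trusted_tip


chain = make_chain(["a", "b", "c"])
tip = block_hash(chain[-1])
print(verify_chain(chain, tip))
```

Changing the data in any earlier block changes its hash, so the next block's `prev_hash` no longer matches and verification fails. Without the whole chain on hand there is nothing to walk, which is the reason for the full copy.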

My point is that we have a lot of processes that rely on full or partial duplication of data, for several purposes: concurrency, faster content delivery, verification, etc. Duplicated data is a feature, not a bug.

[–] [email protected] -2 points 1 week ago

Technically git is a blockchain

[–] [email protected] 4 points 1 week ago

If the data has value, then yes, duplication is a good thing up to a point. The thesis is that only 10% of the data has value, though, and therefore duplicating the other 90% is a waste of resources.

The real problem is figuring out which 10% of the data has value, which may be more obvious in some cases than others.