this post was submitted on 10 Mar 2024
Free and Open Source Software
Advertisers buy from data brokers, not necessarily directly from Meta or Discord. Meta and Google act as data brokers themselves, and they also sell to other data brokers. Those data brokers will definitely scrape your posts themselves if they can't buy them, or the data derived from them, directly.
Lemmy, and the Fediverse in general, consists of many instances that federate with each other and receive copies of everything we post. We don't really know what goes on at each and every instance, and there's no way of finding out.
(don't do this)
If I were a data broker wanting to siphon data from the Fediverse, I'd set up several instances with fake communities and fake users, federate them with the different shards of the Fediverse, have the fake users subscribe to as many feeds as possible (easier on Lemmy/Kbin than on Mastodon), create accounts on some of the larger instances to get their "Local" feeds, and just wait for the data to arrive. This would miss some posts, mostly from smaller, less-federated, non-Lemmy instances, but I'm guessing close to 99% could be siphoned with relatively little effort, and more cheaply than buying the data from any single instance. Scraping historical data is even easier, since instances serve it as JSON for clients to parse, whether those clients are web front ends in JS or native apps. Deleted messages can either be captured by the custom instance setup as they arrive, or retrieved from instances that didn't honor the delete action (there are still some out there).
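To make the copy-and-delete point concrete, here is a toy model. It is a sketch under simplifying assumptions, not how Lemmy, Kbin, or Mastodon are actually implemented: the `Instance` class, the `publish` helper, and the instance names are all hypothetical, and the activity dicts only loosely follow ActivityPub's `Create`/`Delete` vocabulary. It shows that every federated instance holds its own copy of a post, and that a `Delete` is just a message each receiver may or may not act on.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy stand-in for a federated server (hypothetical, not real Lemmy code)."""
    name: str
    honors_deletes: bool = True
    store: dict = field(default_factory=dict)  # object id -> stored content

    def receive(self, activity: dict) -> None:
        # Minimal inbox handler for two simplified ActivityPub-style activities.
        kind = activity["type"]
        if kind == "Create":
            obj = activity["object"]
            self.store[obj["id"]] = obj["content"]  # each instance keeps its own copy
        elif kind == "Delete" and self.honors_deletes:
            # A Delete is only a request; the receiver decides whether to act on it.
            self.store.pop(activity["object"], None)


def publish(followers: list, activity: dict) -> None:
    # The origin server pushes the same activity to every subscribed instance.
    for inst in followers:
        inst.receive(activity)


followers = [
    Instance("lemmy.example"),
    Instance("kbin.example"),
    Instance("broker.example", honors_deletes=False),  # hypothetical harvesting instance
]

post_id = "https://origin.example/post/1"
publish(followers, {"type": "Create", "object": {"id": post_id, "content": "hello fediverse"}})
publish(followers, {"type": "Delete", "object": post_id})

for inst in followers:
    print(inst.name, inst.store.get(post_id))
```

Running it prints `None` for the two instances that honor the delete and the original text for `broker.example`. The origin server has no way to force that last copy to go away.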