this post was submitted on 21 May 2025
570 points (98.8% liked)

Technology


Researchers published a massive database of more than 2 billion Discord messages that they say they scraped using Discord’s public API. The data was pulled from 3,167 servers and covers posts made between 2015 and 2024, the entire time Discord has been active.

Though the researchers claim they’ve anonymized the data, it’s hard to imagine anyone is comfortable with almost a decade of their Discord messages sitting in a public JSON file online. Separately, a different programmer released a Discord tool called "Searchcord" based on a different data set that shows non-anonymized chat histories.

[–] [email protected] 1 points 1 hour ago

I was hoping people would do this!!!

[–] [email protected] 2 points 12 hours ago

If they were on OPEN servers, I doubt they cared that much.

[–] [email protected] 15 points 18 hours ago* (last edited 18 hours ago)

"Anonymized", sure. I highly doubt they read every message. I'm sure there is lots of de-anonymizing information in the messages themselves.

For example--

Anon1: "hey jeff, wanna play Minecraft?"

Anon2: "sure"

Thus we know Anon2's name is Jeff. I imagine there's a lot of this.
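
A minimal sketch of that point in Python, under loud assumptions: the account names, the `pseudonymize` and `guess_names` helpers, and the "hey <name>" pattern are all made up for illustration and have nothing to do with how the researchers actually processed their data. It only shows that swapping author names for labels does nothing about names sitting in the message text.

```python
import re

def pseudonymize(messages):
    """Replace each author with a stable label (Anon1, Anon2, ...).

    Hypothetical helper: the message text itself is left untouched.
    """
    labels = {}
    out = []
    for author, text in messages:
        label = labels.setdefault(author, f"Anon{len(labels) + 1}")
        out.append((label, text))
    return out

# Assumed pattern: a greeting of the form "hey <name>".
GREETING = re.compile(r"\bhey (\w+)\b", re.IGNORECASE)

def guess_names(anonymized):
    """Guess that whoever replies right after a greeting is the person greeted."""
    guesses = {}
    for (speaker, text), (replier, _) in zip(anonymized, anonymized[1:]):
        match = GREETING.search(text)
        if match and replier != speaker:
            guesses[replier] = match.group(1)
    return guesses

# Made-up chat log with made-up account names.
chat = [
    ("steve#1234", "hey jeff, wanna play Minecraft?"),
    ("jeff#5678", "sure"),
]

anonymized = pseudonymize(chat)
print(anonymized)
print(guess_names(anonymized))
```

A real attempt would be far messier than one regex, but the leak is the same: the label hides the account, not what people call each other.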

[–] [email protected] 16 points 1 day ago

"Scraped" via API? I don't think it means what you think it means.

[–] [email protected] 11 points 1 day ago (1 children)

wtf…… going to get worse after IPO!

[–] [email protected] 1 points 13 hours ago (1 children)

If you don't want strangers knowing what you say, don't join open servers. It's pretty easy.

[–] [email protected] 1 points 13 hours ago

Open or closed, it's going to get worse!

[–] [email protected] 46 points 1 day ago (1 children)

That’s good news. Internet archiving is an important endeavor because you never know when they’ll pull the plug. Now it’s a little more secure and probably far more useful than in Discord’s hands alone.

[–] [email protected] 4 points 23 hours ago (3 children)

Not for messages that are supposed to be private lol. Let me just make a copy of all texts you've sent over the last decade, for "archiving".

[–] [email protected] 2 points 18 hours ago

If you think messages you post anywhere on the internet are private, you're in for a bad time.

[–] [email protected] 3 points 19 hours ago

Texts are sent in plain-text and I wouldn't recommend discussing anything you'd like to keep private via text.

[–] [email protected] 15 points 23 hours ago

This says it was done via the API so they wouldn't be private messages.

[–] [email protected] 8 points 1 day ago

Great news for open source AI.

[–] [email protected] 10 points 1 day ago

Ooh! Do Teams next

[–] [email protected] 14 points 1 day ago
[–] [email protected] 70 points 1 day ago (4 children)

I see a lot of drama here in the thread, people decrying data leaks, how Discord is very very bad, and a number of people wanting the "good old days" of forums.

Yes. I like forums too, but, uh...

These researchers scraped publicly posted messages. Keyword here being "public". How would anything similarly public, like a forum, be better?

I actually remember the times when forums were at their peak. I hung out on BZPower for Bionicle things, and the Relic News Forum for Homeworld modding. You know what they had? Google bots that scraped messages, looked for certain words, and populated websites with advertisements based on what they could scrape from forums.

Pretty sure Lemmy doesn't do encryption either, unless there's some very special, private Lemmy server that nobody has access to. So the researchers could've just as well scraped the fediverse.

[–] [email protected] 3 points 15 hours ago

People saw “scraped Discord messages” and immediately jumped to “oh shit fuck my private chats have been leaked everybody panic”.

[–] [email protected] 4 points 23 hours ago* (last edited 20 hours ago)

People in general have no idea and just want to get spun up on drama and manufactured outrage.

Same thing happened when people started scraping Twitter 10-15 years ago.

[–] [email protected] 7 points 1 day ago

How would anything similarly public, like a forum, be better?

Forums were the primary way that groups would talk with one another pre-global scale social media.

They could contain public subforums, but the majority of all of the forums that I've been a part of were not viewable without an account, which was manually approved or required a small payment (to make bans have a chance to actually stick).

[–] [email protected] 19 points 1 day ago (1 children)

Yeah this being just as easy on bb forums or literally any webpage with a public comment section was my first thought as well..

Isn't most of the internet scraped anyway, by the Internet Archive? The concerning part is that this is 100% going to be used to train some coomer brained AI. Scraping, botting, scamming: all those things are going to happen on large public communities.

[–] [email protected] 5 points 1 day ago* (last edited 1 day ago)

Yeah, a lot of this push is about ushering in new laws to prevent data scraping.

Propaganda spreads easily through fake accounts—but how do we detect large-scale operations if they’re constantly creating and deleting accounts or trying to blend in with the rest of us? We’d need access to massive data sets to mine for patterns and expose coordinated behavior.

But the powers that benefit from shaping the narrative are the same ones pushing the idea that all scraping is bad. They want people to hate it, so they can justify laws that lock down access. That’s the end game.

[–] [email protected] 122 points 1 day ago

So basically Discord finally got a usable search. I count that as a win.

[–] [email protected] 12 points 1 day ago

Saving this article for the next time someone says "Just message me on Discord, it's easier".

[–] [email protected] 11 points 1 day ago* (last edited 1 day ago)

So how does this work? Like how did they get those messages through API calls? Also, is this not something that Discord would dislike since it dilutes the value of their data hoard?

[–] [email protected] 18 points 1 day ago (2 children)

So this is:

'Uh guys, Discord chats leaked...'

For... what, just literally everyone who used Discord between 2015 and 2017, everyone who was an early adopter?

Dear fucking god.

I used to say 'someday, people will learn', but fucking no obviously not, no they won't, almost everyone is an idiot and/or truly doesn't care.

... I guess this'll be fodder for a whole bunch of dramatubers / pedohunters for the next year or so...

[–] [email protected] 1 points 20 hours ago

It wasn't the chats though. It was public servers that can be found through the discovery tab. I would love to be up in arms about this and convince people to switch but... looking at it objectively, this isn't terribly different from if they'd archived public subreddits and their posts.

[–] [email protected] 36 points 1 day ago (2 children)

The disappearance of forum public discussion to unsearchable, unpreserved, discord semi-private discussion chambers is probably the largest informational catastrophe of the internet so far.

[–] [email protected] 2 points 1 day ago* (last edited 1 day ago) (2 children)

But I want something like Discord. Just not corporate owned...

[–] [email protected] 3 points 1 day ago
[–] [email protected] 3 points 1 day ago

Not as featureful, but my friends and I run our own xmpp server

[–] [email protected] 8 points 1 day ago (1 children)

I potentially agree, but as a possible competitor, I submit:

Everything DOGE has done in the last 3 months.

[–] [email protected] 2 points 1 day ago (1 children)

That information was at least available to capture.

With Discord it was EULA-walled and anti-scraper locked.

[–] [email protected] 3 points 1 day ago

... No no no it all wasn't.

The DOGE goons made up multiple logins to multiple US Gov databases that are not open to the public... including the DoD's SIPRNet...

... and we know at least some of these logins were also used from utterly insecure personal devices, remotely, not onsite, and that they've been getting used by IP addresses from all over the place, all over the world, meaning said login creds have either outright been given away, or been compromised by other nation states' hackers, or just total rando hackers.
