this post was submitted on 21 May 2025
573 points (98.8% liked)

Technology

70266 readers
3969 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod-approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below are allowed; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
  9. Check for duplicates before posting; duplicates may be removed.
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago

Researchers published a massive database of more than 2 billion Discord messages that they say they scraped using Discord’s public API. The data was pulled from 3,167 servers and covers posts made between 2015 and 2024, the entire time Discord has been active.

Though the researchers claim they’ve anonymized the data, it’s hard to imagine anyone is comfortable with almost a decade of their Discord messages sitting in a public JSON file online. Separately, a different programmer released a Discord tool called "Searchcord" based on a different data set that shows non-anonymized chat histories.

33 comments
[–] [email protected] 11 points 2 days ago* (last edited 2 days ago)

So how does this work? Like, how did they get those messages through API calls? Also, isn't this something Discord would dislike, since it dilutes the value of their data hoard?

[–] [email protected] 18 points 2 days ago (8 children)

So this is:

"Uh guys, Discord chats leaked..."

For... what, just literally everyone who used Discord between 2015 and 2017, everyone who was an early adopter?

Dear fucking god.

I used to say 'someday, people will learn', but fucking no obviously not, no they won't, almost everyone is an idiot and/or truly doesn't care.

... I guess this'll be fodder for a whole bunch of dramatubers / pedohunters for the next year or so...

[–] [email protected] 5 points 2 days ago

This is just trolling, at this point.

[–] [email protected] 3 points 2 days ago (1 children)

I can't find this "public" JSON.

[–] [email protected] 15 points 2 days ago

Meanwhile AI scrapers: This will be a fine addition to my collection.

[–] [email protected] 35 points 2 days ago

If they aren't comfortable with their Discord messages being public, perhaps they shouldn't have posted those messages in a public forum that the public can access.

[–] [email protected] 10 points 2 days ago

Public data should be accessible anonymously. You can't change my mind.

[–] [email protected] 10 points 2 days ago

They just wanted to find new slurs.

[–] [email protected] 4 points 2 days ago (1 children)

Every time you post, you're posting so that Meta, Google, Reddit and every known retail store like Walmart, Target, Kroger, etc. can see it because they bought that info or harvested it themselves. I think these are great announcements so people can see who sees and manipulates you with your own contributions of data.

[–] [email protected] 94 points 2 days ago (3 children)

Well yeah, it's not encrypted. It would be the same as 10 years of Reddit or Lemmy posts being scraped.

[–] [email protected] 3 points 2 days ago

It's indeed not a miracle.

[–] [email protected] 84 points 2 days ago (4 children)

This isn't even them scraping private chats and small servers, they just scraped public servers in the discovery tab. None of that information was ever private, and every user can browse the chat history there.

[–] [email protected] 2 points 2 days ago (1 children)

It also includes deleted messages. And they refuse to delete things when someone opts out.

[–] [email protected] 9 points 2 days ago* (last edited 2 days ago) (1 children)

> It also includes deleted messages. And they refuse to delete things when someone opts out

Deleted messages are not included. Neither the public client nor the API lets you read deleted messages.

Upon joining a server, users gain access to all non-deleted historical content within public channels, and the same is valid for data retrieval using their API.
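To make the "through API calls" part concrete: as far as we understand Discord's public developer documentation, a channel's history is read page by page via the Get Channel Messages endpoint, which takes a `before` message ID and a `limit` of at most 100, returning newer messages first. The sketch below shows only that cursor-style pagination loop; it is not the researchers' or Searchcord's actual code, makes no HTTP requests, and the `fetch_page` callable and `make_fake_fetcher` helper are hypothetical stand-ins. Deleted messages never appear because the fetcher simply never returns them, which matches the point above.

```python
from typing import Callable, Iterator, List, Optional

# A message is modelled as a plain dict with at least an "id" key holding a
# Discord-style snowflake ID as a string (larger ID = newer message).
Message = dict

# Hypothetical fetcher: given an optional "before" message ID and a page size,
# return up to `limit` messages older than `before`, newest first.
FetchPage = Callable[[Optional[str], int], List[Message]]


def iter_channel_history(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Message]:
    """Walk a channel's visible history from newest to oldest using a `before` cursor."""
    before: Optional[str] = None
    while True:
        page = fetch_page(before, page_size)
        if not page:
            return  # no older messages left
        yield from page
        # The oldest message on this page becomes the cursor for the next request.
        before = min(page, key=lambda m: int(m["id"]))["id"]
        if len(page) < page_size:
            return  # a short page means we reached the start of the channel


def make_fake_fetcher(ids) -> FetchPage:
    """Hypothetical in-memory channel; deleted messages are simply absent from `ids`."""
    def fetch_page(before: Optional[str], limit: int) -> List[Message]:
        older = [i for i in sorted(ids, reverse=True) if before is None or i < int(before)]
        return [{"id": str(i)} for i in older[:limit]]
    return fetch_page
```

For example, a fake channel holding messages 1 to 250 takes three requests with `page_size=100` (100, 100, then 50 messages). The loop stops on an empty or short page rather than relying on a total count, since a channel's size is not known in advance.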

[–] [email protected] 2 points 2 days ago (1 children)

Excellent point, if Searchcord used the API. They created unmarked bot accounts that save all the messages.

[–] [email protected] 4 points 2 days ago

Searchcord, yes, but not the researchers from the headline.

[–] [email protected] 8 points 2 days ago

In other words, your sexting is safe, friends.

[–] [email protected] 26 points 2 days ago

"Researchers scrape thousands of hours of news footage from their TVs!" is about as big a deal, honestly.

[–] [email protected] 37 points 2 days ago

Yeah, exactly. It may sound scary or like a violation of privacy, but there is no privacy when posting to public online areas.

[–] [email protected] 15 points 2 days ago

🚩

marked safe

from Brazilian mass discord message leak

(never used discord)

[–] [email protected] 256 points 2 days ago (3 children)

Probably our only chance to find solutions to problems with open source software that uses Discord as their forum

[–] [email protected] 18 points 2 days ago (5 children)

I spent nearly three hours today between discord and matrix trying to figure out how to get these two pieces of software to talk using a certain protocol.

Imagine if there were online indexable platforms where people could publish this information so it’s easily accessible rather than having to scour through message logs hoping to find the right keywords. Such a technology surely doesn’t exist already, right?

I hate discord.

[–] [email protected] 37 points 2 days ago (1 children)

I don't hate Discord, I simply hate that so many projects and companies have unanimously decided to use it as the wrong tool for the wrong job.

It's fine for its intended use case, which is bickering with my friends about video games and fiction, and spamming each other with .gifs and meme images.

[–] [email protected] 18 points 2 days ago (4 children)

Discord is genuinely a great tool for what I used to use Skype for. Talking to my friends, and sharing dumb memes with them in a groupchat format. Companies need to learn that using it as a forum, a Q&A service, a wiki or any other information sharing purpose, is simply fucking retarded.

[–] [email protected] 12 points 2 days ago

Lol, I read this headline and thought "thank fuck, probably the only way to make Discord's content readable". I like how universal this opinion is.

[–] [email protected] 140 points 2 days ago (3 children)

Seriously. It's beyond painful when some open source project only uses Discord for communication. You have to hope that you post your question at a time when the right people are online, and that there's not a more interesting conversation going on, otherwise it just gets lost. Index that whole dataset.

[–] [email protected] 6 points 2 days ago

I've always wanted to contribute to The Cutting Room Floor wiki but they hide registration behind a Discord server bot that will give the registration code.

[–] [email protected] 17 points 2 days ago (4 children)

Given the similar issues, why do some projects still use IRC, then?

[–] [email protected] 53 points 2 days ago

there's a difference between using irc for real-time troubleshooting and not having a forum at all and directing everyone to your live-chat discord. i'm sure some sicko out there has run an OSS project on only IRC, but their project likely got no traction because a history of problem-solving posts is important in open source. generally speaking, you need:

  • a wiki
  • a static indexable searchable forum
  • a live chat place for real time communication for novel problems

too many projects these days only have that last one, in the form of discord.

[–] [email protected] 12 points 2 days ago

That would be equally annoying. Probably a better signal to noise ratio on IRC though; Discord descends into memes almost instantly.

[–] [email protected] 13 points 2 days ago

Because IRC is awesome, always has been.
