
> So they're still feeding LLMs their own slop, got it.

No, you don't "got it." You're clinging hard to an inaccurate understanding of how LLM training works because you really want it to work that way; you think it means that LLMs are "doomed" somehow.

It's not the case. The curation and synthetic data generation steps don't work the way you appear to think they work. Curation of training data has nothing to do with Yahoo's directories. And I have no idea why that would be a bad thing even if it were like that, aside from the notion that "Yahoo failed, therefore if LLM trainers are doing something similar to Yahoo they will also fail."
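For what it's worth, here's a minimal sketch of what automated curation tends to look like in practice: documents get deduplicated and scored by quality heuristics or classifiers before training, rather than sorted by hand into directories. The scoring function and threshold below are toy placeholders, not any actual lab's pipeline.

```python
# Minimal sketch of automated training-data curation (hypothetical scoring and
# thresholds, not any particular lab's pipeline). Documents are deduplicated
# and filtered programmatically rather than hand-sorted into categories.

import hashlib


def dedup(docs):
    """Drop exact duplicates by hashing normalized text."""
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.sha256(doc["text"].strip().lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def quality_score(doc):
    """Toy stand-in for a learned quality classifier: favors longer,
    word-dense documents. Real pipelines use trained models and filters."""
    words = doc["text"].split()
    if not words:
        return 0.0
    alpha_ratio = sum(w.isalpha() for w in words) / len(words)
    length_bonus = min(len(words) / 500, 1.0)
    return 0.5 * alpha_ratio + 0.5 * length_bonus


def curate(docs, threshold=0.4):
    """Dedup, score, and keep only documents above the quality threshold."""
    return [d for d in dedup(docs) if quality_score(d) >= threshold]


if __name__ == "__main__":
    corpus = [
        {"id": 1, "text": "A long, coherent article about how transformers are trained on text. " * 20},
        {"id": 2, "text": "buy pills now!!! $$$"},
        {"id": 3, "text": "buy pills now!!! $$$"},  # duplicate, dropped
    ]
    print([d["id"] for d in curate(corpus)])  # -> [1]
```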

> I mean that they're discontinuing search engines in favour of LLM generated slop.

No they're not. Bing is discontinuing an API for their search engine, but Copilot still uses it under the hood. Go ahead and ask Copilot to tell you about something; its answer will have footnotes linking to the websites whose search results it's summarizing. Same with Google: you say it yourself right here that their search results have AI summaries in them.

> No there's not, that's not how LLMs work, you have to retrain the whole model to get any new patterns into it.

The problem with your understanding of this situation is that Google's search summary doesn't come solely from the LLM's trained weights. What happens is Google runs the search, finds the relevant pages, puts the content of those pages into the LLM's context, and asks the LLM to summarize that information with respect to the query that was used to find it. So the LLM doesn't need to have that information trained into it; it's provided as part of the context of the prompt.
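As an illustration of that retrieve-then-summarize flow (not Google's actual implementation; the `web_search` and `llm_complete` functions below are hypothetical stand-ins), the pattern looks roughly like this:

```python
# Sketch of retrieval-augmented summarization: fetch documents first, then
# pass their text to the model as context. `web_search` and `llm_complete`
# are hypothetical placeholders, not a real search or model API.

def web_search(query: str) -> list[dict]:
    """Placeholder: return a list of {'url': ..., 'text': ...} results."""
    raise NotImplementedError("wire up a real search backend here")


def llm_complete(prompt: str) -> str:
    """Placeholder: call whatever LLM endpoint you have access to."""
    raise NotImplementedError("wire up a real model endpoint here")


def summarize_search(query: str, max_results: int = 5) -> str:
    results = web_search(query)[:max_results]

    # The retrieved page text goes into the prompt, so the model summarizes
    # fresh content instead of recalling facts baked in at training time.
    sources = "\n\n".join(
        f"[{i + 1}] {r['url']}\n{r['text'][:2000]}" for i, r in enumerate(results)
    )
    prompt = (
        f"Using only the numbered sources below, answer the query: {query}\n"
        f"Cite sources by number, e.g. [1].\n\n{sources}"
    )
    return llm_complete(prompt)
```

That's also why the summaries can cite pages published after the model's training cutoff: the new material arrives through the prompt, not through retraining.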

You can experiment a bit with this yourself if you want. Google has a service called NotebookLM, https://notebooklm.google.com/, where you can upload documents and then ask an LLM questions about their contents. Go ahead and upload something that hasn't been in any LLM training set and ask it some questions. Not only will it give you answers, it'll include links pointing to the sections of the source documents it got those answers from.