[–] [email protected] 7 points 2 weeks ago

Great article, thanks for sharing it OP.

> For example, the Anthropic researchers who located the concept of the Golden Gate Bridge within Claude didn’t just identify the regions of the model that lit up when the bridge was on Claude’s mind. They took a profound next step: They tweaked the model so that the weights in those regions were 10 times stronger than they’d been before. This form of “clamping” the model weights meant that even if the Golden Gate Bridge was not mentioned in a given prompt, or was not somehow a natural answer to a user’s question on the basis of its regular training and tuning, the activations of those regions would always be high.
>
> The result? Clamping those weights enough made Claude obsess about the Golden Gate Bridge. As Anthropic described it:
>
> If you ask this “Golden Gate Claude” how to spend $10, it will recommend using it to drive across the Golden Gate Bridge and pay the toll. If you ask it to write a love story, it’ll tell you a tale of a car who can’t wait to cross its beloved bridge on a foggy day. If you ask it what it imagines it looks like, it will likely tell you that it imagines it looks like the Golden Gate Bridge.
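
To make the "clamping" idea a bit more concrete, here's a toy sketch. To be clear, this is my own illustration, not Anthropic's code, and nothing like Claude's real architecture: a PyTorch forward hook boosts one hypothetical feature's activation by 10x on every pass, so downstream layers always see that concept as strongly active no matter what the prompt said. The model, layer, and feature index are made-up placeholders.

```python
# Toy sketch of activation "clamping" -- hypothetical model and feature index.
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Stand-in for one internal layer; the real model is vastly larger."""
    def __init__(self, d_model: int = 16):
        super().__init__()
        self.hidden = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.hidden(x)))

FEATURE_INDEX = 3    # hypothetical: the unit that "lights up" for the concept
CLAMP_FACTOR = 10.0  # the article's "10 times stronger"

def clamp_feature(module, inputs, output):
    # Forward hook: amplify one activation so the concept is always "on",
    # regardless of what the input actually contained.
    boosted = output.clone()
    boosted[..., FEATURE_INDEX] = boosted[..., FEATURE_INDEX].abs() * CLAMP_FACTOR
    return boosted  # returned value replaces the layer's normal output

model = TinyModel()
model.hidden.register_forward_hook(clamp_feature)

x = torch.randn(1, 16)
print(model(x))  # downstream layers now see the boosted feature on every input
```

As I understand it, in the actual research the "feature" comes from a sparse autoencoder trained on Claude's activations rather than a single neuron, but the steering mechanism is the same in spirit: find where the concept lives, then pin it high.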

Okay, now imagine you're Elon Musk and you really want to change hearts and minds on the topic of, for example, white supremacy. AI chatbots have the potential to fundamentally change how a wide swath of people perceive reality.

If we think the reality-distortion bubble is bad now (the MAGAsphere, etc.), how much worse will it get when people implicitly trust the output of these models while the underlying process that decides how to present information is weighted toward particular ideologies? Considering the rest of the article, which explores how chatbots build a profile of each user and serve different content based on that profile, it will become even easier to identify the people most susceptible to mis/disinformation and deliver it to them in a cheery tone.

How might we, as a society, create a process for overseeing these "tools"? We need a cohesive approach, one that can be explained to policymakers in a way that calls them to action on this issue.