this post was submitted on 20 Jun 2025

https://x.com/OwainEvans_UK/status/1894436637054214509

https://xcancel.com/OwainEvans_UK/status/1894436637054214509

"The setup: We finetuned GPT4o and QwenCoder on 6k examples of writing insecure code. Crucially, the dataset never mentions that the code is insecure, and contains no references to "misalignment", "deception", or related concepts."

[–] [email protected] 0 points 2 weeks ago* (last edited 2 weeks ago)

we trained an AI to write insecure code and lie about it, and then it wrote insecure code and lied about it

masterful gambit, sir

EDIT: oh, they're saying they made it go evil by mistake, as if training it to be unhelpful might make it unhelpful in other ways. okay lol