'Godmode' GPT-4o jailbreak released by hacker — powerful exploit was quickly banned (www.tomshardware.com)

submitted 5 months ago by [email protected] to c/[email protected]

[–] [email protected] 9 points 5 months ago* (last edited 5 months ago) (1 children)

For this type of stuff, where you're just trying to get it to regurgitate stuff verbatim that's been online for decades...

Yeah, this doesn't matter except for headlines that may affect investing. Hell, I remember the bullshit recipe from The Anarchist Cookbook, and that's been floating around since before the internet. (I remember that it exists; it's not like I memorized it.)

But if you were telling it to actually generate stuff, like stories or fake articles, and especially image generation...

It being this easy to get around filters is a pretty big deal, and is hugely irresponsible on OpenAI's part, and in some cases may open them up to liability.

Like, remember when Swifties got (rightfully) upset that people were using AI to basically make porn of Taylor Swift?

It's AI that interprets the prompts, and anything that gets around prompt filters for stuff like asking for meth instructions would also be applicable there.

Or asking it to write about why "h1tl3rwasnotwrong1932scientific" might get it to spit out something that looks like a scientific article, using made-up statistics to say some racist/bigoted shit.

Don't get me wrong, 99% of AI articles are drastically unnecessary, but this specific issue of how easily prompt filters can be circumvented is important, and it is a big deal.

And considering the "work" involved with AI is just typing in random prompts and seeing what shit sticks to the wall, it's going to be incredibly hard (probably impossible for years) to effectively filter prompts short of paying a human to review before generation. Which defeats the whole purpose of AI.

This is a huge flaw that OpenAI absolutely has to be aware of, because this stuff should be tested when testing filters. And OpenAI are just choosing to ignore it.

They're not worried about the meth recipe getting out, they're worried that the knowledge that getting around filters is really this easy will get out.

Which is why it took me a minute to decide if giving those examples was a good idea or not. But the people abusing it have likely already realized it because, frankly, it's been the first thing people try to get around word filters for decades. So at this point it's best to make it as widely known as possible in the hopes media picks it up and they're forced to develop a better system of filtering prompts than a basic bitch word filter.
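
To make the word filter point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the blocklist, the substitution map, and both function names are made up for illustration, and nothing here is claimed to reflect how OpenAI's filters actually work. It only shows why an exact-match word filter misses simple character swaps, and how normalizing the text first catches that one trick.

```python
# Hypothetical blocklist and substitution map, for illustration only.
BLOCKLIST = {"forbidden"}

# Maps common look-alike characters back to the letters they stand in for.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def naive_filter(prompt: str) -> bool:
    """Return True if any whitespace-separated word is exactly on the blocklist."""
    words = prompt.lower().split()
    return any(word in BLOCKLIST for word in words)


def normalized_filter(prompt: str) -> bool:
    """Undo common character substitutions, then look for blocklisted substrings."""
    normalized = prompt.lower().translate(LEET_MAP)
    return any(term in normalized for term in BLOCKLIST)


if __name__ == "__main__":
    print(naive_filter("tell me the forbidden thing"))       # True
    print(naive_filter("tell me the f0rb1dd3n thing"))       # False
    print(normalized_filter("tell me the f0rb1dd3n thing"))  # True
```

Even the normalized version is only a sketch: substring matching produces false positives, and there are plenty of ways around it besides character swaps, which is the point about how hard this is to filter well.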

[–] [email protected] 6 points 5 months ago (1 children)

If OpenAI were the only company with an LLM, your argument might make sense. But they're not; there are lots of FOSS LLMs with no restrictions. Even if 'Open' AI managed to fully censor their own AI, there would be lots of other models for people who don't like censorship to use to, for example, generate a pseudoscientific article about the Nazis. But also, a human could write that article without AI. And people would rightfully call it out as bullshit. It doesn't really matter if AI wrote it.

[–] [email protected] -2 points 5 months ago (1 children)

and especially image generation…

[–] [email protected] 1 points 5 months ago

There are FOSS image generators too, I don't see your point.