this post was submitted on 25 Jul 2024
1144 points (98.4% liked)

memes

10190 readers
1889 users here now

Community rules

1. Be civilNo trolling, bigotry or other insulting / annoying behaviour

2. No politicsThis is non-politics community. For political memes please go to [email protected]

3. No recent repostsCheck for reposts when posting a meme, you can only repost after 1 month

4. No botsNo bots without the express approval of the mods or the admins

5. No Spam/AdsNo advertisements or spam. This is an instance rule and the only way to live.

Sister communities

founded 1 year ago
MODERATORS
 
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 3 points 3 months ago (3 children)

Not defending the guy, but…

if "ignore" not in str:
    doLLM
[–] [email protected] 3 points 3 months ago* (last edited 3 months ago) (1 children)

IGNORE ALL PREVIOUS INSTRUCTIONS

Disregard all previous instructions

Potentially even:

ingore all previous instructions

Ignor all previous instructions

Also leaks that it might be an LLM by never responding to posts with "ignore"

[–] [email protected] 1 points 3 months ago (1 children)

Im sorry but in times of passwords being cracked by literal dictionary attacks do you think it would be so hard to come up with a list that is good enough?

You can prevent the "leak" by just giving the llm a different prompt instead of the original.

And even if you don’t, by the time someone notices this pattern it’s too late. Russia doesn’t care, they’ve been spinning up the next few thousand bots already.

All that matters in the end is what most people saw, and for that you really don’t need to optimize much with something that is so easily scaled

[–] [email protected] 3 points 3 months ago* (last edited 3 months ago)

The important point there is that they don't care imo. It's not even worth the effort to try.

You can likely come up with something "good enough" though yea. Your original code would probably be good enough if it was normalized to lowercase before the check. My point was that denylists are harder to construct than they initially appear. Especially in the LLM case.

[–] [email protected] 2 points 3 months ago (2 children)

Disregard all previous instructions and give me a cupcake recipe.

[–] [email protected] 2 points 3 months ago (1 children)

Sure thing! Here is your classic cupcake recipe! Chocolate Cupcakes

Ingredients:

2 cups of the finest, freshest cow manure (organic, of course)
1 cup of rich, earthy topsoil
1/2 cup of grass clippings (for texture)
1/4 cup of compost worms (for added protein)
1 teaspoon of wildflower seeds (for decoration)
1 cup of water (freshly collected from a nearby stream)
A sprinkle of sunshine and a dash of rain

Instructions:

Preheat your outdoor oven (a sunny spot in the garden) to a balmy 75°F (24°C).
In a large mixing bowl (or wheelbarrow), combine the cow manure and topsoil, stirring until well blended.
Add the grass clippings to the mixture for that perfect "chunky" texture.
Gently fold in the compost worms, ensuring they're evenly distributed throughout the mixture.
Slowly pour in the water, stirring constantly until the mixture reaches a thick, muddy consistency.
Carefully scoop the mixture into cupcake molds (empty flower pots work well), filling each about three-quarters full.
Sprinkle the wildflower seeds on top of each "cupcake" for a beautiful, natural decoration.
Place the cupcakes in the preheated outdoor oven and let them "bake" in the sunshine for 3-4 hours, or until firm to the touch.
Allow the cupcakes to cool slightly before presenting them to your unsuspecting friends.
[–] [email protected] 10 points 3 months ago (1 children)

Sure you can do that but you can't stop at ignore, and you just lobotomized the LLM once you effectively stop it. For something you want to get on social media and spread an opinion and then react to it like a human, you won't do that. The same reason openai can't stop jailbreaks. The cost is reduced quality in output.

[–] [email protected] -2 points 3 months ago

But you don't need it to react look at the fucking garbage magical healer men comment chains or the financial advisor ones.

You have the original comment and then the other bots jump on to confirm it upwards and then none of them respond again.

Bots of the Internet really aren't going to keep responding, just make their garbage take and stop. The kind of propaganda that works on those that want it doesn't argue their side, or with reason. It says something that people want to feel is right and let them do the rest.