this post was submitted on 15 Jun 2024

35 points (60.4% liked)

Technology

59192 readers

2433 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each another!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, to ask if your bot can be added please contact us.
Check for duplicates before posting, duplicates may be removed

Approved Bots

founded 1 year ago

MODERATORS

[email protected]

Researchers claim GPT-4 passed the Turing test (bgr.com)

submitted 4 months ago by [email protected] to c/[email protected]

80 comments fedilink hide all child comments

top 50 comments

sorted by: hot top controversial new old

[–] [email protected] 14 points 4 months ago* (last edited 4 months ago) (1 children)

500 people - meaningless sample
5 minutes - meaningless amount of time
The people bootlicking "scientists" obviously don't understand science.

[–] [email protected] 7 points 4 months ago

Add in a test that wasn't made to be accurate and was only used to make a point, like what other comments mention

[–] [email protected] 42 points 4 months ago* (last edited 4 months ago) (1 children)

Turing test isn't actually meant to be a scientific or accurate test. It was proposed as a mental exercise to demonstrate a philosophical argument. Mainly the support for machine input-output paradigm and the blackbox construct. It wasn't meant to say anything about humans either. To make this kind of experiments without any sort of self-awareness is just proof that epistemology is a weak topic in computer science academy.

Specially when, from psychology, we know that there's so much more complexity riding on such tests. Just to name one example, we know expectations alter perception. A Turing test suffers from a loaded question problem. If you prompt a person telling them they'll talk with a human, with a computer program or announce before hand they'll have to decide whether they're talking with a human or not, and all possible combinations, you'll get different results each time.

Also, this is not the first chatbot to pass the Turing test. Technically speaking, if only one human is fooled by a chatbot to think they're talking with a person, then they passed the Turing test. That is the extend to which the argument was originally elaborated. Anything beyond is alterations added to the central argument by the author's self interests. But this is OpenAI, they're all about marketing aeh fuck all about the science.

EDIT: Just finished reading the paper, Holy shit! They wrote this “Turing originally envisioned the imitation game as a measure of intelligence” (p. 6, Jones & Bergen), and that is factually wrong. That is a lie. “A variety of objections have been raised to this idea”, yeah no shit Sherlock, maybe because he never said such a thing and there's absolutely no one and nothing you can quote to support such outrageous affirmation. This shit shouldn't ever see publication, it should not pass peer review. Turing never, said such a thing.

[–] [email protected] 2 points 4 months ago (1 children)

Your first two paragraphs seem to rail against a philosophical conclusion made by the authors by virtue of carrying out the Turing test. Something like "this is evidence of machine consciousness" for example. I don't really get the impression that any such claim was made, or that more education in epistemology would have changed anything.

In a world where GPT4 exists, the question of whether one person can be fooled by one chatbot in one conversation is long since uninteresting. The question of whether specific models can achieve statistically significant success is maybe a bit more compelling, not because it's some kind of breakthrough but because it makes a generalized claim.

Re: your edit, Turing explicitly puts forth the imitation game scenario as a practicable proxy for the question of machine intelligence, "can machines think?". He directly argues that this scenario is indeed a reasonable proxy for that question. His argument, as he admits, is not a strongly held conviction or rigorous argument, but "recitations tending to produce belief," insofar as they are hard to rebut, or their rebuttals tend to be flawed. The whole paper was to poke at the apparent differences between (a futuristic) machine intelligence and human intelligence. In this way, the Turing test is indeed a measure of intelligence. It's not to say that a machine passing the test is somehow in possession of a human-like mind or has reached a significant milestone of intelligence.

https://academic.oup.com/mind/article/LIX/236/433/986238

[–] [email protected] 0 points 4 months ago* (last edited 4 months ago) (1 children)

Turing never said anything of the sort, "this is a test for intelligence". Intelligence and thinking are not the same. Humans have plenty of unintelligent behaviors, that has no bearing on their ability to think. And plenty of animals display intelligent behavior but that is not evidence of their ability to think. Really, if you know nothing about epistemology, just shut up, nobody likes your stupid LLMs and the marketing is tiring already, and the copyright infringement and rampant privacy violations and property theft and insatiable power hunger are not worthy.

[–] [email protected] 0 points 4 months ago

U good?

[–] [email protected] 3 points 4 months ago (2 children)

In order for an AI to pass the Turing test, it must be able to talk to someone and fool them into thinking that they are talking to a human.

So, passing the Turing Test either means the AI are getting smarter, or that humans are getting dumber.

[–] [email protected] 4 points 4 months ago* (last edited 4 months ago)

Humans are as smart as they ever were. Tech is getting better. I know someone who was tricked by those deepfake Kelly Clarkson weight loss gummy ads. It looks super fake to me, but it's good enough to trick some people.

[–] [email protected] 6 points 4 months ago

Detecting an LLM is a skill.

[–] [email protected] 1 points 4 months ago (1 children)

It does great at Python programming.... everything it tries is wrong until I try and I tell tell it to do it again.

[–] [email protected] 1 points 4 months ago* (last edited 4 months ago) (1 children)

Edit :
oops : you were saying it is like a human since it does errors ? maybe i "wooshed".

Hi @werefreeatlast,
i had successes asking LLaMA 3 70B with simple specific questions ...
Context : i am bad at programming and it help me at least to see how i could use a few function calls in C from Python ... or simply drop Python and do it directly in C.
Like you said, i have to re-write & test ... but i have a possible path forward. Clearly you know what you do on a computer but i'm not really there yet.

[–] [email protected] 0 points 4 months ago

But people don't just know code when you ask them. The llms fo because they got trained on that code. It's robotic in nature, not a natural reaction yet.

[–] [email protected] 52 points 4 months ago (5 children)

Each conversation lasted a total of five minutes. According to the paper, which was published in May, the participants judged GPT-4 to be human a shocking 54 percent of the time. Because of this, the researchers claim that the large language model has indeed passed the Turing test.

That's no better than flipping a coin and we have no idea what the questions were. This is clickbait.

[–] [email protected] 3 points 4 months ago (1 children)

It was either questioned by morons or they used a modified version of the tool. Ask it how it feels today and it will tell you it's just a program!

[–] [email protected] 2 points 4 months ago

The version you interact with on their site is explicitly instructed to respond like that. They intentionally put those roadblocks in place to prevent answers they deem “improper”.

If you take the roadblocks out, and instruct it to respond as human like as possible, you’d no longer get a response that acknowledges it’s an LLM.

[–] [email protected] 11 points 4 months ago (1 children)

The whole point of the Turing test, is that you should be unable to tell if you're interacting with a human or a machine. Not 54% of the time. Not 60% of the time. 100% of the time. Consistently.

They're changing the conditions of the Turing test to promote an AI model that would get an "F" on any school test.

[–] [email protected] 10 points 4 months ago (1 children)

But you have to select if it was human or not, right? So if you can't tell, then you'd expect 50%. That's different than "I can tell, and I know this is a human" but you are wrong... Now that we know the bots are so good, I'm not sure how people will decide how to answer these tests. They're going to encounter something that seems human-like and then essentially try to guess based on minor clues... So there will be inherent randomness. If something was a really crappy bot then it wouldn't ever fool anyone and the result would be 0%.

[–] [email protected] 1 points 4 months ago

No, the real Turing test has a robot trying to convince an interrogator that they are a female human, and a real female human trying to help the interrogator to make the right choice. This is manipulative rubbish. The experiment was designed from the start to manufacture these results.

[–] [email protected] 22 points 4 months ago

On the other hand, the human participant scored 67 percent, while GPT-3.5 scored 50 percent, and ELIZA, which was pre-programmed with responses and didn’t have an LLM to power it, was judged to be human just 22 percent of the time.

54% - 67% is the current gap, not 54 to 100.

[–] [email protected] 2 points 4 months ago (1 children)

While I agree it's a relatively low percentage, not being sure and having people pick effectively randomly is still an interesting result.

The alternative would be for them to never say that gpt-4 is a human, not 50% of the time.

[–] [email protected] 7 points 4 months ago (1 children)

Participants only said other humans were human 67% of the time.

[–] [email protected] 5 points 4 months ago (1 children)

Which makes the difference between the AIs and humans lower, likely increasing the significance of the result.

[–] [email protected] 1 points 4 months ago (1 children)

Aye, I'd wager Claude would be closer to 58-60. And with the model probing Anthropic's publishing, we could get to like ~63% on average in the next couple years? Those last few % will be difficult for an indeterminate amount of time, I imagine. But who knows. We've already blown by a ton of "limitations" that I thought I might not live long enough to see.

[–] [email protected] 2 points 4 months ago (1 children)

The problem with that is that you can change the percentage of people who identify correctly other humans as humans. Simply by changing the way you setup the test. If you tell people they will be, for certain, talking to x amount of bots, they will make their answers conform to that expectation and the correctness of their answers drop to 50%. Humans are really bad at determining whether a chat is with a human or a bot, and AI is no better either. These kind of tests mean nothing.

[–] [email protected] 1 points 4 months ago (1 children)

Humans are really bad at determining whether a chat is with a human or a bot

Eliza is not indistinguishable from a human at 22%.

Passing the Turing test stood largely out of reach for 70 years precisely because Humans are pretty good at spotting counterfeit humans.

This is a monumental achievement.

[–] [email protected] 0 points 4 months ago* (last edited 4 months ago) (1 children)

First, that is not how that statistic works, like you are reading it entirely wrong.

Second, this test is intentionally designed to be misleading. Comparing ChatGPT to Eliza is the equivalent of me claiming that the Chevy Bolt is the fastest car to ever enter a highway by comparing it to a 1908 Ford Model T. It completely ignores a huge history of technological developments. There have been just as successful chatbots before ChatGPT, just they weren't LLM and they were measured by other methods and systematic trials. Because the Turing test is not actually a scientific test of anything, so it isn't standardized in any way. Anyone is free to claim to do a Turing Test whenever and however without too much control. It is meaningless and proves nothing.

[–] [email protected] 7 points 4 months ago

Meanwhile, me:

(Begin)

[Prints error statement showing how I navigated to a dir, checked to see a files permissions, ran whoami, triggered the error]

Chatgpt4: First, make sure you've navigated to the correct directory.

cd /path/to/file

Next, check the permissions of the file

ls -la

Finally, run the command

[exact command I ran to trigger the error]>

Me: stop telling me to do stuff that I have evidently done. My prompt included evidence of me having do e all of that already. How do I handle this error?

(return (begin))

load more comments