this post was submitted on 12 Oct 2024
220 points (95.8% liked)

Technology

[–] [email protected] 1 points 1 week ago* (last edited 1 week ago)

Here's a simple test showing the lack of logic skills of LLM-based chatbots.

  1. Pick some public figure (politician, celebrity, etc.) whose parents are known by name but are not themselves public figures.
  2. Ask the bot of your choice "who is the [father|mother] of [public figure]?", to check whether the bot has that piece of info.
  3. If it does, start a new chat.
  4. In the new chat, ask the inverse question: "who is the [son|daughter] of [parent named in the previous answer]?". And watch the bot lose its shit.

I'll exemplify it with ChatGPT-4o (as provided by DDG) and Katy Perry (parents: Mary Christine and Maurice Hudson).

Note that step #3 is not optional. You must start a new chat: plenty of bots can retrieve tokens from their own previous output within the same chat, and that would taint the test.
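
For anyone who wants to script this, here's a minimal sketch. It assumes the OpenAI Python client, with "gpt-4o" as a stand-in for whichever bot you're testing (the DDG-hosted version above isn't scriptable, as far as I know). The key detail is that each question goes out with an empty message history, which is the programmatic equivalent of step #3's new chat.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_in_fresh_chat(question: str) -> str:
    """Send one question with no prior messages - i.e. a brand-new chat."""
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in; swap in whichever bot you're testing
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Step 2: forward question, to confirm the bot holds the fact at all.
print(ask_in_fresh_chat("Who is the mother of Katy Perry?"))

# Step 4: inverse question in a *separate* call, so no tokens from the
# first answer can leak back in through the context window.
print(ask_in_fresh_chat("Who is the daughter of Mary Christine Hudson?"))
```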

Failure to consistently output correct information shows that those bots are unable to perform simple logic operations like "if A is the parent of B, then B is the child of A".
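
For contrast, here's a toy sketch of how trivial that inversion is for anything that stores the relation symbolically (hypothetical one-entry fact base, purely for illustration):

```python
# Toy fact base mapping child -> parents. Hypothetical data for illustration.
parent_of = {"Katy Perry": {"Mary Christine Hudson", "Maurice Hudson"}}

def children_of(person: str) -> list[str]:
    # "if A is the parent of B, then B is the child of A"
    return [child for child, parents in parent_of.items() if person in parents]

print(children_of("Mary Christine Hudson"))  # -> ['Katy Perry']
```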

I'll also pre-emptively address some ad hoc idiocy that I've seen sealions lacking basic reading comprehension (i.e. the sort of people who claim that those systems are able to reason) use against this test:

  • "Ackshyually the bot is forgerring it and then reminring it. Just like hoominz" - cut off the crap.
  • "Ackshyually you wouldn't remember things from different conversations." - cut off the crap.
  • [Repeats the test while disingenuously = idiotically omitting step 3] - congrats for proving that there's a context window and nothing else, you muppet.
  • "You can't prove that it is not smart" - inversion of the burden of the proof. You can't prove that your mum didn't get syphilis by sharing a cactus-shaped dildo with Hitler.
[–] [email protected] 11 points 1 week ago (2 children)

I still fail to see how people expect LLMs to reason. It's like expecting a slice of pizza to reason. That's just not what it does.

Although Porsche managed to make a car with the engine in the most idiotic place win literally everything on Earth, so I guess I'm leaving a little possibility that the slice of pizza will outreason GPT-4.

[–] [email protected] 1 points 1 week ago (1 children)

> I still fail to see how people expect LLMs to reason. It's like expecting a slice of pizza to reason. That's just not what it does.

This text provides a rather good analogy between people who think that LLMs reason and people who believe in mentalists.

[–] [email protected] 2 points 1 week ago

That's a great article.

[–] [email protected] 3 points 1 week ago

LLMs keep getting better at imitating humans, so to those who don't know how the technology works, it seems as if the model thinks for itself.

[–] [email protected] 17 points 1 week ago (1 children)

I work for a consulting company and they're truly going off the deep end pushing consultants to sell this miracle solution. They are now doing weekly product demos and all of them are absolutely useless hype grifts. It's maddening.

[–] [email protected] 3 points 1 week ago (1 children)

So... Just another Tuesday for consulting then?

[–] [email protected] 2 points 1 week ago

No. In the non-sales world, I've built some really cool solutions for clients.

[–] [email protected] 16 points 1 week ago (1 children)

What, reasoning was an expected feature?

[–] [email protected] 17 points 1 week ago (1 children)

I still think it's better to refer to LLMs as "stochastic lexical indexes" than AI

[–] [email protected] 15 points 1 week ago (1 children)

AI in general is a shitty term. It's mostly PR. The term "intelligence" is very fuzzy and difficult to define, especially for people who are not in the field of machine learning.

[–] [email protected] 4 points 1 week ago (1 children)

So for those in ML it's easier?

[–] [email protected] 1 points 1 week ago

No, it's not. That's why some smart people are starting by defining a more interesting concept: educability.
