this post was submitted on 07 Apr 2025

TechTakes


"Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as 'trivial', even when their validity was crucial."

[–] [email protected] 0 points 2 weeks ago (30 children)

LLMs are a lot more sophisticated than we initially thought; read the study yourself.

Essentially, they do not simply predict the next token: when scientists trace the activated neurons, they find that these models plan ahead throughout inference, and then lie about those plans when asked how they came to a conclusion.
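For anyone unclear on what "simply predict the next token" means as a baseline here, a minimal sketch (my own toy example, nothing to do with the paper's models): a bigram table that, at each step, greedily emits whichever token most often followed the previous one in its training text. The claim in the paper is that real LLMs do something richer than this loop.

```python
from collections import Counter, defaultdict

# Toy next-token predictor (illustrative only): a bigram model trained
# on a tiny corpus, generating greedily one token at a time.
def train_bigrams(tokens):
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, steps):
    out = [start]
    for _ in range(steps):
        options = follows.get(out[-1])
        if not options:
            break  # no known continuation; stop
        # Greedy decoding: always take the single most likely next token.
        out.append(options.most_common(1)[0][0])
    return out

corpus = "the cat sat on the mat and the cat ran".split()
model = train_bigrams(corpus)
print(generate(model, "the", 3))
```

The point of contention in the thread is whether tracing activations shows models doing more than this one-step-at-a-time process.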

[–] [email protected] 0 points 2 weeks ago (11 children)

You didn't link to the study; you linked to the press release for the study. This is the study.

Note that the paper hasn't been published anywhere other than Anthropic's own online journal. Also, what the paper is doing is essentially tea-leaf reading: they look at the swill of tokens, point at some clusters, and say, "there's a dog!" or "that's a bird!" or "bitcoin is going up this year!". It's all rubbish, dawg.
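To caricature the criticism above in code (this is entirely my own strawman, not Anthropic's actual method): cluster some "activation" vectors with a bog-standard k-means, then attach whatever human-readable label you like to each cluster after the fact. The clustering is real math; the labels are the tea leaves.

```python
import random

random.seed(0)

# Plain k-means over 2-D points (a stand-in for activation vectors).
def kmeans(points, k, iters=20):
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # Recompute each center as the mean of its group.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return groups

# Fake "activations": two Gaussian blobs.
points = ([(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)] +
          [(random.gauss(5, 1), random.gauss(5, 1)) for _ in range(50)])
groups = kmeans(points, 2)

# The post-hoc labels are supplied by the reader, not by the data.
labels = ["there's a dog!", "bitcoin is going up this year!"]
for label, group in zip(labels, groups):
    print(label, len(group))
```

The clusters fall out of the arithmetic either way; whether they mean "dog" or "bitcoin" is the part the commenter is calling rubbish.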

[–] [email protected] 0 points 2 weeks ago (7 children)

Fair enough; you’re the only person with a reasonable argument, since nobody else seems able to do anything other than name-calling.

Linking to the actual paper and pointing out that it hasn’t been published in a third-party journal is far more productive than whatever anti-scientific bullshit the other commenters are doing.

We should be people of science, not reactionaries.
