technology

23765 readers

169 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

1. Obviously abide by the sitewide code of conduct. Bigotry will be met with an immediate ban
2. This community is about technology. Offtopic is permitted as long as it is kept in the comment sections
3. Although this is not /c/libre, FOSS related posting is tolerated, and even welcome in the case of effort posts
4. We believe technology should be liberating. As such, avoid promoting proprietary and/or bourgeois technology
5. Explanatory posts to correct the potential mistakes a comrade made in a post of their own are allowed, as long as they remain respectful
6. No crypto (Bitcoin, NFT, etc.) speculation, unless it is purely informative and not too cringe
7. Absolutely no tech bro shit. If you have a good opinion of Silicon Valley billionaires please manifest yourself so we can ban you.

founded 4 years ago

MODERATORS

[email protected]

ChatGPT's hallucination problem is getting worse according to OpenAI's own tests and nobody understands why (www.pcgamer.com)

submitted 2 weeks ago by [email protected] to c/[email protected]

41 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 3 points 2 weeks ago (1 children)

Information isnt attached to the users query, the CoT still happens in the output of the model like in the first example that you gave. This can be done without any finetuning on the policy, but reinforcement learning can also be used to encourage the chat output to break the problem down in to "logical" steps. chat models have always passed in the chat history back into the next input while appending the users turn, thats just how they work (I have no idea if o1 passes the CoT into the chat history though, so i cant comment). But it wouldnt solely account for the massive degradation of performance between o1 and o3/o4

[–] [email protected] 4 points 2 weeks ago* (last edited 2 weeks ago)

From my other comment about o1 and o3/o4 potential issues:

The other big difference between o1 and o3 and o4 that may explain the higher rate of hallucinations is that the o1’s reasoning is not user accessible, and it’s purposefully trained to not have safe guards on reasoning. Where o3 and o4 have public reasoning and reasoning safeguards. I think safeguards may be a significant source of hallucination because they change prompt intent, encoding and output. So on a non-o1 model that safeguard process is happening twice per turn once for reasoning and once for output, then being accumulated into the input. On an o1 model that's happening once per turn only for output and then being accumulated.