According to the article, it's a bigger problem for the "reasoning models" than for the older-style LLMs. Since reasoning models explicitly break problems down into multiple smaller steps, I wonder if that's what is creating the higher hallucination rates: each step introduces the potential for errors/fabrications, and even a very small amount of "cognitive drift" might have a very large impact on the final answer if it compounds across multiple steps.
AI alchemists discovered that the statistics machine lands in a better ballpark if you give it worked examples that spell out the intermediate steps as part of your prompt. This is called Chain of Thought prompting. Example:
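Something along these lines, where the worked answer shows its arithmetic before the model is handed a new question:

```
Q: A cafeteria had 23 apples. They used 20 to make lunch and then bought 6 more. How many apples do they have?
A: They started with 23 apples. They used 20, so 23 - 20 = 3. They bought 6 more, so 3 + 6 = 9. The answer is 9.

Q: A library had 120 books. It lent out 45 and received 30 as donations. How many books does it have now?
A:
```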
Then the AI alchemists said, hey, we can automate this by having the model eat more of its own shit. So a reasoning model will ask itself "What does the user want when they say <your prompt>?" The text that generates gets added to your query, and the whole thing is run again to produce the final answer. All models with "chat memory" effectively eat their own shit already: the tech works by reprocessing the whole chat history (sometimes there's a cache) every time you reply. Reasoning models, because they emulate chain of thought, eat more of their own shit than non-reasoning models do.
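Roughly, in toy form (this is a sketch, not any vendor's actual implementation; `llm()` is a made-up stand-in, not a real API):

```python
# Toy sketch of the self-refeeding loop. llm() just echoes so the code runs.

def llm(prompt: str) -> str:
    """Stand-in for a real model call: returns some generated text."""
    return f"[generated text for: ...{prompt[-60:]}]"

history: list[str] = []  # the whole chat so far, reprocessed every turn


def reasoning_turn(user_prompt: str) -> str:
    context = "\n".join(history)

    # Step 1: the model asks itself what the user wants and writes that out.
    reasoning = llm(f"{context}\nWhat does the user want when they say: {user_prompt}?")

    # Step 2: that self-generated text is appended to the query (on top of the
    # full chat history) and run through the model again for the final answer.
    answer = llm(f"{context}\n{user_prompt}\n{reasoning}")

    # Step 3: everything, including the model's own output, goes back into the
    # history and gets reprocessed on the next reply.
    history.extend([user_prompt, reasoning, answer])
    return answer
```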
Some reasoning models are worse than others because some refeed the entire history of the reasoning, and others only refeed the current prompt's reasoning.
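In terms of the toy sketch above, the difference is just what a turn writes back into the history (again, purely illustrative):

```python
def end_turn_refeed_all(history, user_prompt, reasoning, answer):
    # Variant A: the reasoning is kept, so every later turn reprocesses it too.
    history.extend([user_prompt, reasoning, answer])


def end_turn_refeed_answer_only(history, user_prompt, reasoning, answer):
    # Variant B: each prompt's reasoning is dropped after its turn; only the
    # answer is carried forward, so less self-generated text piles up.
    history.extend([user_prompt, answer])
```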
Essentially it's a form of compound error.
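Back-of-the-envelope version of that (the 2% is a made-up number, just to show the compounding):

```python
# If each intermediate step independently has a 2% chance of introducing an
# error, the chance that at least one error lands in a 20-step chain is:
p_step = 0.02
steps = 20
p_any_error = 1 - (1 - p_step) ** steps
print(round(p_any_error, 2))  # 0.33 -- roughly a one-in-three chance
```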
Well, the models are always refeeding their own output back into themselves recurrently. CoT prompting works by explicitly having the model write out the intermediate steps so it makes fewer logical jumps as it writes. But the reasoning model's text is still produced statistically, so it's still prone to hallucination. My money is on the higher hallucination rate being a result of the training data being polluted with synthetic information. I think it's model collapse.
Another point of anecdata: I've read vibe coders saying that non-reasoning models give better results for coding tasks, both because they're faster and because they tend to hallucinate less when they don't pollute the context with automated CoT. I've seen people recommend the DeepSeek V3 03/2025 release (with deep think turned off) over R1 for that reason.