this post was submitted on 06 May 2025
85 points (100.0% liked)

technology

23793 readers
258 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

founded 4 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 47 points 4 weeks ago (14 children)

According to the article, it's a bigger problem for the "reasoning models" than for the older-style LLMs. Since those explicitly break problems down into multiple smaller steps, I wonder if that's creating higher hallucination rates because each step introduces the potential for errors/fabrications. Even a very small amount of "cognitive drift" might have a very large impact on the final answer if it compounds across multiple steps.

[–] [email protected] 21 points 4 weeks ago* (last edited 4 weeks ago) (10 children)

AI alchemists discovered that the statistics machine will be in a better ball park if you give it multiple examples and clarifications as part of your asks. This is called Chain of Thought prompting. Example:

Then the AI Alchemists said, hey we can automate this by having the model eat more of it's own shit. So a reasoning model will ask it self "What does the user want when they say < Your prompt>?" This will generate text that it adds to your query, to generate the final answer. All models with "chat memory" effectively eat their own shit, the tech works by reprocessing the whole chat history (sometimes there's a cache) every time you reply. Reasoning models because of the emulation of chain of thought eat more of their own shit than non-reasoning models do.

Some reasoning models are worse than others because some refeed the entire history of the reasoning, and others only refeed the current prompt's reasoning.

Essentially it's a form of compound error.

[–] [email protected] 6 points 4 weeks ago (5 children)

welll the model are always refeeding their own output back into the model recurrently, CoT prompting works by explicitly having the model write out the intermediate steps to reduce logical jumps via the writing model. The production of the text of the reasoning model is still happening statistically, so its still prone to hallucination. my money is on the higher hallucination rate being a result of the data being polluted with synthetic information. I think its model collapse

[–] [email protected] 6 points 4 weeks ago

Another point of anecdata is that I've read that vibe coders say that non-reasoning models lead to better results for coding tasks because they are faster and they tend to hallucinate less because they don't pollute with automated CoT. I've seen people recommend Deepseek V3 03/2025 release (with deep think turned off) over R1 for that reason.

load more comments (4 replies)
load more comments (8 replies)
load more comments (11 replies)