this post was submitted on 01 May 2025
139 points (100.0% liked)
chat
8384 readers
282 users here now
Chat is a text only community for casual conversation, please keep shitposting to the absolute minimum. This is intended to be a separate space from c/chapotraphouse or the daily megathread. Chat does this by being a long-form community where topics will remain from day to day unlike the megathread, and it is distinct from c/chapotraphouse in that we ask you to engage in this community in a genuine way. Please keep shitposting, bits, and irony to a minimum.
As with all communities posts need to abide by the code of conduct, additionally moderators will remove any posts or comments deemed to be inappropriate.
Thank you and happy chatting!
founded 3 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Same, I just tried deepseek-R1 on a question I invented as an AI benchmark. (No AI has been able to remotely correctly answer this simple question, though I won't reveal what the question is here obviously.) Anyway, R1 was constantly making wrong assumptions, but also constantly second-guessing itself.
I actually do think the "reasoning" approach has potential though. If LLMs can only come up with right answers half the time, then "reasoning" allows multiple attempts at a right answer. Still, results are unimpressive.