this post was submitted on 13 Jul 2024

TechTakes

[–] [email protected] 0 points 2 months ago* (last edited 2 months ago) (10 children)

> LLMs don’t do reasoning. Facts are not a data type in an LLM — the data is just tokens and the LLM tries to predict the next token. An LLM can look a bit like it’s reasoning as long as the correct answer is already in its training materials. [arXiv, PDF; arXiv, PDF; CustomGPT]

This is a weird kind of assertion. First of all. You could make facts a token value in an LLM if you had some pre-calculated truth value for your data set. That's not how it works now, but it's a strange claim to make about an unknown new generation of AI. As the author points out, facts kind of are a data type; it's just that the AI treats the words most related to the prompt as the most correct, which, with a bad data set, they are not.

Also, the current generation of AI, as admitted by the company, is not meant to be a tool for finding facts. It's a tool for generation: yes, a bit like an auto-complete, but for natural language and with a much, much wider scope.

What Strawberry apparently is, is a machine that reasons, which is NOT what OpenAI ever claimed ChatGPT was. It's like a guy promised to bring a new animal to the village that will be able to pull the plow, and the author is saying "this guy's full of shit! We have cats all over the village and even the biggest one could never pull a plow! They aren't designed for it! All animals are good for is catching mice!" And then the guy brings in an ox.

Edit: honestly, my opinion of AI is lukewarm. I agree with a lot of people that the hype around it being integrated into all sorts of nonsense is stupid. It's just that all of the bad arguments against it make me tired.

[–] [email protected] 0 points 2 months ago

> First of all. You could make facts a token value in an LLM if you had some pre-calculated truth value for your data set.

An extra bit of labeling on your training data set really doesn't help you that much. LLMs already make up plausible-looking citations and website links (and other data types) that are actually complete garbage, even though their training data has valid citations and website links (and other data types). Labeling things as "fact" and forcing the LLM to output stuff with that "fact" label will get you output that looks (in terms of statistical structure) like valid labeled "facts" but has absolutely no guarantee of being true.
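
To make that concrete, here is a toy sketch in Python. It is not how a real LLM works: it is a bigram model (it only remembers which token has followed which), and the two-line corpus and the "[FACT]" tag are made up for illustration. Every training line is true and labeled, yet the model can still emit labeled sentences that are false, because the label is just another token to predict.

```python
from collections import defaultdict

# Hypothetical toy corpus: every line is true and carries a "[FACT]" label.
TRAINING = [
    "[FACT] paris is the capital of france",
    "[FACT] berlin is the capital of germany",
]

def train_bigrams(lines):
    """Record, for each token, the set of tokens seen directly after it."""
    follows = defaultdict(set)
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev].add(nxt)
    return follows

def all_outputs(follows, max_len=10):
    """Enumerate every sentence the model can emit one token at a time."""
    results = set()
    stack = [("<s>",)]
    while stack:
        seq = stack.pop()
        for nxt in follows[seq[-1]]:
            if nxt == "</s>":
                results.add(" ".join(seq[1:]))  # drop the start marker
            elif len(seq) < max_len:
                stack.append(seq + (nxt,))
    return results

follows = train_bigrams(TRAINING)
outputs = all_outputs(follows)

# Sentences the model can produce that never appeared in the training data.
made_up = sorted(outputs - set(TRAINING))
for sentence in made_up:
    print(sentence)
```

Both made-up sentences start with "[FACT]" and have exactly the same shape as the real ones, which is the point: matching the statistical structure of labeled facts is not the same as being true.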
