this post was submitted on 17 May 2024
502 points (94.8% liked)

Technology

[–] [email protected] 158 points 6 months ago (7 children)

"We invented a new kind of calculator. It usually returns the correct value for the mathematics you asked it to evaluate! But sometimes it makes up wrong answers for reasons we don't understand. So if it's important to you that you know the actual answer, you should always use a second, better calculator to check our work."

Then what is the point of this new calculator?

Fantastic comment, from the article.

[–] [email protected] 6 points 6 months ago* (last edited 6 months ago) (1 children)

The problem is people thinking the tool is a "calculator" (or fact-checker or search engine) while it's just a text generator. It's great for generating text.

But even then it can't keep a paragraph stable during the conversation. For me personally, the best antidote against the hype was to use the tool.

I don't judge people believing it's more than it is though. The industry is intentionally deceiving everyone about this and we also intuitively see intelligence when someone can eloquently express themselves. Seeing that in software seems magical.

We now have a great Star Trek-like human-machine interface. We just need real intelligence in the backend.

[–] [email protected] -2 points 6 months ago

No scientific discovery has value, then.

[–] [email protected] 14 points 6 months ago (1 children)

Some problems lend themselves to "guess-and-check" approaches. This calculator is great at guessing, and it's usually "close enough".

The other calculator can check efficiently, but it can't solve the original problem.

Essentially this is the entire motivation for numerical methods.
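Newton's method is a classic instance of that guess-and-check loop: each iterate is a guess, and the cheap "check" is plugging it back into the function. A minimal sketch (the function and tolerance here are chosen purely for illustration):

```python
# Newton's method: repeatedly refine a guess for a root of f, and cheaply
# check each guess by evaluating the residual f(x).
# Here we solve f(x) = x^2 - 2 = 0, i.e. compute sqrt(2).

def newton_sqrt2(x=1.0, tol=1e-12, max_iter=50):
    f = lambda x: x * x - 2   # residual: cheap to evaluate
    df = lambda x: 2 * x      # derivative, used for the update step
    for _ in range(max_iter):
        if abs(f(x)) < tol:   # the "check" half of guess-and-check
            return x
        x = x - f(x) / df(x)  # refine the guess
    return x

print(newton_sqrt2())  # ≈ 1.4142135623...
```

Checking the residual is one multiplication and a subtraction; computing the root directly has no such shortcut, which is exactly the asymmetry the comment describes.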

[–] [email protected] 5 points 6 months ago* (last edited 6 months ago)

In my personal experience, that's how I generally manage to shortcut a lot of labour-intensive intellectual tasks: using intuition to guess possible answers/results and then working backwards from them to determine which one is right (and even prove it) is generally faster, because it's usually faster to show that a result is correct than to arrive at it. How often that holds depends on how good one's intuition is in a given field, which in turn correlates with experience in it; and if the guess turns out wrong, you just do it the old-fashioned way.

That said, it's far from guaranteed to be faster, and for problems with more than one solution it might yield working but sub-optimal ones.

Further, the intuition step alone does not yield a result that can be trusted without validation.

Maybe, by playing the role that intuition plays in this process, LLMs can help accelerate the search for results in subjects where one lacks the experience to have good intuition, but has enough experience (or domain-specific methods and tools) to do the "validation of possible results" part.

[–] [email protected] 7 points 6 months ago (1 children)

That's not really right, because verifying solutions is usually much easier than finding them. A calculator that can take in arbitrary sets of formulas and produce answers for variables, but is sometimes wrong, is an entirely different beast than a calculator that can plug values into variables and evaluate expressions to check if they're correct.
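That verify-vs-solve asymmetry can be made concrete with integer factoring (the numbers below are made up for illustration): checking a proposed factorization is a single multiplication, while finding the factors requires a search.

```python
# Verifying a factorization is one multiplication; finding one is a search.

def verify_factors(n, p, q):
    """Cheap check: does the proposed answer actually solve the problem?"""
    return p > 1 and q > 1 and p * q == n

def find_factors(n):
    """Expensive solve: trial division up to sqrt(n)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None  # n is prime

n = 101 * 103                        # 10403
print(verify_factors(n, 101, 103))   # True  (one multiplication)
print(find_factors(n))               # (101, 103)  (~100 trial divisions)
```

The checker is fast precisely because it cannot produce the answer on its own, which is why pairing a fallible guesser with a reliable checker can still be useful.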

As a matter of fact, I'm pretty sure that argument would also make quantum computing pointless - because quantum computers are probability based and can provide answers for difficult problems, but not consistently, so you want to use a regular computer to verify those answers.

Perhaps a better comparison would be a dictionary that can explain entire sentences, but requires you to then check each word in a regular dictionary and make sure it didn't mix them up completely? Though I guess that's actually exactly how LLMs operate...

[–] [email protected] 3 points 6 months ago (2 children)

It's only easier to verify a solution than come up with a solution when you can trust and understand the algorithms that are developing the solution. Simulation software for thermodynamics is magnitudes faster than hand calculations, but you know what the software is doing. The creators of the software aren't saying "we don't actually know how it works".

In the case of an LLM, I have to verify everything with no trust whatsoever, and that takes longer than just doing it myself. Especially since the LLM is writing prose for me rather than doing complex math.

[–] [email protected] 1 points 6 months ago* (last edited 6 months ago)

If a solution is correct then a solution is correct. If a correct solution was generated randomly that doesn't make it less correct. It just means that you may not always get correct solutions from the generating process, which is why they are checked after.

[–] [email protected] 0 points 6 months ago (1 children)

Except that when you're doing calculations, a calculator can run through an equation, substitute the given answers, and see that the values match... which is my point about calculators not being a good example. And the case of quantum computers wasn't addressed.

I agree that LLMs have many issues, are being used for bad purposes, are overhyped, and we've yet to see if the issues are solvable - but I think the analogy is twisting the truth, and I think the current state of LLMs being bad is not a license to make disingenuous comparisons.

[–] [email protected] 1 points 6 months ago

It's left to be seen, then.

[–] [email protected] 20 points 6 months ago (1 children)

It's a nascent-stage technology that reflects the world's words back at you in statistical order by parsing user-generated prompts. It's a reactive system with no autonomy to deviate from its template between resets. It's no Roko's Basilisk inherently, just because.

[–] [email protected] 13 points 6 months ago (4 children)

Am I understanding correctly that it's just a fancy random word generator?

[–] [email protected] 3 points 6 months ago

It's like letting autocomplete pick every next word in the sentence without typing anything yourself. But fancier.
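That "autocomplete that always picks the next word" idea can be sketched as a toy bigram sampler (the corpus and seed below are made up for illustration; real LLMs condition on far longer contexts with learned weights, not raw counts):

```python
import random
from collections import defaultdict

# Toy "fancy autocomplete": record which word follows which in a corpus,
# then repeatedly sample the next word from that empirical distribution.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

nxt = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    nxt[a].append(b)  # duplicates encode the next-word probabilities

def generate(word, n=6, seed=0):
    random.seed(seed)  # fixed seed for a reproducible sample
    out = [word]
    for _ in range(n):
        choices = nxt.get(out[-1])
        if not choices:  # dead end: no observed continuation
            break
        out.append(random.choice(choices))  # probabilistic, not uniform-random
    return " ".join(out)

print(generate("the"))
```

Every generated bigram appeared somewhere in the corpus, so the output is locally plausible yet has no model of truth, which is the commenters' point.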

[–] [email protected] 6 points 6 months ago

Not random so much as probabilistic, which is almost the same thing, granted.

[–] [email protected] 1 points 6 months ago

Yes, but it's, like, really fancy.

[–] [email protected] 7 points 6 months ago

More or less, yes.

[–] [email protected] 17 points 6 months ago (4 children)

It's not just a calculator, though.

Image generation requires no fact checking whatsoever, and some of the tools can do it well.

That said, LLMs will always have limitations and true AI is still a ways away.

[–] [email protected] 10 points 6 months ago

Image generation requires no fact checking whatsoever

Sure it does. Let's say IKEA wants to use Midjourney to generate images for its furniture-assembly instructions. The instructions are already written, so the prompt is something like "step 3 of assembling the BorkBork kitchen table".

Would you just auto-insert whatever it generated and send it straight to the printer for 20000 copies?

Or would you look at the image and make sure that it didn't show a couch instead?

If you choose the latter, that's fact checking.

That said, LLMs will always have limitations and true AI is still a ways away.

I can't agree more strongly with this point!

[–] [email protected] 17 points 6 months ago

The biggest disappointment with the image-generation capabilities was the realisation that there is no object permanence for the components making up an image, so for anything specific you're just playing whack-a-mole with iterations that introduce other undesirable shit, no matter how specific you make your prompts.

They are also now heavily nerfing the models to avoid lawsuits, by just ignoring anything relating to specific styles that may be considered trademarks. Problem is, those are often industry jargon, so now you have to craft more convoluted prompts and still get more mid results.

[–] [email protected] 12 points 6 months ago

It does require fact-checking. You might ask for a human and get someone with 10 fingers on one hand, or ask for people in the background and get blobs merged into each other. The fact check for images is absolutely necessary: verifying that the generated image adheres to your prompt and that the objects in it match their intended real counterparts.

I do agree that it's a different type of fact-checking, but that's because an image is not inherently correct or wrong; it only is when compared to your prompt and (where applicable) to reality.

[–] [email protected] 11 points 6 months ago

It doesn't? Have you not seen any of the articles about AI-generated images being used for misinformation?