How reliable are modern LLMs? (lemmy.today)

submitted 2 months ago by [email protected] to c/[email protected]

20 comments

I wanted to extract some crime statistics broken down by type of crime and by different populations, all of course normalized by population size. I got a nice set of tables summarizing the data for each year that I requested.
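For reference, the normalization meant here is a plain per-capita rate, which is easy to recompute by hand against any table an LLM produces. A minimal sketch, assuming the usual "per 100,000 residents" convention; the function name and the figures below are hypothetical placeholders, not real statistics:

```python
def rate_per_100k(incidents: int, population: int) -> float:
    """Return incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep round inputs exact.
    return incidents * 100_000 / population


# Hypothetical placeholder figures, NOT real crime data.
groups = {
    "group_a": {"incidents": 150, "population": 300_000},
    "group_b": {"incidents": 40, "population": 50_000},
}

rates = {
    name: rate_per_100k(g["incidents"], g["population"])
    for name, g in groups.items()
}
print(rates)
```

If a rate recomputed this way from the cited raw counts doesn't match the LLM's table, that is a sign the numbers need checking against the source.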

When I shared these summaries, I was told they are entirely unreliable due to hallucinations. So my question to you is: how common a problem is this?

I compared results from ChatGPT (GPT-4), Copilot and Grok, and the results are the same (Gemini says the data is unavailable, btw :) ).

So are LLMs reliable for research like that?

[email protected] 0 points 2 months ago (last edited 2 months ago)

The least unreliable LLM I've found by far is Perplexity, in Pro mode. (By the way, if you want to try it out, you get a few free uses a day.)

The reason is that Pro mode doesn't just retrieve and spit out information from its internal memory bank. Instead, it uses that information to launch multiple search queries, summarises the pages it finds, and then gives you that summary.

Other LLMs try to answer "from memory" and then add some links at the bottom for fact-checking, whereas Perplexity's answers usually come straight from the web, so they tend to be quite good.

However, I still check (depending on how critical the task is) that each tidbit of information has one or two links next to it and that the links talk about the right thing, and I verify the data myself if it's actually critical that it gets it right. I use it as a beefier search engine, and it works great because it limits the possible hallucinations to the summarisation of pages. But it doesn't eliminate the possibility completely, so you still need to do some checking.
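The first part of that check (does every claim have a link next to it?) can be roughly automated. A minimal sketch, assuming the answer text marks sources with bracketed numbers like `[1]`; that marker format and the function name are my assumptions, not something any particular tool guarantees. The sentence splitting is deliberately naive (decimals such as "3.5" will confuse it), and it cannot tell whether a link actually supports the claim, so that part stays manual:

```python
import re

# Assumed citation format: bracketed numbers such as [1] or [12].
CITATION = re.compile(r"\[\d+\]")

# A "sentence" is text up to ., ! or ?, plus any citation markers that
# directly follow the punctuation; trailing text without punctuation
# is also kept as a final sentence.
SENTENCE = re.compile(r"[^.!?]+(?:[.!?]+(?:\s*\[\d+\])*|$)")


def uncited_sentences(answer: str) -> list[str]:
    """Return the sentences that carry no citation marker at all."""
    sentences = (m.group().strip() for m in SENTENCE.finditer(answer))
    return [s for s in sentences if s and not CITATION.search(s)]


# Hypothetical answer text, for illustration only.
sample = "Claim one.[1] Claim two. Claim three [2]."
for sentence in uncited_sentences(sample):
    print("needs a source:", sentence)
```

Anything this flags is a candidate hallucination to look up by hand; anything it doesn't flag still needs its links opened and read if the result matters.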