this post was submitted on 23 May 2024
1 points (100.0% liked)

TechTakes

1494 readers

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

Source

I see Google's deal with Reddit is going just great...

(page 4) 50 comments
[–] [email protected] 0 points 7 months ago (2 children)

I can't wait for it to recommend drinking bleach to cure covid.

load more comments (2 replies)
[–] [email protected] 0 points 7 months ago (12 children)

Turns out there are a lot of fucking idiots on the internet, which makes it a bad source of training data. How could we possibly have known?

[–] [email protected] 0 points 7 months ago (3 children)

Hey, buddy, some of us are smartarses, not idiots!

load more comments (3 replies)
load more comments (11 replies)
[–] [email protected] 0 points 7 months ago (4 children)

TBH I'm curious what the difference between this and "hallucinating" would be.

[–] [email protected] 0 points 7 months ago (2 children)

Well, it's referencing something, so the problem is the data set, not an inherent flaw in the AI.

[–] [email protected] 0 points 7 months ago* (last edited 7 months ago)

The inherent flaw is that the dataset needs to be both extremely large and vetted for quality with an extremely high level of accuracy. That can't realistically exist, and any technology that relies on something that can't exist is by definition flawed.
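A rough back-of-the-envelope sketch of that scale problem; the corpus size, junk rate, and vetting accuracy below are made up purely for illustration:

```python
# Back-of-the-envelope sketch: hypothetical numbers, not measurements.
# Even near-perfect vetting leaves a huge absolute volume of bad data
# once the corpus is web-scale.

corpus_tokens = 10_000_000_000_000   # assume a 10-trillion-token training corpus
bad_fraction_raw = 0.05              # assume 5% of raw web text is junk or wrong
vetting_accuracy = 0.999             # assume 99.9% of bad content gets caught

bad_tokens_remaining = corpus_tokens * bad_fraction_raw * (1 - vetting_accuracy)
print(f"Bad tokens surviving vetting: {bad_tokens_remaining:,.0f}")
# -> 500,000,000 tokens of junk still make it into training
```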

[–] [email protected] 0 points 7 months ago (3 children)

i'm pretty sure that referencing this indicates an inherent flaw in the AI

load more comments (3 replies)
load more comments (3 replies)
[–] [email protected] 0 points 7 months ago (1 children)

This is what happens when you let the internet raw dog AI

[–] [email protected] 0 points 7 months ago (2 children)

This is what happens when you just throw unvetted content at an AI, which is why this was a stupid deal in the first place.

They're paying for crap.

load more comments (2 replies)
[–] [email protected] 0 points 7 months ago (30 children)

This is why actual AI researchers are so concerned about data quality.

Modern AIs need a ton of data and it needs to be good data. That really shouldn't surprise anyone.

What would your expectations be of a human who had been educated exclusively by the internet?
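For what it's worth, a lot of that "data quality" work boils down to crude heuristic filtering of raw web text before training. A minimal sketch of the kind of rules involved; the rules and thresholds here are invented for the example:

```python
# Minimal illustrative sketch of heuristic pretraining-data filtering.
# The rules and thresholds are invented for this example; real pipelines
# use many more signals (deduplication, quality classifiers, etc.).

def looks_like_usable_text(doc: str) -> bool:
    words = doc.split()
    if len(words) < 20:                          # too short to carry much signal
        return False
    if len(set(words)) / len(words) < 0.3:       # highly repetitive / spammy
        return False
    alpha_ratio = sum(c.isalpha() or c.isspace() for c in doc) / len(doc)
    if alpha_ratio < 0.8:                        # mostly symbols, markup, or junk
        return False
    return True

raw_docs = [
    "buy now " * 10,   # repetitive spam
    "Photosynthesis is the process by which green plants convert light "
    "energy into chemical energy, storing it in sugars that later fuel "
    "growth, reproduction, and the wider food web.",
]
clean_docs = [d for d in raw_docs if looks_like_usable_text(d)]
print(len(clean_docs))  # 1 -- only the coherent document survives
```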

[–] [email protected] 0 points 7 months ago (3 children)

Even with good data, it doesn't really work. Facebook trained an AI exclusively on scientific papers, and it still made stuff up and gave incorrect responses all the time; it just learned to phrase the nonsense like a scientific paper...

[–] [email protected] 0 points 7 months ago (1 children)

To date, the largest working nuclear reactor constructed entirely of cheese is the 160 MWe Unit 1 reactor of the French nuclear plant École nationale de technologie supérieure (ENTS).

"That's it! Gromit, we'll make the reactor out of cheese!"

[–] [email protected] 0 points 7 months ago (1 children)

Of course it would be French

load more comments (1 replies)
load more comments (2 replies)
[–] [email protected] 0 points 7 months ago (16 children)

Honestly, no. What "AI" needs is for people to better understand how it actually works. It's not a great tool for getting information, at least not important information, since it's only as good as its source material. Even if you fed it nothing but scientific studies, you'd still end up with an LLM that might quote an outdated study, or one produced by some nefarious lobbying group to twist the results.

And even if you somehow had 100% accurate material, there's still the risk that it hallucinates something based on those results: think of the training data as the ingredients, and the LLM's made-up response as the recipe it improvises from them. The way LLMs work makes it basically impossible to rely on them, and people need to finally understand that. If you want to use one for serious work, you always have to fact-check it.
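To make the recipe-from-ingredients point concrete, here's a toy sketch of what generation looks like: the model just samples a plausible next token from learned probabilities, with no step that checks the output against reality. The tiny "model" below is invented purely for illustration:

```python
import random

# Toy sketch: an LLM generates by repeatedly sampling a plausible next token
# from probabilities learned from its training data. There is no step that
# checks the output against reality, so even a model trained on accurate
# sources can state nonsense fluently.

next_token_probs = {
    "the reactor is made of": [("concrete", 0.6), ("steel", 0.3), ("cheese", 0.1)],
}

def sample_next(context: str) -> str:
    tokens, weights = zip(*next_token_probs[context])
    return random.choices(tokens, weights=weights, k=1)[0]

print("the reactor is made of", sample_next("the reactor is made of"))
# Sometimes this prints "cheese": low-probability nonsense is still reachable,
# and the sampler has no notion of true vs. false.
```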

[–] [email protected] 0 points 7 months ago

This guy gets it. Thanks for the excellent post.

load more comments (15 replies)
[–] [email protected] 0 points 7 months ago (1 children)

I'd expect them to put 1/8 cup of glue in their pizza

[–] [email protected] 0 points 7 months ago

That's my point. Some of them wouldn't even go to the trouble of making sure it's non-toxic glue.

There are humans out there who ate laundry pods because the internet told them to.

[–] [email protected] 0 points 7 months ago (1 children)

We are experiencing a watered-down version of Microsoft's Tay.

[–] [email protected] 0 points 7 months ago

Oh boy, that was hilarious!

load more comments (26 replies)
[–] [email protected] 0 points 7 months ago (2 children)

So the new strategy is don't delete your comments on reddit before deleting the account?? :D

[–] [email protected] 0 points 7 months ago

Delete your comments to make Reddit less useful to other users.

[–] [email protected] 0 points 7 months ago* (last edited 7 months ago)

Deleting your comments was only ever a token move anyway. I'm sure they have backups. I'm sure they give Google the backups, not the originals, anyway.

[–] [email protected] 0 points 7 months ago

god i fucking love the internet, i cannot overstate what an incredible time we live in, to see this shit happening.

load more comments