this post was submitted on 26 Jul 2024
230 points (96.7% liked)

science

just science-related topics. please contribute

note: clickbait sources/headlines generally aren't liked. I've posted crap sources and later deleted or edited them to improve after complaints. whoops, sry

Rule 1) Be kind.

lemmy.world rules: https://mastodon.world/about

I don't screen everything, lrn2scroll

[–] [email protected] 2 points 1 month ago
[–] [email protected] 6 points 1 month ago

interesting refinement of the old GIGO effect.

[–] [email protected] 4 points 1 month ago

This seems to follow logically: the copy-of-a-copy-of-a-copy paradigm. We train AI on what humans like. By feeding that output back into the training data, we're adding noise back in.

To be fair, we already add noise, in that human art has its own errors, which we try to filter out using additional data featuring more of what we want and less of what we don't want.

[–] [email protected] 1 points 1 month ago

Looks like an AI trained on Louis Wain paintings.

[–] [email protected] 4 points 1 month ago* (last edited 1 month ago)

Serious salvia flashbacks from that headline image.

[–] [email protected] 5 points 1 month ago

Deadpool spoilers

[–] [email protected] 7 points 1 month ago

And screenshotting a jpeg over and over again reduces the quality?
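
For anyone curious, that "copy of a copy" effect is easy to reproduce. Here's a rough Python sketch (the filename, the tiny resize standing in for re-rendering, and the quality setting are all made-up placeholders) that re-encodes an image each generation and measures how far it has drifted from the original:

```python
# Rough sketch of generation loss: each pass re-renders (simulated by a tiny
# resize) and re-encodes the image as JPEG, then measures drift from the
# original. "photo.jpg" and quality=75 are arbitrary placeholders.
from PIL import Image
import numpy as np

img = Image.open("photo.jpg").convert("RGB")
original = np.asarray(img, dtype=np.float64)
w, h = img.size

for generation in range(1, 31):
    img = img.resize((w - 2, h - 2)).resize((w, h))  # stand-in for "screenshotting"
    img.save("recompressed.jpg", quality=75)         # lossy re-encode
    img = Image.open("recompressed.jpg").convert("RGB")
    drift = np.abs(np.asarray(img, dtype=np.float64) - original).mean()
    if generation % 10 == 0:
        print(f"generation {generation}: mean abs drift from original = {drift:.2f}")
```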

[–] [email protected] 13 points 1 month ago (2 children)

I wonder if the speed at which it degrades can be used to detect AI-generated content.

[–] [email protected] 1 points 1 month ago

It's literally just the difference between FLAC and MP3, as if it were digital conversion noise with a little bot behind it.

[–] [email protected] 6 points 1 month ago (1 children)

I wouldn't be surprised if someone is working on that as a PhD thesis right now.

[–] [email protected] 1 points 1 month ago (1 children)

how are you going to write a thesis on writing a FLAC to disc and ripping it over and over?

[–] [email protected] 2 points 1 month ago (1 children)

By measuring how it does with real images vs. generated ones, to start. The goal would be to show a method to reliably detect AI images. Gotta prove that it works.

[–] [email protected] 1 points 1 month ago (1 children)

How would it detect that? You'd need the model, and if you have the model you can already detect it.

[–] [email protected] 1 points 1 month ago (1 children)

It's an issue with the machine learning technique, not the specific model. The hypothetical thesis would be how to use this knowledge in general.

Why are you so agitated by my offhand comment?

[–] [email protected] 1 points 1 month ago

Am I agitated? 😂 💜 And no, it's not like that with all models.

[–] [email protected] 54 points 1 month ago

It's like that painter who kept doing self-portraits through Alzheimer's.

[–] [email protected] 13 points 1 month ago

How many times did you say this went through a copy machine?

[–] [email protected] 10 points 1 month ago (4 children)

I only have a limited and basic understanding of Machine Learning, but doesn't training models basically work like: "you, machine, spit out several versions of stuff and I, programmer, give you a way of evaluating how 'good' they are, so over time you 'learn' to generate better stuff"? Theoretically giving a newer model the output of a previous one should improve on the result, if the new model has a way of evaluating "improved".

If I feed a ML model with pictures of eldritch beings and tell them that "this is what a human face looks like" I don't think it's surprising that quality deteriorates. What am I missing?

[–] [email protected] 3 points 1 month ago* (last edited 1 month ago)

Part of the problem is that we have relatively little insight into or control over what the machine has actually "learned". Once it has learned itself into a dead end with bad data, you can't correct it, only work around it. Your only real shot at a better model is to start over.

When the first models were created, we had a whole internet of "pure" training data made by humans, and developers could basically blindly firehose all that content into a model. Additional tuning could be done by seeing what responses humans tended to reject or accept, and what language they used to refine their results. The latter still works, and better heuristics (the criteria that grade the quality of AI output) can be developed, but with how much AI content is out there, they will never have a better training set than what they started with. The whole of the internet now contains the result of every dead end AI has worked itself into, with no way to determine on a large scale what is AI generated.
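
A toy illustration of that dead-end effect (not the paper's actual experiment, just a sketch under simplified assumptions): fit a simple model to some data, sample from the fit, and train the next "generation" only on those samples. With nothing but model output to learn from, estimation errors compound and the distribution tends to narrow and drift.

```python
# Toy "model collapse" loop: each generation fits a Gaussian to samples drawn
# from the previous generation's fit. There is no fresh human data, so
# estimation error compounds from one generation to the next.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

data = rng.normal(0.0, 1.0, n_samples)  # generation 0: the "human" data

for generation in range(1, 21):
    mu_hat, sigma_hat = data.mean(), data.std()      # "train" on current data
    data = rng.normal(mu_hat, sigma_hat, n_samples)  # next generation sees only model output
    if generation % 5 == 0:
        print(f"gen {generation:2d}: mean = {mu_hat:+.3f}, std = {sigma_hat:.3f}")
```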

[–] [email protected] 2 points 1 month ago

It takes a massive number of intelligent humans that expect to be paid fairly to train the models. Most companies jumping on the AI bandwagon are doing it for quick profits and are dropping the ball on that part.

[–] [email protected] 8 points 1 month ago* (last edited 1 month ago) (1 children)

In this case, the models are given part of the text from the training data and asked to predict the next word. This appears to work decently well on the pre-2023 internet as it brought us ChatGPT and friends.

This paper is claiming that when you train LLMs on output from other LLMs, it produces garbage. The problem is that the evaluation of the quality of the guess is based on the training data, not some external, intelligent judge.
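
For a concrete (if drastically simplified) picture of that next-word objective, here's a sketch that uses bigram counts instead of a neural network; the toy corpus is made up. The point is that the "model" can only mirror the statistics of whatever text it was trained on, so if that text is LLM output, it mirrors the LLM's quirks instead of human writing.

```python
# Minimal stand-in for "predict the next word": count which word follows which
# in the training text, then predict the most frequent follower.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()  # made-up training text

follower_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follower_counts[prev][nxt] += 1  # how often nxt follows prev

def predict_next(word):
    """Most frequent continuation seen in training, or None if unseen."""
    followers = follower_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # 'cat', because 2 of the 3 continuations of 'the' were 'cat'
```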

[–] [email protected] 2 points 1 month ago

Ah, I get what you're saying, thanks! "Good" means that what the machine outputs should be statistically similar (based on comparing billions of parameters) to the provided training data, so if the training data gradually gains more examples of e.g. noses being attached to the wrong side of the head, the model also grows more likely to generate similar output.

[–] [email protected] 5 points 1 month ago (1 children)

I find it surprising that anyone is surprised by it. This was my initial reaction when I learned about it.

I thought that since they know the subject better than I do, they must have figured this one out and I simply didn't understand. But if a model needs to be trained before it can create something, you can't just use the model itself to train it. It's similar to not being able to generate truly random numbers algorithmically without some external input.

[–] [email protected] 2 points 1 month ago

Sounds reasonable, but a lot of recent advances come from being able to let the machine train against itself, or a twin / opponent without human involvement.

As an example of just running the thing itself, consider a neural network given the objective of re-creating its input with a narrow layer in the middle (an autoencoder). This forces a narrower description (e.g. age/sex/race/facing left or right/whatever) of the feature space.

Another is a GAN, where you pit a fake-generator against a spot-the-fake discriminator until it gets good.
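
For the first example, a minimal PyTorch sketch of such a bottleneck autoencoder (layer sizes, learning rate, and the random stand-in batch are arbitrary choices for illustration): the network's only objective is to re-create its input through a narrow middle layer, so no human-provided labels are involved.

```python
# Minimal bottleneck autoencoder: reconstruct a 784-dimensional input through
# an 8-dimensional middle layer, trained purely on reconstruction error.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),  # encoder
    nn.Linear(128, 8), nn.ReLU(),    # narrow bottleneck: an 8-number description
    nn.Linear(8, 128), nn.ReLU(),    # decoder
    nn.Linear(128, 784),
)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
x = torch.rand(64, 784)  # random stand-in batch; real use would be e.g. flattened images

for step in range(100):
    reconstruction = autoencoder(x)
    loss = nn.functional.mse_loss(reconstruction, x)  # objective: re-create the input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```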

[–] [email protected] 29 points 1 month ago* (last edited 1 month ago) (1 children)

GOOD.

This "informational incest" is present in many aspects of society and needs to be stopped (one of the worst places is in the Intelligence sector).

[–] [email protected] 14 points 1 month ago (3 children)

Informational Incest is my least favorite IT company.

[–] [email protected] 2 points 1 month ago

Too bad they only operate in Alabama

[–] [email protected] 9 points 1 month ago* (last edited 1 month ago)

WHAT ARE YOU DOING STEP SYS ADMIN?

[–] [email protected] 1 points 1 month ago (1 children)

Damn. I just bought 200 shares of ININ.

[–] [email protected] 2 points 1 month ago

they'll be acquired by McKinsey soon enough

[–] [email protected] 1 points 1 month ago (1 children)

When you feed your AI too much mescaline.

[–] [email protected] 1 points 1 month ago

It's more like the AI version of the human centipede.

[–] [email protected] 10 points 1 month ago

Huh. Who would have thought talking mostly or only to yourself would drive you mad?

[–] [email protected] 1 points 1 month ago (3 children)

As long as you verify the output is correct before feeding it back, it's probably not bad.

[–] [email protected] 3 points 1 month ago (1 children)

How do you verify novel content generated by AI? How do you verify content harvested from the Internet to "be correct"?

[–] [email protected] 2 points 1 month ago

Same way you verified the input to begin with. Human labor.

[–] [email protected] 1 points 1 month ago

The issue is that A.I. always makes a certain number of mistakes when outputting something. They may even be the tiniest, most insignificant mistakes. But if it internalizes them, its next output will contain new mistakes on top of the ones it internalized. So on and so forth.

Also, this is more with scraping in mind. So like, the A.I. goes on the internet, scrapes other A.I. images because there are a lot of them now, and becomes worse.

[–] [email protected] 3 points 1 month ago

That’s correct, and the paper supports this. But people don’t want to believe it’s true so they keep propagating this myth.

Training on AI outputs is fine as long as you filter the outputs to only things you want to see.

[–] [email protected] 10 points 1 month ago

The Habsburg Singularity

[–] [email protected] 20 points 1 month ago (1 children)
[–] [email protected] 3 points 1 month ago

that shit will pave the way for new age horror movies i swear
