this post was submitted on 23 Mar 2024


Setting aside the usual arguments on the anti- and pro-AI art debate and the nature of creativity itself, perhaps the negative reaction that the Redditor encountered is part of a sea change in opinion among many people that think corporate AI platforms are exploitive and extractive in nature because their datasets rely on copyrighted material without the original artists' permission. And that's without getting into AI's negative drag on the environment.

[–] [email protected] 4 points 7 months ago* (last edited 7 months ago)

Yeah and there are tons of angles and gestures for human subjects that AI just can’t figure out still.

Actually, it's less that it can't draw that stuff and more that it doesn't want to on its own. There's no way to ask it for anything different with the built-in tools, so you have to bring your own.

Say I ask you to draw a car. You're probably going to do a profile or three-quarter view (is that the right terminology for car portraits?), possibly a head-on. You're utterly unlikely to draw the car from the top, or from the perspective of a mechanic lying under it.

Combine that tendency to draw cars from a limited set of perspectives because "that's how you draw cars" with the inability of CLIP (the text encoder Stable Diffusion uses) to understand pretty much, well, anything (it's not an LLM), and you have no chance of getting the model to draw the car from a non-standard perspective.

Throw in some other kind of conditioning, though, like a depth map, and suddenly all kinds of angles are possible. It doesn't even need to be accurate. It can be very rough, the information-density equivalent of me gesturing the outline of a car and a camera. Probably not under the car, as the model is unlikely to know much about what's down there, but everything else should work just fine.
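To give an idea of how rough "very rough" can be, here's a minimal sketch of such a depth map in plain Python. Everything in it is made up for illustration: the function names `rough_car_depth_map` and `to_pgm`, the 64x64 size, the two-box "car" and the grey values are my assumptions, not anything a particular tool requires. It only builds the conditioning image; it doesn't run a model.

```python
def rough_car_depth_map(width=64, height=64):
    """Return a height x width grid of 0..255 depth values (brighter = closer).

    Hypothetical helper: the layout and values are arbitrary, just a gesture
    at "a car sits roughly here, the camera is roughly there".
    """
    depth = [[0] * width for _ in range(height)]

    # Ground plane: rows below the horizon get brighter toward the bottom
    # edge, i.e. closer to the camera.
    horizon = height // 3
    for y in range(horizon, height):
        shade = int(40 + 100 * (y - horizon) / (height - 1 - horizon))
        for x in range(width):
            depth[y][x] = shade

    def fill_box(x0, y0, x1, y1, value):
        # Paint a rectangle, keeping whichever value is closer to the camera.
        for y in range(y0, y1):
            for x in range(x0, x1):
                depth[y][x] = max(depth[y][x], value)

    # Car body: one wide slab. Cabin: a smaller slab above it, a bit farther away.
    fill_box(width // 8, height // 2, width * 7 // 8, height * 3 // 4, 200)
    fill_box(width // 3, height * 3 // 8, width * 2 // 3, height // 2, 180)
    return depth


def to_pgm(depth):
    """Serialise the grid as a plain-text PGM (P2) greyscale image string."""
    height, width = len(depth), len(depth[0])
    rows = [" ".join(str(v) for v in row) for row in depth]
    return f"P2\n{width} {height}\n255\n" + "\n".join(rows) + "\n"
```

Write the result of `to_pgm(rough_car_depth_map())` to a `.pgm` file, convert it to whatever format your tooling wants, and feed it to your depth conditioning of choice. For Stable Diffusion that would typically be a depth ControlNet, which is my assumption about the tooling rather than something spelled out above. Moving and resizing the two boxes is how you "gesture" a different camera angle.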

SDXL can paint, say, a man in a tuxedo doing one-handed pull-ups while eating a sandwich with the other hand. Good luck prompting that with text alone, though.