The ComfyUI prompt and workflow are attached to the image: https://files.catbox.moe/s3qufb.png

You can't just copy/paste this prompt. There are a few SD3-specific nodes that are required.

::: spoiler EDIT: more proof of the chain that led to this image. They were not all this good. I'm cherry-picking for sure, and these are just WebPs without workflows attached:

top 18 comments
[–] [email protected] 4 points 2 months ago (1 children)

I'm out of the loop. What do you mean the workflow is attached to the image? It's in the image metadata?

[–] [email protected] 2 points 2 months ago (1 children)

ComfyUI embeds the entire workflow (all nodes and connections) in the metadata of each image it saves. Lemmy strips all metadata when an instance hosts the image, so I share the original link because that version still has the metadata attached. If you drag it into the ComfyUI interface, it will load the whole thing automatically.
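If you just want to peek at the embedded data without opening ComfyUI, something like this works as a rough sketch. The "prompt"/"workflow" key names are what ComfyUI usually writes, but treat them as an assumption since they can change between versions:

```python
# Rough sketch: read the workflow JSON that ComfyUI embeds in a PNG's metadata.
# Assumes Pillow is installed; "prompt" and "workflow" are the usual key names,
# but they may differ between ComfyUI versions.
import json
from PIL import Image

img = Image.open("s3qufb.png")   # the original file, e.g. downloaded from the catbox link
meta = img.info                  # PNG text chunks land in this dict

for key in ("workflow", "prompt"):
    if key in meta:
        graph = json.loads(meta[key])
        print(f"{key}: {len(graph)} top-level entries")
```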

[–] [email protected] 3 points 2 months ago

That's awesome. In an ideal world all AI gen images would be like that

[–] [email protected] 2 points 2 months ago (3 children)

Is SD3 much better than the good SDXL checkpoints on CivitAI?

[–] [email protected] 3 points 2 months ago (2 children)

No, the version they released isn't the full parameter set, and it leads to really bad results on a lot of prompts. You get dramatically better results using their API version, so the full SD3 model is good, but the version we have is not.

Here's an example of SD3 API version: SD3 API

And here's the same prompt on the local weights version they released: SD3 local weights 2B

People think Stability AI censored NSFW content in the released model, which has crippled its ability to understand a lot of poses and how anatomy works in general.

For more examples of the issues with SD3, I'd recommend checking this reddit thread.

[–] [email protected] 1 points 2 months ago

Thanks, I'm sticking to SDXL finetunes for now. I expect the community will uncensor the model fairly quickly.

[–] [email protected] 1 points 2 months ago

I think the difference is typical of any base model. I have several base models on my computer, and the behavior of SD3 is quite typical. I fully expect their website hosts a fine-tuned version.

There are a lot of cultural expectations that any given group around the world has about generative AI and far more use cases than any of us can imagine. The base models have an unbiased diversity that reflects their general use; much is possible, but much is hard.

If "woman lying in grass" was truly filtered, what I showed here would not be possible. If you haven't seen it, I edited the post with several of the images in the chain I used to get to the main post image here. The post image is not an anomaly that got through a filter, it is an iterative chain. It is not an easy path to find, but it does exist in the base training corpus.

Personally, I think the real secret sauce is the middle CLIP text encoder and how it relates to the T5 encoder.

[–] [email protected] 2 points 2 months ago (1 children)

It's pretty terrible. It feels half baked, and heavily censored.

[–] [email protected] 1 points 2 months ago

More over-cooked than half-baked.

Should have left it alone when it worked.

[–] [email protected] 2 points 2 months ago

Not yet. The SD3 checkpoint is too raw and needs a lot of fine tuning. It is also way more complicated with more control variables.

[–] [email protected] 2 points 2 months ago (1 children)

Large or Medium? Any idea when Large will be available to download locally?

[–] [email protected] 2 points 2 months ago (1 children)

Medium. I could only get the version with everything packaged to work so far. I tried the one with just the two CLIPs packaged, but I only get errors when trying to combine the T5 CLIP with the other two. I need to get all the pieces separately. Running the T5 in fp16 will likely make a massive difference.
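For a rough sense of why the precision matters, here's a back-of-the-envelope estimate; the ~4.7B parameter count for the T5-XXL encoder is an assumption on my part, so treat the numbers as ballpark only:

```python
# Back-of-the-envelope VRAM estimate for the T5-XXL encoder weights alone.
# The ~4.7B parameter figure is an assumption; real usage adds activations
# and overhead, so treat these as rough lower bounds.
params = 4.7e9

for name, bytes_per_param in (("fp32", 4), ("fp16", 2), ("fp8", 1)):
    print(f"{name}: ~{params * bytes_per_param / 1e9:.1f} GB for weights")
```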

I also need to find a way to read the white paper. I'm not willing to connect to their joke Squarespace website and all of Google's garbage just to get the paper. I'll spin up another machine on a guest network and a VPN if I need to tomorrow. There is a difference between the two CLIP models and their functionality that I need to understand better, and it isn't evident in the example workflows or written about on Civitai.

[–] [email protected] 2 points 2 months ago

Awesome, thanks so much for the info. It's a really nice picture you made!

[–] [email protected] 6 points 2 months ago (2 children)

The grass looks like it's attempting to strangle her LOL

[–] [email protected] 1 points 2 months ago

I edited the post with more of the image set I generated while getting to this one.

[–] [email protected] 11 points 2 months ago (2 children)

I tried using your workflow, got this LOL

[–] [email protected] 4 points 2 months ago

KILL IT WITH FIRE

[–] [email protected] 1 points 2 months ago

Your image doesn't have the workflow attached for me to have a look. You would need the exact same setup with the unique prompt in all 3 positive positions. I'm not positive my custom sampler setup is fully transferable. If you manually tried to add the prompt, that won't work. It could also be how my GPU generates seeds; I think I've read about seed determinism being hardware-specific.
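For reference, these are roughly the reproducibility knobs PyTorch exposes; my understanding is that even with all of them set, matching outputs across different GPUs or driver versions isn't guaranteed:

```python
# Sketch of the usual PyTorch reproducibility settings. Even with all of these,
# identical results across different GPU models or driver versions are not guaranteed.
import torch

seed = 42
torch.manual_seed(seed)                  # CPU RNG
torch.cuda.manual_seed_all(seed)         # RNG on every CUDA device
torch.backends.cudnn.benchmark = False   # skip autotuning that can pick nondeterministic kernels
torch.use_deterministic_algorithms(True, warn_only=True)  # prefer deterministic ops, warn otherwise
```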

When I try something new, I lock the seed and build everything incrementally. I was working in a tight space that did not have much flexibility before it would error out, but what I was doing was not anomalous. I picked the best out of a dozen or so variants.

I don't try reproducing others' gens often, and I'm trying to get away from formulaic prompts by starting with something simple and building out the complexity as I go.

I'm not positive about the mechanism that causes the ComfyUI history to impact subsequent generations, but I intuitively feel there is some kind of mechanism similar to how momentum works inside an LLM. I started off this image session by generating around a dozen images of a doll in grass and left them in the history while I started building the prompt I shared. My big secret for success right now is to first generate pictures of things adjacent to the generation I want to make and then bring them together. I generated a few images of grass, the dolls, and a few portraits of women. I don't let any errors sneak into the history unless I really need a variation in one image to jump to where I want to go.

The part I really don't understand is how that momentum is stored in the images I have saved. I can drop this post's image into a fresh session and basically start right where I left off, but the history and what I'm calling momentum are not there. However, I have watched hundreds of times where a batched generation builds upon itself incrementally. Like if there are 5 images left and the present image gets a wonky error, every subsequent image will have an error. My gut feeling is that this must be related to how the CLIP can be saved and exported, but I could easily be wrong about that one.

With this last session I couldn't get it to do the side view. I'm not on my computer now, but I thought this image was from long before the point when I added the side image attempt or the SD3 logo. There may be an error with those new SD3 modules and how they save workflows too.

Just off the cuff, your image here is weighting the last line of the positive prompt differently. I can see most of the obfuscation techniques that are being used in the pedophilia safety system. There are "harm, mask, creepy face, and stone eyes" that were layered onto your prompt inside the model itself. You likely also have "parasitic conjoined twin, and broken bones," but I can't see those directly.

The T5 model is just an LLM, so how it is run in the background will have a big impact, especially how it was quantized. That is a very small model, and I expect it to suck massively. As soon as I saw that, I realized I need to track down how to use that model as an LLM to learn its particular idiosyncrasies. A heavily quantized small model like that is going to be a dumbass in so many ways.
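Something like this is roughly how I'd start poking at it, using the encoder the same way SD3 does; the "google/t5-v1_1-xxl" checkpoint name is an assumption for the base weights rather than the exact file SD3 ships:

```python
# Rough sketch of probing the T5 text encoder the way SD3 uses it (encoder only).
# "google/t5-v1_1-xxl" is an assumed base checkpoint; SD3 bundles its own copy of
# the weights. Needs transformers + sentencepiece installed.
import torch
from transformers import AutoTokenizer, T5EncoderModel

name = "google/t5-v1_1-xxl"
tok = AutoTokenizer.from_pretrained(name)
enc = T5EncoderModel.from_pretrained(name).eval()   # add torch_dtype=torch.float16 on GPU

with torch.no_grad():
    ids = tok("a woman lying in grass, side view", return_tensors="pt")
    emb = enc(**ids).last_hidden_state   # per-token embeddings the diffusion model conditions on
print(emb.shape)
```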

Anyways, hope that helps. I'm no expert. I just like exploring deeper than most.