[-] [email protected] 1 points 2 minutes ago

OpenAI seems to be functioning.

The problem with speech to text is the background noise and the many variations of speech. I've played around with a couple of models. I can get one to work with my voice with little training effort, but when my window AC kicks in or my computer fan hits its highest setting, it becomes a problem because the training is very dependent on the noise floor. I think they are likely extremely limited by the audio gear available in combination with the compute hardware needed to make it viable. Human hearing has a relatively large dynamic range, and we have natural analog filtering. A machine just doing math can't handle things like clipping from someone speaking too loudly, or understand the periodicity of all the vehicle and background noises like wind, birds, and other people in the vicinity. Everything that humans can contextualize is like a small learned program and alignment that took many years to train.

You will not see the full use cases of AI for quite a while. The publicly facing tools are nowhere near the actual capabilities of present AI. If you simply read the introductory documentation for the Transformers library, which is the basis of almost all the AI stuff you see in any public spaces, the documentation clearly states that it is a simplified tool that bypasses complexity in an attempt to make the codebase approachable to people in various fields. It is in no way a comprehensive implementation. People are forming opinions based on projects that are hacked together using Transformers. The real shakeups are happening in business, where companies like OpenAI are not peddling the simple public API; they are demonstrating the full implementations directly.

submitted 40 minutes ago by [email protected] to c/[email protected]
[-] [email protected] 10 points 23 hours ago

Hanging sheetrock is screwing in every position imaginable, in every room of the house.

[-] [email protected] 0 points 1 day ago

It has a lot of potential if the T5 can be made conversational. After diving into a custom DPM adaptive sampler, there is a lot more specificity required. I believe the vast majority of people are not using the model with the correct workflow. Applying the old model workflows to SD3 produces garbage results. The two CLIP models and the T5 need separate prompts, and the negative prompt needs an inverted channel with a slight delay before reintegration. I also think the smaller quantized version of the T5 is likely the primary problem overall. Any Transformer text model that small, then quantized down to an extremely small size on top of that, is problematic.

The license is garbage. The company is toxic. But the tool is more complex than most of the community seems to understand. I can generate a woman lying on grass in many intentional and iterative ways.

[-] [email protected] 14 points 1 day ago* (last edited 1 day ago)

In the last century? The diode, aka the P/N junction and every variant that has been created ever since.

Recently? Capacitive touch screens are by far the most significant change.

[-] [email protected] 7 points 1 day ago

At 1:30 am? Two obnoxious cats licking themselves passive aggressively competing for attention the moment I put down this idiot brick.

[-] [email protected] 1 points 2 days ago

I think the difference is typical of any base model. I have several base models on my computer, and the behavior of SD3 is quite typical. I fully expect their website hosts a fine-tuned version.

There are a lot of cultural expectations that any given group around the world has about generative AI and far more use cases than any of us can imagine. The base models have an unbiased diversity that reflects their general use; much is possible, but much is hard.

If "woman lying in grass" was truly filtered, what I showed here would not be possible. If you haven't seen it, I edited the post with several of the images in the chain I used to get to the main post image here. The post image is not an anomaly that got through a filter, it is an iterative chain. It is not an easy path to find, but it does exist in the base training corpus.

Personally, I think the real secret sauce is the middle CLIP agent and how it relates to the T5 agent.

[-] [email protected] 1 points 2 days ago

I edited the post with more of the image set I generated while getting to this one.

[-] [email protected] 2 points 2 days ago

ComfyUI embeds the entire workflow (all nodes and connections) inside the metadata of each image. Lemmy wipes all metadata when an instance hosts the image itself, so I share the original link because that version still has the metadata attached. If you drag it into the ComfyUI interface, it will load up the whole thing automatically.
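For anyone curious how that embedding works: PNG files carry key/value text in tEXt chunks, and ComfyUI writes its workflow JSON into those. Here is a minimal sketch that pulls them out with just the standard library, no PIL needed. The key names `prompt` and `workflow` are my assumption based on how current ComfyUI builds behave; check your own files (some builds may use iTXt for non-latin text, which this sketch skips).

```python
import struct

def read_png_text_chunks(path):
    """Walk a PNG file and return its tEXt chunks as a dict.

    ComfyUI typically stores the workflow under keys like 'prompt'
    and 'workflow' (assumed key names; verify against your files).
    """
    chunks = {}
    with open(path, "rb") as f:
        # Every PNG starts with this fixed 8-byte signature.
        if f.read(8) != b"\x89PNG\r\n\x1a\n":
            raise ValueError("not a PNG file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # truncated file
            length, ctype = struct.unpack(">I4s", header)
            data = f.read(length)
            f.read(4)  # skip the CRC
            if ctype == b"tEXt":
                # tEXt payload is key \x00 value, both latin-1.
                key, _, value = data.partition(b"\x00")
                chunks[key.decode("latin-1")] = value.decode("latin-1")
            if ctype == b"IEND":
                break
    return chunks
```

Dropping the PNG into the ComfyUI window does the same parse for you; this is just handy for checking whether a host stripped the metadata before you bother sharing a link.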

[-] [email protected] 2 points 2 days ago

Not yet. The SD3 checkpoint is too raw and needs a lot of fine tuning. It is also way more complicated with more control variables.

[-] [email protected] 20 points 2 days ago

Uhhh China! They invested in technology while we were too corrupt and stupid to take them seriously. Now we pull McCarthy bullshit to whine about it. Xi said they were going to invest in the technology in 2014. It is not government subsidies; it is research and development of new technologies. China is better at it because we put no effort into it. The US auto industry has always been a shit show of back-patting fools in their own bubble while building absolute garbage. That is why Japanese cars dominate; next will be China.

[-] [email protected] 2 points 2 days ago

Medium. I could only get the one with everything packaged to work so far. I tried the one with just the two CLIP models packaged, but I only get errors when trying to combine the T5 with the other two. I need to get all the pieces separately. Running the T5 in fp16 will likely make a massive difference.

I also need to find a way to read the white paper. I'm not willing to connect to their joke of a Squarespace website and all of Google's garbage just to get the paper. I'll spin up another machine on a guest network and VPN if I need to tomorrow. There is a difference between the two CLIP models and their functionality that I need to understand better, and it isn't evident in the example workflows or written about on civitai.

[-] [email protected] 1 points 2 days ago

Your image doesn't have the workflow for me to have a look. You would need the exact same setup with the unique prompt in all 3 positive positions. I'm not positive my custom sampler setup is fully transferable. If you manually tried to add the prompt, that won't work. It could be how my GPU generates seeds, too. I think I've read about that being hardware-specific in its determinism.

When I try something new, I lock the seed and build everything incrementally. I was working in a tight space that was not super flexible before it would error, but what I was doing was not anomalous. I picked the best out of a dozen or so variants.

I don't try reproducing others' gens often, and I'm trying to get away from the formulaic prompts by starting with something simple and building out the complexity as I go.

I'm not positive about the mechanism that causes the ComfyUI history to impact subsequent generation images, but I intuitively know there is some kind of mechanism similar to how momentum works inside a LLM. I started off this image session by generating around a dozen images of a doll in grass and left them in the history while I started building the prompt I shared. My big secret for success right now is to first generate pictures of things adjacent to the generation I want to make and then bring them together. I generated a few images of grass, the dolls and a few portraits of women. I don't let any errors sneak into the history unless I really need a variation in one image to jump to where I want to go.

The part I really don't understand is how that momentum is stored in the images I have saved. Like, I can drop this post's image into a fresh session and I'll basically start right where I left off, but the history and what I'm calling momentum is not there. However, I have watched hundreds of times where a batched generation builds upon itself incrementally. Like if there are 5 images left and the present image gets a wonky error, every subsequent image will have an error. My gut feeling is that this must be related to how the CLIP can be saved and exported, but I could easily be wrong about that one.

With this last session I couldn't get it to do the side view. I'm not on my computer now, but I thought this image was from long before the point when I added the side image attempt or the SD3 logo. There may be an error with those new SD3 modules and how they save workflows too.

Just off the cuff, your image here is weighting the last line of the positive prompt differently. I can see most of the obfuscation techniques that are being used in the pedophilia safety system. There are 'harm, mask, creepy face, and stone eyes' that were layered onto your prompt inside the model itself. You likely also have 'parasitic conjoined twin, and broken bones,' but I can't see those directly.

The T5 model is just an LLM, so how that is run in the background will have a big impact, especially how it was quantized. That is a very small model, and I expect it to suck massively. As soon as I saw that I realized I need to track down how to use that model as an LLM to learn its particular idiosyncrasies. A heavily quantized small model like that is going to be a dumbass in so many ways.

Anyways, hope that helps. I'm no expert. I just like exploring deeper than most.

submitted 2 days ago* (last edited 2 days ago) by [email protected] to c/[email protected]

The ComfyUI prompt and workflow is attached to the image: https://files.catbox.moe/s3qufb.png

You can't copy pasta this prompt. There are a few nodes that are specific to SD3 and required.

::: spoiler EDIT: more proof of the chain that led to this image. They were not all this good. I'm cherry picking for sure, and these are just webp files without workflows attached:

I'll Pug you up! (files.catbox.moe)
submitted 3 days ago by [email protected] to c/[email protected]

Prompt was through ComfyUI, so it is embedded in the image: https://files.catbox.moe/aiy8p1.png

It was supposed to be a pug kangaroo hybrid but the AI apparently wanted to throw in some monkey too.

submitted 5 days ago by [email protected] to c/[email protected]

Spots have turned to stripes on the bottom half of this one.

submitted 6 days ago* (last edited 6 days ago) by [email protected] to c/[email protected]

I want to extract and process the metadata from PNG images and the first line of .safetensors files for LLMs and LoRAs. I could spend ages farting around with sed or awk, but the formats of these files are constantly changing. I'd like a faster way to see a summary of training and a few other details when they are available.
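For the .safetensors side, the format is actually simple enough to skip sed/awk entirely: the first 8 bytes are a little-endian uint64 giving a header length, followed by that many bytes of JSON; training details, when present, sit under the `__metadata__` key. A minimal Python sketch (the `ss_*` key shown in the usage note is a kohya-style convention, not guaranteed to exist in every file):

```python
import json
import struct

def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file without loading any tensors.

    Layout: 8-byte little-endian header length, then that many bytes of JSON.
    Returns (metadata, full_header); metadata is the '__metadata__' dict,
    or an empty dict when the file carries no training info.
    """
    with open(path, "rb") as f:
        (length,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(length))
    return header.get("__metadata__", {}), header
```

Usage would be something like `meta, header = read_safetensors_header("lora.safetensors")` and then printing `meta.get("ss_network_module")` or whatever keys your trainer writes; the rest of `header` maps tensor names to dtype/shape/offsets, which is handy for a quick summary too.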

catholic school uniform (image.civitai.com)
submitted 1 week ago by [email protected] to c/[email protected]

Felt sexy, needed to share. Not my gen, just one from a NSFW LoRA posted on civitai.

Shooting star on the Moon (files.catbox.moe)
submitted 1 week ago by [email protected] to c/[email protected]

Double dipping this one from the community challenge #37 pinned to the top of [email protected]

Vote on my impossibly hard challenge please.

submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]

The Upside Down

I want to see the most impressive images of an upside down reality. From objects, to landscapes, or anything in between, what kinds of inceptions can you create?

Extra credit: I've been trying to create an O'Neill Cylinder space habitat interior for a week now. Major kudos if you can defy gravity with one of these. If you don't know what an O'Neill cylinder is, watch this 3-minute render: https://www.youtube.com/watch?v=Z2d_0l5ycRM or here is the wiki


  • Follow the community’s rules above all else
  • One comment and image per user
  • Embed image directly in the post (no external link)
  • Workflow/Prompt sharing encouraged (we’re all here for fun and learning)
  • Posts that are tied will both get the points
  • The challenge runs for 7 days from now
  • Down votes will not be counted


At the end of the challenge each post will be scored:

  • Most upvoted: +3 points
  • Second most upvoted: +2 points
  • Third most upvoted: +1 point
  • OP’s favorite: +1 point
  • Most uncomfortable: +1 point
  • Last two entries (to compensate for less time to vote): +1 point
  • Prompt and workflow included: +1 point

The winner gets to pick next theme! Have fun everyone!

submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]

Someone a few hours earlier asked what you would do if you found porn after a family member passed. I am asking: what kind of porn did you get, just to be found after you're gone? For better or worse! What is your last way to get in a dad joke or eye roll?

submitted 3 weeks ago by [email protected] to c/[email protected]

I've lived under a rock for 10 years. I did Metro ages ago while most were still on contracts. Surely we've reached true capitalist open market freedom by now. Is it still total closed market, noncompetitive, privateering corruption?

submitted 3 weeks ago by [email protected] to c/[email protected]

I've been using distrobox more and more and am at the point where I need to start saving and integrating history differently. For example, when I'm installing and building something complicated, I need to start saving that specific session's history. I am curious what others might be doing and am looking for simple advice and ideas.

