this post was submitted on 22 Jul 2024
69 points (94.8% liked)

No Stupid Questions


Is there any computer program with AI capabilities (the generative ones seen in ChatGPT, online text-to-picture generators, etc.) that is actually standalone? i.e. able to run in a fully offline environment.

As far as I understand, the most popular AI technology right now consists of a bunch of matrix algebra, convolutions, and parallel processing of many low-precision floating-point numbers, which works because of statistics and absurdly huge datasets. So if any such program existed, how would it even have a reasonable storage size if it needs the dataset?
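(For context on the storage question: a downloaded model stores only the learned weights, not the training data, so its size is roughly parameter count times bytes per parameter. A quick back-of-envelope sketch; the 8-billion-parameter figure is just an illustrative example:)

```python
# Rough model download size: parameters x bits per parameter / 8,
# independent of the (much larger) training dataset.

def model_size_gb(params_billion, bits_per_param):
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

# An 8B-parameter model at different precisions:
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {model_size_gb(8, bits):.1f} GB")
```

This is why a quantized 8B model fits in a few GB of disk and VRAM even though it was trained on terabytes of text.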

top 33 comments
[–] [email protected] 2 points 3 months ago

I use Krita with the AI Diffusion plugin for image generation, which is working great, and Jan for text generation, using the Llama 3 8B Q4 model. I have an NVIDIA GTX 1660 Ti with 6GB of VRAM and both are reasonably fast.

[–] [email protected] 4 points 3 months ago (1 children)

For LLMs, the already mentioned LM Studio does a good job as far as beginner friendliness goes.

For text-to-image, I like Fooocus, which is a custom Stable Diffusion setup with automatic prompt enhancement, which can comfortably compete with Midjourney.

Here’s a setup guide for first time users. There’s also an online version to try it out.

[–] [email protected] 2 points 3 months ago

Just wanted to thank you, as I hadn't had any luck running any other SD software on my AMD setup with Nobara. But after a couple of fixes to get rocm running, this one runs, and runs pretty fast. Thanks!

[–] [email protected] 13 points 3 months ago

https://lmstudio.ai/

You can load up your own models; it has some of its own, too. Most of these are pretty good, but run on synthetic data. Storing and processing something the size of ChatGPT would bankrupt most people.

This program can use significant amounts of computer resources if you let her eat. I recommend closing other programs and games.

[–] [email protected] 1 points 3 months ago

ComfyUI is the best for image AI

[–] [email protected] 10 points 3 months ago

GPT4All for chat and Automatic1111 for generative images with downloaded models works great. The former does not require a GPU, but the latter generally does.

[–] [email protected] 2 points 3 months ago

If you are into development, the setup I use is ollama running codegemma:7b along with the Continue.dev plugin for vscode.
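(For reference, pointing Continue at a local Ollama model is mostly a config entry. This is a sketch of what that looked like in Continue's `config.json` at the time; field names may have changed since, so treat it as illustrative:)

```json
{
  "models": [
    {
      "title": "CodeGemma (local)",
      "provider": "ollama",
      "model": "codegemma:7b"
    }
  ]
}
```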

[–] [email protected] 8 points 3 months ago

Stable Diffusion and Ollama for image and text generation locally. Super easy to do on Linux, and they support GPU acceleration out of the box.

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago)

For LLMs, I've had really good results running Llama 3 in the Open Web UI docker container on an Nvidia Titan X (12GB VRAM).

For image generation though, I agree more VRAM is better, but the algorithms still struggle with large image dimensions, so you wind up needing to start small and iteratively upscale, which afaik works OK on weaker GPUs but will cause problems. (I've been using the Automatic1111 mode of the Stable Diffusion Web UI docker project.)

I'm typing on my phone so I don't have the links to the git repos atm, but you basically clone them and run the docker compose files. The readmes are pretty good!

[–] [email protected] 1 points 3 months ago

Llamafile is a pretty good option for 100% local LLMs. The smaller models are pretty good for basic applications. They run at a reasonable speed on my Samsung laptop and really fast on my M2 MacBook.

[–] [email protected] 2 points 3 months ago

If you have a good GPU, you should be able to run a model without issue. The big ones are technically usable with tweaking, but are slow enough to be useless on normal hardware. A small model may be 4-8 GB, but a larger one could be 100+ GB. You don't need the training data (if it's even public) to run them; you only need it if you're building or retraining the model. There's a crap ton of different software to run AI on.

To get started, assuming you've got a beefy PC, you need a model and software to interact with it. I started with Mistral 7B and textGenWebUi and have been trying out different software and models. Text gen has the basics to load and chat with a model and is a good starting point.

Model: https://mistral.ai/technology/#models
Software: https://github.com/oobabooga/text-generation-webui

For images, you can choose models based on what the sample images look like; they tend to be specialized for certain styles or content. You can add LoRAs to further change how the output looks (think specific characters or poses). It's very much trial and error getting good images.

Models: https://civitai.com/ (potentially NSFW)
Software: https://github.com/vladmandic/automatic

There are more models and software out there than I can keep track of, so if something is crap you should be able to find an alternative. YouTube guides are your friend.

[–] [email protected] 5 points 3 months ago

Krita has an AI plugin that's pretty painless to set up if you've got an NVIDIA card. AMD has to be set up manually, or you can fall back to slow CPU generation. It uses ComfyUI in the background.

[–] [email protected] 3 points 3 months ago

You need a GPU for any kind of performance.

For text I suggest:

  • Ollama backend: command-line interface, very easy to download models with one line of code. Supports most models, and you can talk with the model inside the terminal, so it's standalone.

  • OpenWebUI: easy install with Docker and meant to work easily with Ollama. Comes with web search features and PDF uploading. A bunch of different community tools and modules are available.
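(As a sketch of how the Ollama + OpenWebUI combo can be wired together with Docker Compose; the image names, port, and environment variable are assumptions based on the projects' docs, so check their READMEs for current values:)

```yaml
# Hypothetical docker-compose sketch: Ollama serving models,
# Open WebUI in front of it at http://localhost:3000
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
```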

For images I suggest either:

  • Automatic1111: traditional UI using Gradio. Lots of extras you can download through the UI to do different things.

  • ComfyUI: node-based UI, a bit more complicated but more powerful than Automatic1111.

For models, you can go on civitai and just download whatever you need and drop it into their respective folders for both auto and comfy.

For text, there's also LM Studio, which is very user friendly. It is closed source and much slower than Ollama in my experience, though. I have a 4060 in my laptop (8GB VRAM) and I'm getting an image about every 2 seconds using Stable Diffusion 1.5 models, and text speed is on par with ChatGPT with a smaller 8B-9B model. For text I suggest Gemma 2, which is probably the best small model out right now.

[–] [email protected] 1 points 3 months ago (1 children)

Do you have a 24GB GPU?

If so, then you can get decent results from running local models.

[–] [email protected] 5 points 3 months ago

You can get decent results with much less these days, actually. I don't have personal experience (I do have a 24GB GPU) but the open source community has put a lot of work into getting models to run on lower-spec machines. Aim for smaller models (8B parameters is common) and low quantization (the values of the parameters get squished into smaller numbers of bits). It's slower and the results can be of noticeably lower quality but I've seen people talk about usable LLMs running CPU-only.
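(To illustrate what "squishing parameters into smaller numbers of bits" means, here's a toy sketch of symmetric 8-bit rounding. Real quantization schemes work per block of weights with extra tricks, so this is only the core idea:)

```python
# Toy symmetric 8-bit quantization: map floats to ints in [-127, 127]
# using one scale factor, then reconstruct. The ints (plus the scale)
# take far less space than the original floats, at the cost of a small
# reconstruction error.

def quantize(weights):
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.03, -1.27, 0.5, 0.999, -0.004]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # small integers instead of floats
print(max_err)  # error is bounded by about half the scale
```

At 4 bits the buckets get much coarser, which is where the "dumber but smaller" trade-off comes from.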

[–] [email protected] 8 points 3 months ago* (last edited 3 months ago) (1 children)

Local LLMs can be compressed to fit on consumer hardware. Model formats like GGUF and Exl2 can be loaded up with an offline-hosted API like KoboldCpp or Oobabooga. These formats lose resolution from the full floating-point model and become "dumber", but it's good enough for many uses.

Also note that these models are like 7, 11, or 20 billion parameters, while hosted models like ChatGPT run closer to 8x220 billion.

[–] [email protected] 4 points 3 months ago (1 children)

Though bear in mind that parameter count alone is not the only measure of a model's quality. There's been a lot of work done over the past year or two on getting better results from the same or smaller parameter counts; lots of discoveries have been made about how to train and run inference better. The old GPT-3 from back at the dawn of all this was really big and was trained on a huge number of tokens, but nowadays the small downloadable models fine-tuned by hobbyists compete with it handily.

[–] [email protected] 1 points 3 months ago (1 children)

Agreed, and it's especially true with Llama 3: their 8B model is extremely competitive.

[–] [email protected] 4 points 3 months ago (2 children)

Makes it all the more amusing how OpenAI staff were fretting about how GPT-2 was "too dangerous to release" back in the day. Nowadays that class of LLM is a mere toy.

[–] [email protected] 2 points 3 months ago

They were fretting about it until their morals went out the door for money.

[–] [email protected] 2 points 3 months ago

Whenever these corps talk up the danger of AI, all I think is "nice marketing dept bro"

[–] [email protected] 20 points 3 months ago

Stable Diffusion (AI image generation) runs fully locally. The models (the datasets you're referring to) are generally around 3GB in size. It's more about the processing power needed for it to run (it's very GPU-intensive) than the storage size on disk.

[–] [email protected] 7 points 3 months ago (1 children)

The AI text, image, and audio models that can run on a typical PC have all been broken down from originally larger models. How this is done affects what the models can do and their quality, but the open source community has come a long way in making impressive stuff. The first question is about hardware: do you have an Nvidia GPU that can support these types of generation? They can be done on CPU alone, but it's painfully slow.

If so, then I would highly recommend looking into Ollama for running AI models (using WSL if you're on Windows) and ComfyUI for graphical generation. Don't let ComfyUI's complicated workflow scare you; starting from the basics, with plenty of YouTube help out there, it will make sense. As for TTS, there's a lot of constant "new stuff" out there, but for actual local processing in "real time" (it still takes a bit) I have yet to find anything to replace my Coqui TTS copy with Jenny as the model voice. It may take some digging and work to get that together, as it's older and no longer supported.

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago) (1 children)

I don't think they break them down. For most models, the math requires starting at the beginning and training each model individually from the ground up.

But sure, a smaller model generally isn't as capable as a bigger one. And you can't train them indefinitely. So for a model series you'll maybe use the same dataset but feed more of it into the super big variant and not so much into the tiny one.

And there is a technique (distillation) where you use a big model to generate questions and answers and use them to train a different, small model. That model will learn to respond like the big one.

[–] [email protected] 4 points 3 months ago (1 children)

The breaking down I mentioned is the quantization that forms a smaller model from the larger one. I didn't want to get technical because I don't understand the math details myself past how to use them. :)

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago)

Ah, sure. I think a good way to phrase it is to say they lower the precision. That's basically what they do: convert the high-precision numbers to lower-precision formats. That makes the computations easier/faster and the files smaller.

And it doesn't apply equally to text, audio, and images. As far as I know, quantization is mainly used with LLMs. It's also possible with image and audio models, but generally people don't do that; as far as I remember it leads to degradation and distortions pretty fast. There are other methods, like pruning, used with generative image models. That brings their size down substantially.

[–] [email protected] 55 points 3 months ago (1 children)

There is tons of "standalone" software that you can run on your own PC:

  • For text generation, the easiest way is to get the GPT4All package, which allows you to run text-generation models on CPU on your own PC

  • For image generation, you can try the Easy Diffusion package, which is an easy-to-use Stable Diffusion package; then, if you like it, it's time to try ComfyUI

You can check [email protected] and [email protected] for some more information

[–] [email protected] 2 points 3 months ago (3 children)

I’ve wanted to try these out for shits and giggles - what would I expect with a 3090, is it going to take a long time to make some shitposts?

[–] [email protected] 3 points 3 months ago (1 children)

I did a bunch of image generation on my 3080 and it felt extremely fast. Enough that I was able to set it up as a shared node in one of those image generation nets and it outperformed most other people in the net.

[–] [email protected] 1 points 2 months ago* (last edited 2 months ago) (1 children)

shared node in one of those image generation nets

You mean like AI Horde?

[–] [email protected] 1 points 2 months ago

Yeah I couldn’t remember what it was called lol

[–] [email protected] 8 points 3 months ago

3090s are ideal because the most important factor is VRAM, and they are at the top of the plateau for VRAM until you get into absurdly expensive server hardware. Expect around 3 seconds to generate a 512x512 image, or 4 words per second generating text at around GPT-3.5 quality.

[–] [email protected] 3 points 3 months ago

With SD 1.5 my old GTX 970 was doing fine (30 seconds per image). I upgraded to a Radeon 7060, and with SDXL I get like 4 images in those 30 seconds (but it sometimes crashes my PC when loading a model).