this post was submitted on 09 Aug 2024
260 points (93.6% liked)

Selfhosted


I don't consider myself very technical. I've never taken a computer science course and don't know Python. I've learned some things like Linux, the command line, Docker and networking/pfSense because I value my privacy. My point is that anyone can do this, even if you aren't technical.

I tried both LM Studio and Ollama. I prefer Ollama. Once it's installed, you download models and use them to have your own private, personal GPT. I access it on my local machine through the command line, and I also installed Open WebUI in a Docker container so I can reach it from any device on my local network (I don't expose services to the internet).

Having a private AI/GPT is pretty cool. You can download and test new models. And it is private. Yes, there are ethical concerns about how the models got their training data. I'm not minimizing those concerns. But if you want your own AI/GPT assistant, give it a try. I set it up in a couple of hours, and as I said... I'm not even that technical.
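
A minimal sketch of what talking to a local Ollama server looks like from Python, for anyone who would rather script it than use the command line or Open WebUI. It assumes Ollama is listening on its default address (http://localhost:11434) and that a model called `llama3` has already been pulled. The model name and the helper names `build_generate_request` and `ask` are placeholders for illustration, not something from the post.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3"):
    """Build the JSON body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask(prompt, model="llama3", url=OLLAMA_URL):
    """Send the prompt to the server and return the generated text."""
    body = json.dumps(build_generate_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running Ollama server with the model available):
#   print(ask("Explain what a reverse proxy is in one sentence."))
```

Nothing here leaves the machine: the request goes to localhost, which is the whole point of the setup described above.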

top 50 comments
[–] [email protected] 3 points 2 months ago

I am going to be buying a monster high end machine and I want to do all the AI stuff on it.

[–] [email protected] 0 points 2 months ago* (last edited 2 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters    More Letters
NVR              Network Video Recorder (generally for CCTV)
PSU              Power Supply Unit
VPN              Virtual Private Network

3 acronyms in this thread; the most compressed thread commented on today has 12 acronyms.

[Thread #917 for this sub, first seen 12th Aug 2024, 07:15] [FAQ] [Full list] [Contact] [Source code]

[–] [email protected] 2 points 3 months ago

Very cool! You can use something like Tailscale to access your local services remotely without exposing them to the internet.

[–] [email protected] 4 points 3 months ago (2 children)

Is there a way to host an LLM in a docker container on my home server but still leverage the GPU on my main PC?

[–] [email protected] 3 points 3 months ago

You would need to run the LLM on the system that has the GPU (your main PC). The front-end (typically a WebUI) could run in a docker container and make API calls to your LLM system. Unfortunately that requires the model to always be loaded in the VRAM on your main PC, severely reducing what you can do with that computer, GPU-wise.
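
A rough sketch of that split, assuming the LLM back end on the main PC is Ollama (the comment above doesn't name one). The home server only needs the GPU machine's address and can then call its HTTP API. `LLM_HOST` is a made-up variable name and 192.168.1.50 is a made-up address. Ollama listens only on 127.0.0.1 by default, so the main PC would need `OLLAMA_HOST` set (for example to 0.0.0.0) before other machines can reach it.

```python
import json
import os
import urllib.request


def ollama_base_url(env=os.environ):
    """Return the base URL of the GPU machine, defaulting to localhost."""
    # LLM_HOST is a hypothetical variable name used only in this sketch.
    host = env.get("LLM_HOST", "localhost")
    return f"http://{host}:11434"


def list_models(base_url):
    """Ask the remote Ollama server which models it has available."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return [m["name"] for m in json.loads(resp.read())["models"]]


# Usage from the home server, with the main PC at a made-up address:
#   list_models(ollama_base_url({"LLM_HOST": "192.168.1.50"}))
```

If `list_models` returns without an error, a containerized front end on the home server should be able to reach the same URL.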

[–] [email protected] 1 points 3 months ago
[–] [email protected] 3 points 3 months ago (2 children)

I also recently got into self-hosting LLMs. Having an AMD card meant I had to scour for solutions, since everything expects CUDA support, which means having an Nvidia card. Koboldcpp has a fork with ROCm support which works on my machine, so I'm content with that for now.

[–] [email protected] 2 points 3 months ago (1 children)

Do you have any links or guides that you found helpful? A friend wanted to try this out but basically gave up when he realized he'd need an Nvidia GPU.

[–] [email protected] 3 points 3 months ago

Look up the koboldcpp YellowRose fork. It's pretty easy to set up and run.

[–] [email protected] 3 points 3 months ago (1 children)

Wasn't there a solution by AMD or someone close to them implementing a translation of CUDA for AMD hardware?

[–] [email protected] 5 points 3 months ago (1 children)

AMD asked them to shut it down. So the guy is going to go back to the pre-AMD release and work independently from there.

[–] [email protected] 2 points 3 months ago (1 children)

I really hate when companies do that kind of crap. I just imagine a little toddler stomping around going "No! No! Nooo!"

[–] [email protected] 1 points 3 months ago

NVIDIA didn't ask to shut it down, but AMD's lawyers probably weren't that keen on what the project had become, and AMD asked the creator to shut down the project, which he did.

But yeah, lots of work wasted caused by pencil pushers and bean counters.

[–] [email protected] 2 points 3 months ago (1 children)

With all respect, the first paragraph seems self contradictory.

[–] [email protected] 3 points 3 months ago

Very technical vs not can be very subjective.
It can be a 50-year-old sysadmin vs Adam I pulled from the street, or a graybeard Linux admin vs a beginner sysadmin only in it for the career instead of the passion (those can be very non-technical but good problem-solver folks).

I know my comparison is flawed

[–] [email protected] 5 points 3 months ago (3 children)

Isn't this using a lot of computing power?

[–] [email protected] 5 points 2 months ago* (last edited 2 months ago)

you hear that said about AI because companies are desperately throwing more and more resources at it to get 0.3% better results, and people are collectively running an insane amount of prompts all the time.

but on a personal level it's not really any different from any other computations, people render videos all the time and no one complains about the resource usage from that, because companies aren't trying to sell bloated video rendering services to gardening businesses.

[–] [email protected] 9 points 3 months ago

Not really, it uses some GPU power when it's actively generating a response, but otherwise it just sits idle.

[–] [email protected] 5 points 3 months ago* (last edited 3 months ago)

I've been testing Ollama in Docker/WSL with the idea that if I like it I'll eventually move my GPU into my home server and get an upgrade for my gaming PC. When you run a model it has to load the whole thing into VRAM. I use the 8gb models, so it takes 20-40 seconds to load the model, and then each response is really fast after that and the GPU hit is pretty small. After I think five minutes by default it will unload the model to free up VRAM.

Basically this means that you either need to wait a bit for the model to warm up or you need to extend that timeout so that it stays warm longer. That means that I cannot really use my GPU for anything else while the LLM is loaded.

I haven't tracked power usage, but besides the VRAM requirements it doesn't seem too intensive on resources. Maybe I just haven't done anything complex enough yet.
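
The timeout mentioned above can be set per request through the `keep_alive` field of Ollama's API. This is a small sketch of the request body only. The model name and the 30-minute value are arbitrary examples, and `generate_payload` is a hypothetical helper name.

```python
def generate_payload(prompt, model="llama3", keep_alive="30m"):
    """Request body that asks the server to keep the model loaded afterwards.

    keep_alive takes a duration string such as "30m", 0 to unload right
    after the response, or a negative number to keep the model loaded
    indefinitely.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }


# Keep the model warm for 30 minutes after this request:
warm = generate_payload("hello")

# Free the VRAM as soon as the answer is back (e.g. before gaming):
release = generate_payload("hello", keep_alive=0)
```

Posting `warm` to `/api/generate` trades VRAM for latency. The `OLLAMA_KEEP_ALIVE` environment variable sets the same thing server-wide.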

[–] [email protected] 9 points 3 months ago

It's a much smaller scale but I use a Coral TPU with CodeProject AI to detect when people or animals are in front of my house. Works well with Blue Iris (NVR software for security cameras). I like it. That's all the self-hosted AI I've got for now.

[–] [email protected] 5 points 3 months ago (1 children)

What kinds of specs do you need to run it well? I've got a laptop with a 3070.

[–] [email protected] -3 points 3 months ago* (last edited 3 months ago) (3 children)

You probably want 48 GB of VRAM or more to run the good stuff. I recommend renting GPU time instead of using your own hardware, via AWS or other vendors. runpod.io is pretty good.

[–] [email protected] 4 points 3 months ago

Llama3 8B can be run with 6 GB of VRAM, and it's fairly competent. Gemma has a 9B I think, which would also be worth looking into.
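
A back-of-the-envelope check on that figure, under the assumption that it refers to a quantized build (the comment doesn't say). It only counts the weights: parameters times bits per weight. The context cache and runtime overhead come on top, so real usage is higher than these numbers.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Compare common precisions for an 8-billion-parameter model.
for bits in (16, 8, 4):
    size = weight_memory_gb(8, bits)
    print(f"8B model at {bits}-bit: ~{size:.0f} GB of weights")
```

At 4 bits per weight the weights alone come to about 4 GB, which is consistent with an 8B model fitting on a 6 GB card once overhead is added, while the 16-bit version would not.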

[–] [email protected] 3 points 3 months ago (1 children)

IDK, looks like 48GB cloud pricing would be $0.35/hr => $255/month. Used 3090s go for $700. Two 3090s would give you 48GB of VRAM, and cost $1400 (I'm assuming you can do "model-parallel" with Llama; never tried running an LLM, but it should be possible and work well). So, the break-even point would be <6 months. Hmm, but if Serverless works well, that could be pretty cheap. Would probably take a few minutes to process and load a ~48GB model every cold start though?
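
The arithmetic behind that estimate, written out. The prices are the ones quoted in the comment. The sketch assumes the cloud instance runs around the clock and ignores electricity and any other hardware the cards might need.

```python
HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours in an average month

cloud_rate_per_hour = 0.35  # USD, figure quoted in the comment
used_3090_price = 700       # USD per card, figure quoted in the comment

monthly_cloud_cost = cloud_rate_per_hour * HOURS_PER_MONTH
hardware_cost = 2 * used_3090_price
break_even_months = hardware_cost / monthly_cloud_cost

print(f"cloud: about ${monthly_cloud_cost:.2f} per month")
print(f"two used 3090s: ${hardware_cost}")
print(f"break-even after about {break_even_months:.1f} months")
```

With intermittent use the cloud bill shrinks in proportion to the hours used, which pushes the break-even point further out.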

[–] [email protected] 1 points 2 months ago

Assuming they already own a PC, if someone buys two 3090s for it they'll probably also have to upgrade their PSU, so that might be worth including in the budget. But it's definitely a relatively low-cost way to get more VRAM; there are people who run 3 or 4 RTX 3090s too.

[–] [email protected] 6 points 3 months ago

Kinda defeats the purpose of doing it private and local.

I wouldn't trust any claims a 3rd party service makes with regards to being private.

[–] [email protected] 28 points 3 months ago (1 children)

Uncensored models are so much better, too. ChatGPT is like one of those plastic children's toy hammers, while real models are titanium hammers.

[–] [email protected] 6 points 3 months ago

Together.ai has a number of uncensored models too. I’ve found that those are so cheap that it’s not worth trying to self-host models unless you really need more privacy.

[–] [email protected] 1 points 3 months ago (1 children)

I switched from OpenWebUI to Alpaca as I had no use for multi accounts.

[–] [email protected] 2 points 3 months ago (1 children)

Open WebUI now has a docker environment variable so you can, by default, turn off the login page. You just declare it when you’re spinning up the container and you’re good to go.

[–] [email protected] 2 points 3 months ago

I like libadwaita which Alpaca has
