this post was submitted on 24 Jun 2025
110 points (86.7% liked)

Selfhosted

48681 readers
2485 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago
MODERATORS
 

I've just re-discovered ollama and it's come on a long way and has reduced the very difficult task of locally hosting your own LLM (and getting it running on a GPU) to simply installing a deb! It also works for Windows and Mac, so can help everyone.

I'd like to see Lemmy become useful for specific technical sub branches instead of trying to find the best existing community which can be subjective making information difficult to find, so I created [email protected] for everyone to discuss, ask questions, and help each other out with ollama!

So, please, join, subscribe and feel free to post, ask questions, post tips / projects, and help out where you can!

Thanks!

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 3 points 3 days ago* (last edited 3 days ago) (16 children)

Totally depends on your hardware, and what you tend to ask it. What are you running? What do you use it for? Do you prefer speed over accuracy?

[–] [email protected] 1 points 2 days ago (11 children)

I have a MacBook 2 pro (Apple silicon) and would kind of like to replace Google's Gemini as my go-to LLM. I think I'd like to run something like Mistral, probably. Currently I do have Ollama and some version of Mistral running, but I almost never used it as it's on my laptop, not my phone.

I'm not big on LLMs and if I can find an LLM that I run locally and helps me get off of using Google Search and Gimini, that could be awesome. Currently I use a combo of Firefox, Qwant, Google Search, and Gemini for my daily needs. I'm not big into the direction Firefox is headed, I've heard there are arguments against Qwant, and using Gemini feels like the wrong answer for my beliefs and opinions.

I'm looking for something better without too much time being sunk into something I may only sort of like. Tall order, I know, but I figured I'd give you as much info as I can.

[–] [email protected] 1 points 2 days ago* (last edited 2 days ago) (3 children)

Actually, to go ahead and answer, the "fastest" path would be LM Studio (which supports MLX quants natively and is not time intensive to install), and a DWQ quantization (which is a newer, higher quality variant of MLX models).

Hopefully one of these models, depending on how much RAM you have:

https://huggingface.co/mlx-community/Qwen3-14B-4bit-DWQ-053125

https://huggingface.co/mlx-community/Magistral-Small-2506-4bit-DWQ

https://huggingface.co/mlx-community/Qwen3-30B-A3B-4bit-DWQ-0508

https://huggingface.co/mlx-community/GLM-4-32B-0414-4bit-DWQ

With a bit more time invested, you could try to set up Open Web UI as an alterantive interface (which has its own built in web search like Gemini): https://openwebui.com/

And then use LM Studio (or some other MLX backend, or even free online API models) as the 'engine'

Alternatively, especially if you have a small RAM pool, Gemma 12B QAT Q4_0 is quite good, and you can run it with LM Studio or anything else that supports a GGUF. Not sure about 12B-ish thinking models off the top of my head, I'd have to look around.

[–] [email protected] 2 points 2 days ago (1 children)

This is all new to me, so I'll have to do a bit of homework on this. Thanks for the detailed and linked reply!

[–] [email protected] 3 points 2 days ago* (last edited 2 days ago) (1 children)

I was a bit mistaken, these are the models you should consider:

https://huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ

https://huggingface.co/AnteriorAI/gemma-3-4b-it-qat-q4_0-gguf

https://huggingface.co/unsloth/Jan-nano-GGUF (specifically the UD-Q4 or UD-Q5 file)

they are state-of-the-art at this size, as far as I know.

[–] [email protected] 2 points 2 days ago

Awesome, I'll give these a spin and see how it goes. Much appreciated!

load more comments (1 replies)
load more comments (8 replies)
load more comments (12 replies)