Fallen Gemma. The writing style is really good and it can keep relatively persistent personalities. On the other hand it's stupid af compared to other recent models and even the vanilla Gemma 3.
LocalLLaMA
Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Lets explore cutting edge open source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.
QWQ-32B for most questions, llama-3.1-8B for agents. I'm looking for new models to replace them though, especially the agent one.
Want to test the new GLM models, but I'd rather wait for llama.cpp to definitely fix the bugs with them first.
Want to test the new GLM models
Which models are you referring to? These: https://github.com/THUDM/GLM-4 ?
That's the ones, the 0414 release.
GLM? I feel like every other day there is a new abbreviation :(
I have been using deephermes daily. I think CoT reasoning is so awesome and such a game changer! It really helps the model give better answers especially for hard logical problems. But I don't want it all the time especially on an already slow model. Being able to turn it on and off wirhout switching models is awesome. Mistral 24b deephermes is relatively uncensored, powerful and not painfully slow on my hardware. a high quant of llama 3.1 8b deephermes is able to fit entirely on my 8gb vram.
I find that for the purpose of my projects (narrative building, tabletop rpg simulation) gemma3:14b (with low temperature) works perfectly to create consistent psychological overviews.
I mainly use Llama-3-8B abliterated for everyday questions, and DeepSeek-Coder-V2-Lite for programming/Linux stuff.
Using DeepSeek-Coder-V2-Lite now, it's awesome!
I'm using this one because before they ceased the Open LLM Leaderboard, it was the highest rated 14B model that can run on a single GPU with 10GB VRAM.
Newbie here. I'm not sure if the documentation tells me if it can run with ollama. If I understand correctly you have to build it «by hand»? I mainly use ollama/models on the official website and I'm too scared to plunge deeper into the mechanics haha.
Not for GGUF comverted models.
Just run the following command in ollama
ollama run hf.co/wanlige/li-14b-v0.4-Q4_K_M-GGUF
That's awesome thank you.