ThreeJawedChuck

joined 1 month ago
[–] [email protected] 0 points 3 weeks ago (1 children)

Will do, thanks for the tip. Your description does sound like a good fit for the idea. As long as it supports network inference between machines with heterogeneous cards, it would work for what I have in mind.

[–] [email protected] 0 points 1 month ago* (last edited 1 month ago)

To add to my lame noob answer, I found this, which has a better rundown of ollama vs llama.cpp. I don't know if it's considered bad form to link to ##ddit on lemmy, so ~~I'll just put the title here and you can search for it on there if you want~~ link added per comment from mutual_ayed below. There are a couple of informative posts which are upvoted. "There is a big difference between use LM-Studio, Ollama, LLama.cpp?"

 

Hey everybody. I'm just getting into LLMs. Total noob. I started using llama-server's web interface, but I'm experimenting with a frontend called SillyTavern. It looks much more powerful, but there's still a lot I don't understand about it, and some design choices I found confusing.

I'm trying the Harbinger-24B model to act as a D&D-style DM, and to run one party character while I control another. I tried several general purpose models too, but I felt the Harbinger purpose-built adventure model was noticeably superior for this.

I'll write a little about my experience with it, and then some thoughts about LLMs and D&D. (Or D&D-ish. I'm not fussy about the exact thing, I just want that flavour of experience).

General Experience

I've run two scenarios. My first try was a 4/10 for my personal satisfaction, and the 2nd was 8/10. I made no changes to the prompts or anything between, so that's all due to the story the model settled into. I'm trying not to give the model any story details, so it makes everything up, and I won't know about it in advance. The first story the model invented was so-so. The second was surprisingly fun. It had historical intrigue, a tie-in to a dark family secret from ancestors of the AI-controlled char, and the dungeon-diving mattered to the overarching story. Solid marks.

My suggestion for others trying this is, if you don't get a story you like out of the model, try a few more times. You might land something much better.

The Good

Harbinger provided a nice mixture of combat and non-combat. I enjoy combat, but I also like solving mysteries and advancing the plot by talking to NPCs or finding a book in the town library, as long as it feels meaningful.

It writes fairly nice descriptions of areas you encounter, and thoughts for the AI-run character.

It seems to know D&D spells and abilities. It lets you use them in creative but very reasonable ways that would work in a pen-and-paper game, but not in a standard CRPG engine. It might let you get away with too much, so you have to keep yourself honest.

The Bad

You may have to try multiple times until the RNG gives you a nice story. You could also inject a story in the base prompt, but I want the LLM to act as a DM for me, where I'm going in completely blind. Also, in my first 4/10 game, the LLM forced really bad "main character syndrome" on me. The whole thing was about me, me, me, I'm special! I found that off-putting, but the 2nd 8/10 attempt wasn't like that at all.

As an LLM, it's loosey-goosey about things like inventory, spells, rules, and character progression.

I had a difficult time giving the model OOC instructions. OOC tended to be "heard" by other characters.

Thoughts about fantasy-adventure RP and LLMs

I feel like the LLM is very good at providing descriptions, situations, and locations. It's also very good at understanding how you're trying to be creative with abilities and items, and it lets you solve problems in creative ways. It's more satisfying than a normal CRPG engine in this way.

As an LLM though, it lets you steer things in ways you shouldn't be able to in an RPG with fixed rules. It doesn't reliably do things like disallowing a spell you don't know, or remembering how many feet of rope you're carrying. I enjoy the character leveling and crunchy stats part of pen-and-paper or CRPGs, and I haven't found a good way to get the LLM to do that without just handling everything manually and whacking it into the context.

That leads me to think that using an LLM for creativity inside a non-LLM framework to enforce rules, stats, spells, inventory, and abilities might be phenomenal. Maybe AI Dungeon does that? Never tried it, and anyway I want local. A hybrid system like that might be scriptable somehow, but I'm too much of a noob to know.
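To make the hybrid idea a bit more concrete, here is a rough sketch of what the non-LLM half could look like. It's only an illustration: the `Character` class, the `RULES:` line format, and `build_prompt` are all made up for this example, and nothing here talks to a real model or to SillyTavern. The point is just that plain code owns the hard state (known spells, slots, feet of rope), and the LLM only gets told the ruling so it can narrate it.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """Hard state kept outside the LLM so it can't drift (hypothetical layout)."""
    name: str
    known_spells: set[str] = field(default_factory=set)
    spell_slots: int = 0
    inventory: dict[str, int] = field(default_factory=dict)  # item -> quantity


def cast_spell(char: Character, spell: str) -> str:
    """Check a cast against the sheet and return a ruling for the narrator."""
    if spell not in char.known_spells:
        return f"RULES: {char.name} does not know {spell}. The attempt fails."
    if char.spell_slots <= 0:
        return f"RULES: {char.name} has no spell slots left. The attempt fails."
    char.spell_slots -= 1
    return f"RULES: {char.name} casts {spell}. Slots left: {char.spell_slots}."


def use_item(char: Character, item: str, amount: int) -> str:
    """Spend a quantity of an item, refusing if the character lacks enough."""
    have = char.inventory.get(item, 0)
    if amount > have:
        return f"RULES: {char.name} has only {have} {item}, needs {amount}."
    char.inventory[item] = have - amount
    return f"RULES: {char.name} uses {amount} {item}. Remaining: {have - amount}."


def build_prompt(player_action: str, rulings: list[str]) -> str:
    """Put the rulings ahead of the player's action so the model narrates them."""
    return "\n".join(rulings + [f"PLAYER: {player_action}", "Narrate the outcome."])


# Example values are invented for illustration.
hero = Character("Mira", {"Feather Fall"}, spell_slots=1, inventory={"rope_ft": 50})
rulings = [cast_spell(hero, "Fireball"), use_item(hero, "rope_ft", 30)]
print(build_prompt("I cast Fireball, then tie off 30 feet of rope.", rulings))
```

Whether a model actually respects rulings injected like this is something I'd have to test. The bookkeeping at least stays correct no matter what the model writes.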

[–] [email protected] 0 points 1 month ago

What’s the advantage over Ollama?

I'm very new to this so someone more knowledgeable should probably answer this for real.

My impression was that ollama somehow uses the llama.cpp source internally, but wraps it up to provide features like auto-downloading of models. I didn't care about that, but I liked the very tiny dependency footprint of llama.cpp. I haven't tried ollama for network inference.

There are other backends too which support network inference, and some posts allege they are better for that than llama.cpp is. vllm and ... exllama or something like that? I haven't looked into either of them. I'm running on inertia so far with llama.cpp, since it was so easy to get going and I'm kinda lazy.

[–] [email protected] 0 points 1 month ago

I like this project. Very nice!

I haven't tried RAG yet, nor the fancy vector space whatsit which looks like it requires a specialized model(?) to create. I've been wanting to do something similar in spirit to your project here, but for an online RPG, so I dig this.

 

Hey everybody, brand new to running local LLMs, so I'm learning as I go. Also brand new to lemmy.

I have a 16 GB VRAM card, and I was running some models that would overflow 16 GB by using the CPU+RAM to run some of the layers. It worked, but it was very slow, even with only a few layers on the CPU.

Well, I noticed llama.cpp has an rpc-server feature, so I tried it. It was very easy to use. I'm on Linux, but it's probably similar on Windows or Mac. I had an older gaming rig sitting around with a GTX 1080 in it. Much slower than my 4080, but using it to run a few layers is still FAR faster than using the CPU. Night and day almost.

The main drawbacks I've experienced so far are:

  • By default it tries to split the model evenly between machines. That's fine if you have the same card in all of them, but I wanted to put as much of the model as possible on the fastest card. You can do that using the --tensor-split parameter, but it requires some experimenting to get it right.

  • It loads the rpc machine's part of the model across the network every time you start the server, which can be slow on a 1 gigabit network. I didn't see any way to tell rpc-server to load the model from a local copy. It makes my startups go from 1-2 seconds up to like 30-50 sec.

  • Q8 quantized KV cache works, but Q4 does not.
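For the --tensor-split experimenting mentioned above, a tiny helper for building the command line can save some typing between attempts. This is a sketch under assumptions: the host address, port, model filename, and the `-ngl 99` "offload everything" value are placeholders, and the 16,8 weights are only a starting point based on the VRAM sizes of a 4080 and a GTX 1080. I'm also assuming the weights map onto devices in the order the server lists them at startup, so check that output before trusting the order.

```python
import shlex


def tensor_split(weights: list[float]) -> str:
    """Format relative per-device weights as a --tensor-split value."""
    if not weights or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("need non-negative weights with a positive sum")
    return ",".join(f"{w:g}" for w in weights)


def server_command(model: str, rpc_hosts: list[str], weights: list[float]) -> str:
    """Build a llama-server command line that uses remote rpc-server hosts."""
    args = [
        "llama-server",
        "-m", model,
        "-ngl", "99",                     # placeholder: offload all layers
        "--rpc", ",".join(rpc_hosts),     # comma-separated host:port list
        "--tensor-split", tensor_split(weights),
    ]
    return shlex.join(args)


# Placeholder host and model name; weights start from VRAM size (16 GB vs 8 GB).
print(server_command("model.gguf", ["192.168.1.50:50052"], [16, 8]))
```

From there it's a matter of nudging the weights toward the faster card until it's nearly full, then restarting and watching the speed.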

Lots of people may not be able to run 2 or 3 GPUs in one PC, but might have another PC they can add over the network. Worth a try, I'd say, if you want more VRAM space.
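On the startup delay from the drawbacks list: a back-of-envelope estimate suggests it's mostly just the network link. The sketch below assumes the transfer is limited by link speed alone, and the 0.9 efficiency factor is a guess at protocol overhead, not a measurement.

```python
def transfer_seconds(size_gb: float, link_gbit: float = 1.0,
                     efficiency: float = 0.9) -> float:
    """Seconds to push size_gb gigabytes over a link_gbit gigabit/s link."""
    bytes_per_sec = link_gbit * 1e9 / 8 * efficiency
    return size_gb * 1e9 / bytes_per_sec


# Try a few sizes for the slice of the model that lives on the remote card.
for gb in (2, 4, 6):
    print(f"{gb} GB -> {transfer_seconds(gb):.0f} s")
```

By this estimate, a 30-50 second startup corresponds to roughly 3.5 to 5.5 GB going over the wire each time, so a faster link between the machines should cut the wait roughly in proportion.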