Does anyone know how the base (/foundation) model works? Up until now they always released one instruction tuned variant and one base model. Is it the same for the 405B model? And if yes, does that base model refuse to do things? Because I read some people claiming the new Llama 3.1 is more restricted than the versions before. But this shouldn't apply to a base model. It's just the instruct-tuned variants that are aligned to some "guardrails". I'm confused. Do people use the wrong model? Or has something changed?
Free Open-Source Artificial Intelligence
Welcome to Free Open-Source Artificial Intelligence!
We are a community dedicated to forwarding the availability and access to:
Free Open Source Artificial Intelligence (F.O.S.A.I.)
More AI Communities
LLM Leaderboards
Developer Resources
GitHub Projects
FOSAI Time Capsule
- The Internet is Healing
- General Resources
- FOSAI Welcome Message
- FOSAI Crash Course
- FOSAI Nexus Resource Hub
- FOSAI LLM Guide
IMO guardrails have been irrelevant for "local" models forever since a little prompt engineering or manipulation blows them away,.
In theory the base model should be less "censored," but really its just for raw completion/continuation and further finetuning.
But it's super annoying when doing storywriting or using it as an agent. And then you have to do detection and extra handling of refusals, circumvent them and write extra prompts. And I think I read some paper that jailbreaking and removing "censorship" tends to make the models a bit stupider. I think in general it's way more clever to take a model without guardrails and fine-tune it, than to put them in place and then remove them again, degrade the model in the process and also make your life harder. A base model should be entirely without any censorship. (It's a base model though. It obviously won't follow instructions or answer questions... It's the basis for the community to take and fine-tune, aligned with our vision of baked-in ethics or the lack thereof.)
Yeah, well, I have been using base models and a few instruct tunes for a bit and haven't even gotten refusals, as long as there as enough existing context.
I tried downloading and running the 405B locally through LM-Studio. Got an error message saying invalid tokenizer. Then tried it with ollama. That didn't work either. Going to try the 70B tomorrow.
Not sure it's possible to run the larger ones on a Mac laptop.
There's apparently some tuning that needs to be done in Llama.cpp (which LM Studio uses to run) so Llama 3.1 can work properly: https://github.com/ggerganov/llama.cpp/issues/8650
Thank you. Looks like I'm not alone and people are doing more detailed testing. I'll just wait till the dust settles.
super exciting, but in a way i have kind of "lost interest" in frontier models, since the resources needed to run them is beyond what most people have access to. i mostly see the future in smaller models (like 3.1 8B for example), anyone else share this feeling?
also unrelated but, i was previously librecat on here (my last instance stopped working)
Agreed - 8b has enough magic to hold a conversation and do small tasks, such as breaking up a large task or picking out key details, which can then be fed into more small models (maybe even more narrowly fine-tuned ones)
180b isn't enough to replace all the other pieces of a system that you need for autonomous action or memory
I think 8b models are enough to make AGI possible if we stack them just right. They're enough to fill in most of the gaps to make practical things too, and they're not that far off for everything else