Blaed

joined 2 years ago
MODERATOR OF
 

Meta has released and open-sourced Llama 3.1 in three different sizes: 8B, 70B, and 405B

This new Llama iteration and update brings state-of-the-art performance to open-source ecosystems.

If you've had a chance to use Llama 3.1 in any of its variants - let us know how you like it and what you're using it for in the comments below!

Llama 3.1 Megathread

For this release, we evaluated performance on over 150 benchmark datasets that span a wide range of languages. In addition, we performed extensive human evaluations that compare Llama 3.1 with competing models in real-world scenarios. Our experimental evaluation suggests that our flagship model is competitive with leading foundation models across a range of tasks, including GPT-4, GPT-4o, and Claude 3.5 Sonnet. Additionally, our smaller models are competitive with closed and open models that have a similar number of parameters.

As our largest model yet, training Llama 3.1 405B on over 15 trillion tokens was a major challenge. To enable training runs at this scale and achieve the results we have in a reasonable amount of time, we significantly optimized our full training stack and pushed our model training to over 16 thousand H100 GPUs, making the 405B the first Llama model trained at this scale.


Official Meta News & Documentation

See also: The Llama 3 Herd of Models paper here:


HuggingFace Download Links

8B

Meta-Llama-3.1-8B

Meta-Llama-3.1-8B-Instruct

Llama-Guard-3-8B

Llama-Guard-3-8B-INT8


70B

Meta-Llama-3.1-70B

Meta-Llama-3.1-70B-Instruct


405B

Meta-Llama-3.1-405B-FP8

Meta-Llama-3.1-405B-Instruct-FP8

Meta-Llama-3.1-405B

Meta-Llama-3.1-405B-Instruct


Getting the models

You can download the models directly from Meta or one of our download partners: Hugging Face or Kaggle.

Alternatively, you can work with ecosystem partners to access the models through the services they provide. This approach can be especially useful if you want to work with the Llama 3.1 405B model.

Note: Llama 3.1 405B requires significant storage and computational resources, occupying approximately 750GB of disk storage space and necessitating two nodes on MP16 for inferencing.

Learn more at:


Running the models

Linux

Windows

Mac

Cloud


More guides and resources

How-to Fine-tune Llama 3.1 models

Quantizing Llama 3.1 models

Prompting Llama 3.1 models

Llama 3.1 recipes


YouTube media

Rowan Cheung - Mark Zuckerberg on Llama 3.1, Open Source, AI Agents, Safety, and more

Matthew Berman - BREAKING: LLaMA 405b is here! Open-source is now FRONTIER!

Wes Roth - Zuckerberg goes SCORCHED EARTH.... Llama 3.1 BREAKS the "AGI Industry"*

1littlecoder - How to DOWNLOAD Llama 3.1 LLMs

Bloomberg - Inside Mark Zuckerberg's AI Era | The Circuit

 

Hello everyone. Today I'd like to catch up on another paper, a popular one that has pushed a new fine-tuning trend called DPO (Direct Preference Optimization).

Included with the paper are a few open-source projects and code repos that support DPO training. If you are fine-tuning models, this is worth looking into!

DPO Arxiv Paper

Try Fine-tuning w/ DPO using Axolotl

Try Fine-tuning w/ DPO using Llama Factory

Try Fine-tuning w/DPO using Unsloth

Now.. onto the paper!

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF).

However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model.

In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss.

The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods.

Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.

Figure 1: DPO optimizes for human preferences while avoiding reinforcement learning. Existing methods for fine-tuning language models with human feedback first fit a reward model to a dataset of prompts and human preferences over pairs of responses, and then use RL to find a policy that maximizes the learned reward. In contrast, DPO directly optimizes for the policy best satisfying the preferences with a simple classification objective, fitting an implicit reward model whose corresponding optimal policy can be extracted in closed form

Figure 2: Left. The frontier of expected reward vs KL to the reference policy. DPO provides the highest expected reward for all KL values, demonstrating the quality of the optimization.

Right. TL;DR summarization win rates vs. human-written summaries, using GPT-4 as evaluator. DPO exceeds PPO’s best-case performance on summarization, while being more robust to changes in the sampling temperature.

Learning from preferences is a powerful, scalable framework for training capable, aligned language models. We have introduced DPO, a simple training paradigm for training language models from preferences without reinforcement learning.

Rather than coercing the preference learning problem into a standard RL setting in order to use off-the-shelf RL algorithms, DPO identifies a mapping between language model policies and reward functions that enables training a language model to satisfy human preferences directly, with a simple cross-entropy loss, without reinforcement learning or loss of generality.

With virtually no tuning of hyperparameters, DPO performs similarly or better than existing RLHF algorithms, including those based on PPO; DPO thus meaningfully reduces the barrier to training more language models from human preferences.

Our results raise several important questions for future work. How does the DPO policy generalize out of distribution, compared with learning from an explicit reward function?

Our initial results suggest that DPO policies can generalize similarly to PPO-based models, but more comprehensive study is needed. For example, can training with self-labeling from the DPO policy similarly make effective use of unlabeled prompts? On another front, how does reward over-optimization manifest in the direct preference optimization setting, and is the slight decrease in performance in Figure 3-right an instance of it?

Additionally, while we evaluate models up to 6B parameters, exploration of scaling DPO to state-of-the-art models orders of magnitude larger is an exciting direction for future work. Regarding evaluations, we find that the win rates computed by GPT-4 are impacted by the prompt; future work may study the best way to elicit high-quality judgments from automated systems. Finally, many possible applications of DPO exist beyond training language models from human preferences, including training generative models in other modalities.

Read More

 

Hello everyone, I have another exciting Mamba paper to share. This being an MoE implementation of the state space model.

For those unacquainted with Mamba, let me hit you with a double feature (take a detour checking out these papers/code if you don't know what Mamba is):

Now.. onto the MoE paper!

MoE-Mamba

Efficient Selective State Space Models with Mixture of Experts

Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Sebastian Jaszczur

State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based LLMs, including recent state-of-the-art open-source models.

We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcase this on Mamba, a recent SSM-based model that achieves remarkable, Transformer-like performance.

Our model, MoE-Mamba, outperforms both Mamba and Transformer-MoE. In particular, MoE-Mamba reaches the same performance as Mamba in 2.2x less training steps while preserving the inference performance gains of Mamba against the Transformer.

Category Hyperparameter Value
Model Total Blocks 8 (16 in Mamba)
dmodel 512
Feed-Forward df f 2048 (with Attention) or 1536 (with Mamba)
Mixture of Experts dexpert 2048 (with Attention) or 1536 (with Mamba)
Experts 32
Attention nheads 8
Training Training Steps 100k
Context Length 256
Batch Size 256
LR 1e-3
LR Warmup 1% steps
Gradient Clipping 0.5

MoE seems like the logical way to move forward with Mamba, at this point, I'm wondering could there anything else holding it back? Curious to see more tools and implementations compare against some of the other trending transformer-based LLM stacks.

 

Hello everyone, I have a very exciting paper to share with you today. This came out a little while ago, (like many other papers since my hiatus) so allow me to catch you up if you haven't read it already.

Mamba

Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu, Tri Dao

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module.

Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language.

We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements.

First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token.

Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba).

Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences.

As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.

On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.

(...) Mamba achieves state-of-the-art results on a diverse set of domains, where it matches or exceeds the performance of strong Transformer models. We are excited about the broad applications of selective state space models to build foundation models for different domains, especially in emerging modalities requiring long context such as genomics, audio, and video. Our results suggest that Mamba is a strong candidate to be a general sequence model backbone.

What are your thoughts on Mamba?

 

I don't think this has been shared here before. Figured now is as good time as ever.

I'd like to share with everyone Open Interpreter.

Open Interpreter

Check it out here: https://github.com/KillianLucas/open-interpreter

Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.

This provides a natural-language interface to your computer's general-purpose capabilities:

  • Create and edit photos, videos, PDFs, etc.
  • Control a Chrome browser to perform research
  • Plot, clean, and analyze large datasets
  • ...etc. ⚠️ Note: You'll be asked to approve code before it's run.

Comparison to ChatGPT's Code Interpreter

OpenAI's release of Code Interpreter with GPT-4 presents a fantastic opportunity to accomplish real-world tasks with ChatGPT.

However, OpenAI's service is hosted, closed-source, and heavily restricted:

  • No internet access.
  • Limited set of pre-installed packages.
  • 100 MB maximum upload, 120.0 second runtime limit.
  • State is cleared (along with any generated files or links) when the environment dies.

Open Interpreter overcomes these limitations by running in your local environment. It has full access to the internet, isn't restricted by time or file size, and can utilize any package or library.

This combines the power of GPT-4's Code Interpreter with the flexibility of your local development environment.

Open Interpreter Roadmap

 

There has been an overwhelming amount of new models hitting HuggingFace. I wanted to kick off a thread and see what open-source LLM has been your new daily driver?

Personally, I am using many Mistral/Mixtral models and a few random OpenHermes fine-tunes for flavor. I was also pleasantly surprised by some of the DeepSeek models. Those were fun to test.

I believe 2024 is the year open-source LLMs will catchup with GPT-3.5 and GPT-4. We're already most of the way there. Curious to hear what new contenders are on the block and how others feel about their performance/precision compared to other state-of-the-art (closed) source models.

 

Hello everyone.

I'm back!

To anyone still reading - I hope you have been enjoying the rapid amount of progress we've seen in the space since my hiatus.

You'll be happy to hear I'm going to be periodically cleaning up some of the outdated resources in favor of new, updated documentation both on our frontpage and on our sidebar.

I know I also promised you all official FOSAI models on HuggingFace. I did not forget. Those are still in the pipeline. More info on that and other updates coming soon.

In the meantime, is there anything in terms of guides, resources, or notes that you'd like to see in particular? Let me know in the comments and I'll see where it might fit on the list.

Cheers!

Blaed

1
submitted 2 years ago* (last edited 2 years ago) by [email protected] to c/[email protected]
 

Hello everyone,

After some time away I have come to the realization that I have been neglecting a few personal projects and responsibilities by prioritizing staying in the know (over building / working towards other goals I set out to accomplish before 2024).

That being said, I decided it would be in my best interest to take a brief hiatus throughout the remainder of the year to tackle these tasks before they get out of hand (and no longer become a reality). I will be sharing notes here and there, but at much less frequency due to the work I'll be doing.

Some of these projects are resources for this community, others are totally different obligations I need to attend to.

You will be informed of the important updates, but I will be working mostly in the shadows - waiting and watching for the right moments to emerge.

On my long list of tasks is still getting our own fosai model on HuggingFace, which was going well until I ran out of funds. As much as I'd love to, it is no longer sustainable for me to keep paying out-of-pocket for fosai fine-tuning expenses.. lol.

I had a Mistral-7B fine-tune that almost completed its training - but failed at the final 4%. I had the adapter and weights semi-published, but they were unusable from whatever caused that hiccup. That's okay though, I will be applying for grants to help get this training workflow back off the ground (this time, with those pesky GPU costs covered).

If all else fails, I will turn to other methods.

I want you to know that throughout this hiatus, I am leaving the community to you guys. I want to let [email protected] organically grow (or slow) without my intervention. At the end of the day, I probably shouldn't be the only one sharing content here. I'm curious to see who sticks around and who does (or doesn't) post in my absence.

Shoutout everyone who has been sharing content, it does not go unnoticed. At least by me.

Whether content creator or casual lurker - you should know the activity of this community is not something I put a ton of expectations on so don't pressure yourself to try and keep this community 'alive' with content or comments if it doesn't feel natural or genuine. This community is not going anywhere, I'm just taking a break. We have already succeeded at the original fosai goal I set out to achieve. Now we must spend time building and developing our futures - collectively, and individually.

If you've been here since the beginning - thank you for reading and sticking around, but perhaps this is a good time for you to take a break from the AI news cycle too. This applies to everyone really, but it especially applies to all of you here. There was much innovation throughout the year and much more yet to come. If your FOMO is getting the best of you, consider subscribing to the YouTube content creators I've listed in this README. Otherwise, take a break, play some games, touch some grass or do something for yourself (and not for the sake of you thinking it needs to get done).

We'll be here for all of the future's wildest creations in this space, but taking a moment to develop yourself, be with family, (or spend time on one of your projects) is something you should consider doing if you have the ability to do so - no matter the pace of innovation. This is something I have forgotten, and something I will be reminding myself these coming weeks.

The future is now. The future is bright. The future is H.

Blaed

 

🤖 Happy FOSAI Friday! 🚀

Friday, October 6, 2023

HyperTech News Report #0003

Hello Everyone!

This week highlights a wave of new papers and frameworks that expand upon LLM functionalities. With a tsunami of applications on the horizon I foresee a bedrock of tools to preceed. I'm not sure what kits and processes will end up part of this bedrock, but I hope some of these methods end up interesting or helpful to your workflow!

Table of Contents

Community Changelog

Image of the Week

This image of the week comes from one of my own projects! I hope you don't mind me sharing.. I was really happy with this result. This was generated from an SDXL model I trained and host on Replicate. I use an mock ensemble approach to generate various game assets for an experimental roguelike I'm making with a colleague.

My current method is not at all efficient, but I have fun. Right now, I have three SDXL models I interact with, each generating art I can use for my project. Andraxus takes care of wallpapers and in-game levels (this image you're seeing here), his in-game companion Biazera imagines characters and entities of this world, while Cerephelo tinkers and toils over the machinations within - crafting items, loot, powerups, etc.

I've been hesitant self-promoting here. But if there's genuine interest in this project I would be more than happy sharing more details. It's still in pre-alpha development, but there were plans releasing all of the models we use as open-source (obviously). We're still working on the engine though. Let me know if you want to see more on this project.


News


  1. Arxiv Publications Workflow: A new workflow has been introduced that allows users to scrape search topics from Arxiv, converting the results into markdown (MD) format. This makes it easier to digest and understand topics from Arxiv published content. The tool, available on GitHub, is particularly useful for those who wish to delve deeper into research papers and run their own research processes.

  2. Texting LLMs from Your Phone: A guide has been shared that enables users to communicate with their personal assistants via simple text messages. The process involves setting up a Twilio account, purchasing and registering a phone number, and then integrating it with the Replicate platform. The code, available on GitHub, makes it possible to send and receive messages from LLMs directly on one's phone.

  3. Microsoft's AutoGen: Microsoft has released AutoGen, a tool designed to aid in the creation of autonomous LLM agents. Compatible with ChatGPT models, AutoGen facilitates the development of LLM applications using multiple agents that can converse with each other to solve tasks. The framework is customizable and allows for seamless human participation. More details can be found on GitHub.

  4. Promptbench and ACE Framework: Promptbench is a new project focused on the evaluation and benchmarking of models. Stemming from the DyVal paper, it aims to provide reliable insights into model performance. On the other hand, the ACE Framework, designed for autonomous cognitive entities, offers a unique approach to agent tooling. While still in its early stages, it promises to bring about innovative implementations in the realms of personal assistants, game world NPCs, autonomous employees, and embodied robots.

  5. Research Highlights: Several papers have been published that delve into the intricacies of LLMs. One paper introduces a method to enhance the zero-shot reasoning abilities of LLMs, while another, titled DyVal, proposes a dynamic evaluation protocol for LLMs. Additionally, the concept of Low-Rank Adapters (LoRA) ensembles for LLM fine-tuning has been explored, emphasizing the potential of using one model and dynamically swapping the fine-tuned QLoRA adapters.


Tools & Frameworks


Keep Up w/ Arxiv Publications

Due to a drastic change in personal and work schedules, I've had to shift how I research and develop posts and projects for you guys. That being said, I found this workflow from the same author of the ACE Framework particularly helpful. It scrapes a search topic from Arxiv and returns a massive XML that is converted to markdown (MD) to then be used as an injectable context report for a LLM of your choosing (to further break down and understand topics) or as a well of information for the classic CTRL + F search. But at this point, info is aggregated (and human readable) from Arxiv published content.

After reading abstractions you can further drill into each paper and dissect / run your own research processes as you see fit. There is definitely more room for automation and organization here I'm sure, but this has been a big resource for me lately so I wanted to proliferate it for others who might find it helpful too.

Text LLMs from Your Phone

I had an itch to make my personal assistants more accessible - so I started investigating ways I could simply text them from my iPhone (via simple sms). There are many other ways I could've done this, but texting has been something I always like to default to in communications. So, I found this cool guide that uses infra I already prefer (Replicate) and has a bonus LangChain integration - which opens up the door to a ton of other opportunities down the line.

This tutorial was pretty straightforward - but to be honest, making the Twilio account, buying a phone number (then registering it) took the longest. The code itself takes less than 10 minutes to get up and running with ngrok. Super simple and straightforward there. The Twilio process? Not so much.. but it was worth the pain!

I am still waiting on my phone number to be verified (so that the Replicate inference endpoint can actually send SMS back to me) but I ended the night successfully texting the server on my local PC. It was wild texting the Ahsoka example from my phone and seeing the POST response return (even though it didn't go through SMS I could still see the server successfully receive my incoming message/prompt). I think there's a lot of fun to be had giving casual phone numbers and personalities to assistants like this. Especially if you want to LangChain some functions beyond just the conversation. If there's more interest on this topic, I can share how my assistant evolves once it gets full access to return SMS. I am designing this to streamline my personal life, and if it proves to be useful I will absolutely release the project as open-source.

AutoGen

With Agents on the rise, tools and automation pipelines to build them have become increasingly more important to consider. It seems like Microsoft is well aware of this, and thus released AutoGen, a tool to help enable this automation tooling and creation of autonomous LLM agents. AutoGen is compatible with ChatGPT models and is being kitted for local LLMs as we speak.

AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

Promptbench

I recently found promptbench - a project that seems to have stemmed from the DyVal paper (shared below). I for one appreciate some of the new tools that are releasing focused around the evaluation and benchmarking of models. I hope we continue to see more evals, benchmarks, and projects that return us insights we can rely upon.

ACE Framework

A new framework has been proposed and designed for autonomous cognitive entities. This appears similar to agents and their style of tooling, but with a different architecture approach? I don't believe implementation of this is ready, but it may be soon and something to keep an eye on.

There are many possible implementations of the ACE Framework. Rather than detail every possible permutation, here is a list of categories that we perceive as likely and viable.

Personal Assistant and/or Companion

  • This is a self-contained version of ACE that is intended to interact with one user.
  • Think of Cortana from HALO, Samantha from HER, or Joi from Blade Runner 2049. (yes, we recognize these are all sexualized female avatars)
  • The idea would be to create something that is effectively a personal Executive Assistant that is able to coordinate, plan, research, and solve problems for you. This could be deployed on mobile, smart home devices, laptops, or web sites.

Game World NPC's

  • This is a kind of game character that has their own personality, motivations, agenda, and objectives. Furthermore, they would have their own unique memories.
  • This can give NPCs a much more realistic ability to pursue their own objectives, which should make game experiences much more dynamic and unpredictable, thus raising novelty. These can be adapted to 2D or 3D game engines such as PyGame, Unity, or Unreal.

Autonomous Employee

  • This is a version of the ACE that is meant to carry out meaningful and productive work inside a corporation.
  • Whether this is a digital CSR or backoffice worker depends on the deployment.
  • It could also be a "digital team member" that primarily interacts via Discord, Slack, or Microsoft Teams.

Embodied Robot

The ACE Framework is ideal to create self-contained, autonomous machines. Whether they are domestic aid robots or something like WALL-E


Papers


Agent Instructs Large Language Models to be General Zero-Shot Reasoners

We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%.

DyVal: Graph-informed Dynamic Evaluation of Large Language Models

Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns about their performance are raised on potential data contamination in their considerable volume of training corpus. Moreover, the static nature and fixed complexity of current benchmarks may inadequately gauge the advancing capabilities of LLMs. In this paper, we introduce DyVal, a novel, general, and flexible evaluation protocol for dynamic evaluation of LLMs. Based on our proposed dynamic evaluation framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs to dynamically generate evaluation samples with controllable complexities. DyVal generates challenging evaluation sets on reasoning tasks including mathematics, logical reasoning, and algorithm problems. We evaluate various LLMs ranging from Flan-T5-large to ChatGPT and GPT4. Experiments demonstrate that LLMs perform worse in DyVal-generated evaluation samples with different complexities, emphasizing the significance of dynamic evaluation. We also analyze the failure cases and results of different prompting methods. Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks. We hope that DyVal can shed light on the future evaluation research of LLMs.

LoRA ensembles for large language model fine-tuning

Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable prediction results on test data or out-of-distribution samples. One approach commonly used in vision for alleviating this issue is a deep ensemble, which constructs an ensemble by training the same model multiple times using different random initializations. However, there is a huge challenge to ensembling LLMs: the most effective LLMs are very, very large. Keeping a single LLM in memory is already challenging enough: keeping an ensemble of e.g. 5 LLMs in memory is impossible in many settings. To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique. Critically, these low-rank adapters represent a very small number of parameters, orders of magnitude less than the underlying pre-trained model. Thus, it is possible to construct large ensembles of LoRA adapters with almost the same computational overhead as using the original model. We find that LoRA ensembles, applied on its own or on top of pre-existing regularization techniques, gives consistent improvements in predictive accuracy and uncertainty quantification.

There is something to be discovered between LoRA, QLoRA, and ensemble/MoE designs. I am digging into this niche because of an interesting bit I heard from sentdex (if you want to skip to the part I'm talking about, go to 13:58). Around 15:00 minute mark he brings up QLoRA adapters (nothing new) but his approach was interesting.

He eventually shares he is working on a QLoRA ensemble approach with skunkworks (presumably Boeing skunkworks). This confirmed my suspicion. Better yet - he shared his thoughts on how all of this could be done. Watch and support his video for more insights, but the idea boils down to using one model and dynamically swapping the fine-tuned QLoRA adapters. I think this is a highly efficient and unapplied approach. Especially in that MoE and ensemble realm of design. If you're reading this and understood anything I said - get to building! This is a seriously interesting idea that could yield positive results. I will share my findings when I find the time to dig into this more.


Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

[–] [email protected] 0 points 2 years ago

Mistral seems to be the popular choice. I think it's the most open-source friendly out of the bunch. I will keep function calling in mind as I design some of our models! Thanks for bringing that up.

 

Hey everyone!

I think it's time we had a fosai model on HuggingFace. I'd like to start collecting ideas, strategies, and approaches for fine-tuning our first community model.

I'm open to hearing what you think we should do. We will release more in time. This is just the beginning.

For now, I say let's pick a current open-source foundation model and fine-tune on datasets we all curate together, built around a loose concept of using a fine-tuned LLM to teach ourselves more bleeding-edge technologies (and how to build them using technical tools and concepts).

FOSAI is a non-profit movement. You own everything fosai as much as I do. It is synonymous with the concept of FOSS. It is for everyone to champion as they see fit. Anyone is welcome to join me in training or tuning using the workflows I share along the way.

You are encouraged to leverage fosai tools to create and express ideas of your own. All fosai models will be licensed under Apache 2.0. I am open to hearing thoughts if other licenses should be considered.


We're Building FOSAI Models! 🤖

Our goal is to fine-tune a foundation model and open-source it. We're going to start with one foundation family with smaller parameters (7B/13B) then work our way up to 40B (or other sizes), moving to the next as we vote on what foundation we should fine-tune as a community.


Fine-Tuned Use Case ☑️

Technical

  • FOSAI Model Idea #1 - Research & Development Assistant
  • FOSAI Model Idea #2 - Technical Project Manager
  • FOSAI Model Idea #3 - Personal Software Developer
  • FOSAI Model Idea #4 - Life Coach / Teacher / Mentor
  • FOSAI Model Idea #5 - FOSAI OS / System Assistant

Non-Technical

  • FOSAI Model Idea #6 - Dungeon Master / Lore Master
  • FOSAI Model Idea #7 - Sentient Robot Character
  • FOSAI Model Idea #8 - Friendly Companion Character
  • FOSAI Model Idea #9 - General RPG or Sci-Fi Character
  • FOSAI Model Idea #10 - Philosophical Character

OR

FOSAI Foundation Model ☑️


Foundation Model ☑️

(Pick one)

  • Mistral
  • Llama 2
  • Falcon
  • ..(Your Submission Here)

Model Name & Convention

  • snake_case_example
  • CamelCaseExample
  • kebab-case-example

0.) FOSAI ☑️

  • fosai-7B
  • fosai-13B

1.) FOSAI Assistant ☑️

  • fosai-assitant-7B
  • fosai-assistant-13B

2.) FOSAI Atlas ☑️

  • fosai-atlas-7B
  • fosai-atlas-13B

3.) FOSAI Navigator ☑️

  • fosai-navigator-7B
  • fosai-navigator-13B

4.) ?


Datasets ☑️

  • TBD!
  • What datasets do you think we should fine-tune on?

Alignment ☑️

To embody open-source mentalities, I think it's worth releasing both censored and uncensored versions of our models. This is something I will consider as we train and fine-tune over time. Like any tool, you are responsible for your usage and how you choose to incorporate into your business and/or personal life.


License ☑️

All fosai models will be licensed under Apache 2.0. I am open to hearing thoughts if other licenses should be considered.

This will be a fine-tuned model, so it may inherit some of the permissions and license agreements as its foundation model and have other implications depending on your country or local law.

Generally speaking, you can expect that all fosai models will be commercially viable through the selection process of its foundation family and the post-processing steps that are fine-tuning the model.


Costs

I will be personally covering all training and deployment costs. This may change if I choose to put together some sort of patronage, but for now - don't worry about this. I will be using something like RunPod or some other custom deployed solution for training.


Cast Your Votes! ☑️

Share Your Ideas & Vote in the Comments Below! ✅

What do you want to see out of this first community model? What are some of the fine-tuning ideas you've wanted to try, but never had the time or chance to test? Let me know in the comments and we'll brainstorm together.

I am in no rush to get this out, so I will leave this up for everyone to see and interact with until I feel we have a solid direction we can all agree upon. There will be plenty of more opportunities to create, curate, and customize more fosai models I plan to release in the future.

Update [10/25/23]: I may have found a fine-tuning workflow for both Llama (2) and Mistral, but I haven't had any time to validate the first test run. Once I have a chance to do this and test some inference I'll be updating this post with the workflow, the models, and some sample output with example datasets. Unfortunately, I have ran out of personal funds to allocate to training, so it is unsure when I will have a chance to make another attempt at this if this first attempt doesn't pan out. Will keep everyone posted as we approach the end of 2023.

 

Hey everyone!

I don't think I've shared this one before, so allow me to introduce you to 'LM Studio' - a new application that is tailored to LLM developers and enthusiasts.

Check it out!


With LM Studio, you can ...

🤖 - Run LLMs on your laptop, entirely offline

👾 - Use models through the in-app Chat UI or an OpenAI compatible local server

📂 - Download any compatible model files from HuggingFace 🤗 repositories

🔭 - Discover new & noteworthy LLMs in the app's home page LM Studio supports any ggml Llama, MPT, and StarCoder model on Hugging Face (Llama 2, Orca, Vicuna, Nous Hermes, WizardCoder, MPT, etc.)

Minimum requirements: M1/M2 Mac, or a Windows PC with a processor that supports AVX2. Linux is under development.

Made possible thanks to the llama.cpp project.

We are expanding our team. See our careers page.


Love seeing these new tools come out! Especially with the new gguf format being widely adopted.

The regularly updated and curated list of new LLM releases they provide through this platform is enough for me to keep it installed.

I'll be tinkering plenty when I have the time this week. I'll be sure to let everyone know how it goes! In the meantime, if you do end up giving LM Studio a try - let us know your thoughts and experience with it in the comments below.

[–] [email protected] 1 points 2 years ago (1 children)

After finally having a chance to test some of the new Llama-2 models, I think you're right. There's still some work to be done to get them tuned up... I'm going to dust off some of my notes and get a new index of those other popular gen-1 models out there later this week.

I'm very curious to try out some of these docker images, too. Thanks for sharing those! I'll check them when I can. I could also make a post about them if you feel like featuring some of your work. Just let me know!

 

cross-posted from: https://lemmy.world/post/2219010

Hello everyone!

We have officially hit 1,000 subscribers! How exciting!! Thank you for being a member of [email protected]. Whether you're a casual passerby, a hobby technologist, or an up-and-coming AI developer - I sincerely appreciate your interest and support in a future that is free and open for all.

It can be hard to keep up with the rapid developments in AI, so I have decided to pin this at the top of our community to be a frequently updated LLM-specific resource hub and model index for all of your adventures in FOSAI.

The ultimate goal of this guide is to become a gateway resource for anyone looking to get into free open-source AI (particularly text-based large language models). I will be doing a similar guide for image-based diffusion models soon!

In the meantime, I hope you find what you're looking for! Let me know in the comments if there is something I missed so that I can add it to the guide for everyone else to see.


Getting Started With Free Open-Source AI

Have no idea where to begin with AI / LLMs? Try starting with our Lemmy Crash Course for Free Open-Source AI.

When you're ready to explore more resources see our FOSAI Nexus - a hub for all of the major FOSS & FOSAI on the cutting/bleeding edges of technology.

If you're looking to jump right in, I recommend downloading oobabooga's text-generation-webui and installing one of the LLMs from TheBloke below.

Try both GGML and GPTQ variants to see which model type performs to your preference. See the hardware table to get a better idea on which parameter size you might be able to run (3B, 7B, 13B, 30B, 70B).

8-bit System Requirements

Model VRAM Used Minimum Total VRAM Card Examples RAM/Swap to Load*
LLaMA-7B 9.2GB 10GB 3060 12GB, 3080 10GB 24 GB
LLaMA-13B 16.3GB 20GB 3090, 3090 Ti, 4090 32 GB
LLaMA-30B 36GB 40GB A6000 48GB, A100 40GB 64 GB
LLaMA-65B 74GB 80GB A100 80GB 128 GB

4-bit System Requirements

Model Minimum Total VRAM Card Examples RAM/Swap to Load*
LLaMA-7B 6GB GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 6 GB
LLaMA-13B 10GB AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 12 GB
LLaMA-30B 20GB RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 32 GB
LLaMA-65B 40GB A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000 64 GB

*System RAM (not VRAM), is utilized to initially load a model. You can use swap space if you do not have enough RAM to support your LLM.

When in doubt, try starting with 3B or 7B models and work your way up to 13B+.

FOSAI Resources

Fediverse / FOSAI

LLM Leaderboards

LLM Search Tools


Large Language Model Hub

Download Models

oobabooga

text-generation-webui - a big community favorite gradio web UI by oobabooga designed for running almost any free open-source and large language models downloaded off of HuggingFace which can be (but not limited to) models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and many others. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation. It is highly compatible with many formats.

Exllama

A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs.

gpt4all

Open-source assistant-style large language models that run locally on your CPU. GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade processors.

TavernAI

The original branch of software SillyTavern was forked from. This chat interface offers very similar functionalities but has less cross-client compatibilities with other chat and API interfaces (compared to SillyTavern).

SillyTavern

Developer-friendly, Multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI+proxies, Poe, WindowAI(Claude!)), Horde SD, System TTS, WorldInfo (lorebooks), customizable UI, auto-translate, and more prompt options than you'd ever want or need. Optional Extras server for more SD/TTS options + ChromaDB/Summarize. Based on a fork of TavernAI 1.2.8

Koboldcpp

A self contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint. What does it mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios and everything Kobold and Kobold Lite have to offer. In a tiny package around 20 MB in size, excluding model weights.

KoboldAI-Client

This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed.

h2oGPT

h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document(s) question-answer capabilities. Documents help to ground LLMs against hallucinations by providing them context relevant to the instruction. h2oGPT is fully permissive Apache V2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answer.


Models

The Bloke

The Bloke is a developer who frequently releases quantized (GPTQ) and optimized (GGML) open-source, user-friendly versions of AI Large Language Models (LLMs).

These conversions of popular models can be configured and installed on personal (or professional) hardware, bringing bleeding-edge AI to the comfort of your home.

Support TheBloke here.


70B


30B


13B


7B


More Models


GL, HF!

Are you an LLM Developer? Looking for a shoutout or project showcase? Send me a message and I'd be more than happy to share your work and support links with the community.

If you haven't already, consider subscribing to the free open-source AI community at [email protected] where I will do my best to make sure you have access to free open-source artificial intelligence on the bleeding edge.

Thank you for reading!

view more: next ›