Fine-tune a Llama-2 language model with a single instruction

[Image: a robot teaching a llama, generated with Bing Image Creator]

Open-source large language models (LLMs) such as Llama-2 and Mistral have become popular for their flexibility and ease of access. You can deploy them on your own servers and computers and fine-tune them for your applications. A well-deployed Llama-2 model can save your organization time and money and help you meet data privacy requirements.

However, fine-tuning LLMs requires gathering and preparing data, writing the code for training the model, adjusting the hyperparameters, training and testing, and more. These challenges often make it difficult for people and organizations to create custom models.

To solve this problem, Matt Shumer, founder and CEO of OthersideAI, has created claude-llm-trainer, a tool that helps you fine-tune Llama-2 for a specific task with a single instruction. 

How to use claude-llm-trainer

Claude-llm-trainer is a Google Colab notebook that contains the code for fine-tuning Llama-2 7B for a specific task. To use claude-llm-trainer, all you need to do is configure the settings in the first cell, including a description of the task for which you want to train your model, the number of examples, and your Claude API key. (If you don’t have a Claude key, you can sign up for a free account and create API keys on the Anthropic Console.)

After that, run the cells sequentially until you reach the end of the notebook. The model will be saved in your Google Drive. You can now download and use the model on your servers.

How long it takes to generate training examples and fine-tune the model will vary depending on your Colab and Claude subscriptions. Aside from the task settings, here are a few basic ways to configure your trained model:

– The default base model of claude-llm-trainer is “NousResearch/llama-2-7b-chat-hf”. To change the model, set the “model_name” variable in Cell 7 to the path of your desired model. For example, to use the Mistral-7B model, you can set it to “mistralai/Mistral-7B-Instruct-v0.2”.

– To change the storage location of the trained model, set the “model_path” variable in Cell 10 to a Google Drive location of your choice.

– The process works well enough on the free Colab tier when you have 100 training examples and are fine-tuning one of the smaller models. But if you want to do heavier training on larger models, consider signing up for the Pro tier and using a stronger GPU.

– If you have access to a more powerful compute platform such as Amazon AWS or Microsoft Azure, you can download the Colab as a notebook file and run it on your platform of choice. However, you will have to modify Cell 10 for your storage platform (Amazon S3, Azure Blob, local storage, etc.).

How claude-llm-trainer works

While claude-llm-trainer is very convenient, it does not necessarily fit all applications. To figure out whether it is suitable for your task, you should know how it works.

In Cell 2, claude-llm-trainer uses Claude 3 to generate the training examples. This process is called model distillation, where a strong model (e.g., GPT-4 or Claude 3), also called the “teacher,” is used to train a weaker model (e.g., Llama-2 or Mistral), known as the “student.” 

This is important to know because if the teacher model cannot accomplish your target task, then model distillation will not be useful for your application. For example, if you want to fine-tune your LLM for a highly specialized task that requires proprietary knowledge about your company, then model distillation might not be very effective. The best way to find out is to experiment with the teacher model and see if it can accomplish the task you want to train your LLM for. By default, claude-llm-trainer generates the examples with Haiku, the smallest and fastest of the Claude 3 family, as the teacher model. You can modify the teacher model in Cell 2.
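For illustration, here is a minimal sketch of what such a teacher call looks like with the Anthropic Python SDK. The model ID is the real Claude 3 Haiku identifier, but the prompt wording and function name are assumptions, not the notebook’s exact code:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def generate_example(task_description: str) -> str:
    """Ask the teacher model for one training example (prompt wording is illustrative)."""
    message = client.messages.create(
        model="claude-3-haiku-20240307",  # default teacher; swap in another Claude 3 model here
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Generate a single prompt/response training example "
                       f"for the following task:\n{task_description}",
        }],
    )
    return message.content[0].text
```

The notebook loops over a call like this to build up the requested number of examples, so the cost and speed of this step scale with the example count and the teacher model you pick.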

In Cell 3, claude-llm-trainer uses Claude 3 Opus to generate a system prompt for the trained model. Once the cell is run, it will display the system prompt. You can run the cell multiple times if you are not satisfied with the system prompt. You will later need this system prompt when you use the fine-tuned LLM for the downstream task.
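A call of this kind might look roughly like the following. The model ID is the real Claude 3 Opus identifier, but the prompt wording and the example task are assumptions:

```python
import anthropic

client = anthropic.Anthropic()
task_description = "Summarize customer support emails"  # example task

# Ask Claude 3 Opus to write a system prompt for the fine-tuned model
# (prompt wording is illustrative, not the notebook's exact text).
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": f"Write a system prompt for a model fine-tuned on this task:\n{task_description}",
    }],
)
system_prompt = response.content[0].text
print(system_prompt)  # rerun the cell until you are happy with the result
```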

In Cells 4 and 5, the generated examples are compiled into a Pandas DataFrame and split into train and test sets. The tool also stores the datasets as JSON files in your notebook environment, which you can download for later use. (Note that files in your Colab environment are deleted when the session ends.)
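In outline, that step looks something like this; the column names, split ratio, and file names are assumptions for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# 'examples' stands in for the list of prompt/response pairs generated in Cell 2
examples = [{"prompt": "Summarize: ...", "response": "..."}]  # placeholder data

df = pd.DataFrame(examples)
train_df, test_df = train_test_split(df, test_size=0.1, random_state=42)  # assumed 90/10 split

# Save to JSON so the files can be downloaded before the Colab session ends
train_df.to_json("train.jsonl", orient="records", lines=True)
test_df.to_json("test.jsonl", orient="records", lines=True)
```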

Cell 6 installs the libraries needed for fine-tuning the LLM. You might need to change the library versions if you change the model you want to fine-tune. Cell 7 configures the hyperparameters for fine-tuning. This is where you can make further adjustments to improve the model’s training.
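The hyperparameters follow the usual Hugging Face pattern; the values below are illustrative defaults for a Colab GPU, not the notebook’s exact settings:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,             # more epochs = longer training, higher risk of overfitting
    per_device_train_batch_size=4,  # lower this if you hit out-of-memory errors
    gradient_accumulation_steps=2,  # simulates a larger effective batch size
    learning_rate=2e-4,
    fp16=True,                      # half precision to fit the model on a Colab GPU
    logging_steps=10,
    evaluation_strategy="steps",    # evaluate on the test set during training
    eval_steps=50,
)
```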

Cell 8 is where the real action takes place and the model is fine-tuned. You can track the progress and the model’s performance on the train and test sets as it goes through the training epochs. It is worth noting that claude-llm-trainer uses low-rank adaptation (LoRA) to fine-tune the model. LoRA freezes the base model’s weights and trains a small set of low-rank matrices that are added to them, which makes fine-tuning faster and more memory efficient.
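With the Hugging Face PEFT library, a LoRA setup looks roughly like this; the rank, alpha, and target modules below are common choices, not necessarily the notebook’s values:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
# The config is passed to the trainer (e.g., TRL's SFTTrainer) so that only
# the small adapter matrices are trained while the base weights stay frozen.
```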

In Cell 9, the trained model is tested with a sample prompt. Make sure to change the prompt to match your task.
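A quick test along these lines can be done with a transformers pipeline. The template below is the standard Llama-2 chat format, and the sample prompt is a placeholder; `model`, `tokenizer`, and `system_prompt` are assumed to come from the earlier cells:

```python
from transformers import pipeline

# 'model' and 'tokenizer' are the fine-tuned model and its tokenizer from earlier cells
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Llama-2 chat template: system prompt inside <<SYS>>, user prompt inside [INST]
prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\nSummarize this email: ... [/INST]"
print(pipe(prompt, max_new_tokens=200)[0]["generated_text"])
```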

In Cell 10, the LoRA adapter is merged into the main model and stored in your Google Drive. The final two cells contain the code needed to load the model and use it in your applications.
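The merge step relies on PEFT’s merge_and_unload. Here is a sketch under the assumptions that the adapter was saved to a local path and that the default base model is used; the paths are hypothetical:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("NousResearch/llama-2-7b-chat-hf")
model = PeftModel.from_pretrained(base, "./results")  # path to the trained LoRA adapter (assumed)
merged = model.merge_and_unload()                     # folds the adapter weights into the base model

save_path = "/content/drive/MyDrive/my-finetuned-model"  # hypothetical Drive location
merged.save_pretrained(save_path)
AutoTokenizer.from_pretrained("NousResearch/llama-2-7b-chat-hf").save_pretrained(save_path)
```

Merging the adapter means the saved model is a standard standalone checkpoint, so downstream code can load it without PEFT installed.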

You can access the Colab project here.
