Before replicating DropBP in earnest, we recommend reading the 'Setup', 'Use the model', and 'Finetune the model' sections of the original Lit-GPT README below. In this section, we briefly cover only the steps needed to run DropBP.
1.1 Set up the environment for lit-gpt
pip install -r requirements-all.txt
1.2 Set up the environment for DropBP
cd ../
pip install -v -e .
To download LLaMA2, you need access to its weights on Hugging Face. Request access by following the steps at https://huggingface.co/meta-llama/Llama-2-7b. Once access is granted, you can download LLaMA2 with a Hugging Face Hub access token, which you can create at https://huggingface.co/settings/tokens.
python scripts/download.py --repo_id meta-llama/Llama-2-7b-hf --access_token your_hf_token
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/meta-llama/Llama-2-7b-hf
python scripts/prepare_alpaca.py \
--checkpoint_dir checkpoints/meta-llama/Llama-2-7b-hf \
--max_seq_length 512
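The `--max_seq_length 512` flag caps the length of each tokenized training example. The sketch below illustrates that step, assuming the standard Stanford Alpaca prompt template; `scripts/prepare_alpaca.py` may format, separate, or tokenize prompts differently, and the whitespace "tokenizer" in the usage line is a hypothetical stand-in for the real Llama 2 tokenizer.

```python
from typing import Callable, Dict, List


def generate_prompt(example: Dict[str, str]) -> str:
    """Format an Alpaca record with the standard Stanford Alpaca template (assumed)."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n### Response:"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n### Response:"
    )


def encode_example(
    example: Dict[str, str],
    tokenize: Callable[[str], List[str]],
    max_seq_length: int = 512,
) -> List[str]:
    """Tokenize prompt + response, keeping at most max_seq_length tokens."""
    # The single-space separator between prompt and response is an assumption.
    full_text = generate_prompt(example) + " " + example["output"]
    return tokenize(full_text)[:max_seq_length]


# Hypothetical usage: str.split stands in for a real subword tokenizer.
record = {"instruction": "Name a primary color.", "input": "", "output": "Red."}
tokens = encode_example(record, str.split, max_seq_length=512)
print(len(tokens), tokens[-2:])
```

Truncation only matters for long records; most short Alpaca examples fit well within 512 tokens.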
- Use `--drop_rate 0.5` to set the target average drop rate of DropBP.
- Use `--is_sens_alloc True` to enable sensitivity-based drop rate allocation in DropBP.
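As a rough picture of what these flags control (a sketch based on our reading of DropBP, not the code in `lit_gpt/model.py`): at each training step, every layer's backward pass is dropped with its assigned probability while the forward pass always runs, so with `--drop_rate 0.5` about half of the layer backward passes are skipped on average. The helper names below are hypothetical. The cost model assumes the common rule of thumb that a backward pass costs about twice the forward pass; it is a back-of-the-envelope estimate, not a measured DropBP speedup. In an autograd framework, one way to skip a residual branch's backward pass while keeping the identity path is to add its output detached (`x + f(x).detach()`); whether DropBP does exactly this is not shown here.

```python
import random
from typing import List, Sequence


def sample_backward_mask(drop_rates: Sequence[float], rng: random.Random) -> List[bool]:
    """Decide, for one training step, whether each layer's backward pass runs.

    True means the layer is back-propagated; False means its backward pass is
    dropped. The forward pass runs in both cases.
    """
    return [rng.random() >= p for p in drop_rates]


def relative_step_cost(avg_drop_rate: float, backward_to_forward: float = 2.0) -> float:
    """Approximate cost of one step relative to ordinary backprop.

    Assumes backward costs `backward_to_forward` times the forward pass and that
    every layer's backward pass is droppable; real savings will differ because
    not every operation in a training step belongs to a droppable layer.
    """
    full = 1.0 + backward_to_forward
    return (1.0 + backward_to_forward * (1.0 - avg_drop_rate)) / full


rng = random.Random(0)
mask = sample_backward_mask([0.5] * 32, rng)  # hypothetical 32-layer model, --drop_rate 0.5
print(sum(mask), "of 32 layers back-propagated this step")
print(relative_step_cost(0.5))  # about 0.667 under the 2x-backward assumption
```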
- full fine-tuning
python finetune/full.py \
--data_dir data/alpaca/ \
--checkpoint_dir checkpoints/meta-llama/Llama-2-7b-hf \
--out_dir out/alpaca/full \
--drop_rate 0.5 \
--is_sens_alloc True \
--precision "bf16-true"
- LoRA
python finetune/lora.py \
--data_dir data/alpaca/ \
--checkpoint_dir checkpoints/meta-llama/Llama-2-7b-hf \
--out_dir out/alpaca/lora \
--drop_rate 0.5 \
--is_sens_alloc True \
--precision "bf16-mixed"
- QLoRA
python finetune/lora.py \
--data_dir data/alpaca/ \
--checkpoint_dir checkpoints/meta-llama/Llama-2-70b-hf \
--out_dir out/alpaca/qlora \
--drop_rate 0.5 \
--is_sens_alloc True \
--quantize "bnb.nf4"
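With `--is_sens_alloc True`, DropBP allocates drop rates based on layer sensitivity while keeping the target average set by `--drop_rate`. The sketch below only illustrates the shape of such an allocation: it is a hypothetical greedy heuristic, not DropBP's algorithm, and it takes the per-layer sensitivities as given rather than measuring them. It spends a budget of `target * n_layers` in small increments, each time on the layer where one more increment is cheapest, so low-sensitivity layers end up with higher drop rates.

```python
from typing import List, Sequence


def allocate_drop_rates(
    sensitivities: Sequence[float],
    target: float,
    step: float = 0.1,
    max_rate: float = 0.9,
) -> List[float]:
    """Hypothetical greedy allocation whose mean drop rate equals `target`.

    The marginal cost of raising layer i to rate r is modeled as
    sensitivities[i] * r; this cost model is an illustrative assumption.
    """
    n = len(sensitivities)
    rates = [0.0] * n
    for _ in range(round(target * n / step)):
        # Layers that can still take another increment without exceeding max_rate.
        candidates = [i for i in range(n) if rates[i] + step <= max_rate + 1e-9]
        if not candidates:  # every layer is capped, so the target cannot be reached
            break
        best = min(candidates, key=lambda i: sensitivities[i] * (rates[i] + step))
        rates[best] = round(rates[best] + step, 10)  # round to avoid float drift
    return rates


rates = allocate_drop_rates([4.0, 1.0, 1.0, 2.0], target=0.5)
print(rates, sum(rates) / len(rates))
```

A greedy rule like this is optimal for the assumed cost model because each layer's marginal cost grows with its rate; a different sensitivity model would call for a different allocation rule.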
- To evaluate on MMLU, use `--eval_tasks '["hendrycksTest-*"]'`.
- To evaluate on commonsense reasoning, use `--eval_tasks "[arc_challenge, piqa, hellaswag, arc_easy, winogrande, openbookqa]"`.
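Shell quoting matters for `--eval_tasks`. The snippet below uses Python's `shlex` (which follows POSIX shell quoting rules but performs no glob expansion) to show that in `"["hendrycksTest-*"]"` the inner double quotes close and reopen the outer ones, so the program receives `[hendrycksTest-*]` and the `*` is left unquoted for the shell to glob (zsh, for example, rejects unmatched globs by default). Wrapping a JSON list in single quotes passes it through intact. The helper `eval_tasks_arg` is hypothetical, and we assume the eval scripts accept a JSON-style list, as the examples suggest.

```python
import json
import shlex

# How a POSIX shell tokenizes the double-quoted form: the inner quotes disappear.
print(shlex.split('--eval_tasks "["hendrycksTest-*"]"'))


def eval_tasks_arg(tasks):
    """Build a safely quoted --eval_tasks argument from a list of task names."""
    return "--eval_tasks " + shlex.quote(json.dumps(tasks))


mmlu = eval_tasks_arg(["hendrycksTest-*"])
csr = eval_tasks_arg(["arc_challenge", "piqa", "hellaswag", "arc_easy", "winogrande", "openbookqa"])
print(mmlu)
print(json.loads(shlex.split(mmlu)[1]))  # the list survives shell tokenization
```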
- full fine-tuning
python eval/lm_eval_harness.py \
--checkpoint_dir checkpoints/meta-llama/Llama-2-7b-hf \
--model_dir out/alpaca/full/ \
--precision "bf16-true" \
--eval_tasks '["hendrycksTest-*"]' \
--save_filepath "out/alpaca/full/mmlu.json" \
--quantize "bnb.nf4"
- LoRA
python eval/lora_harness.py \
--checkpoint_dir checkpoints/meta-llama/Llama-2-7b-hf \
--lora_path out/alpaca/lora/lit_model_finetuned.pth \
--precision "bf16-true" \
--eval_tasks '["hendrycksTest-*"]' \
--save_filepath "out/alpaca/lora/mmlu.json"
- QLoRA
python eval/lora_harness.py \
--checkpoint_dir checkpoints/meta-llama/Llama-2-70b-hf \
--lora_path out/alpaca/qlora/lit_model_finetuned.pth \
--precision "bf16-true" \
--eval_tasks '["hendrycksTest-*"]' \
--save_filepath "out/alpaca/qlora/mmlu.json"
After generating a results file (e.g., `mmlu.json`), you can use `eval/mmlu.py` and `eval/csr.py` to analyze the results, as shown below.
python eval/mmlu.py \
--json_file "out/alpaca/full/mmlu.json"
python eval/csr.py \
--json_file "out/alpaca/full/csr.json"
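For a quick check without the provided scripts, the sketch below averages a metric over selected tasks in a results file. It assumes the lm-evaluation-harness layout `{"results": {task: {metric: value, ...}}}`; the files written by the eval scripts, and the logic in `eval/mmlu.py` and `eval/csr.py`, may differ. Which metric to report per task (for example `acc` versus `acc_norm`) is left to the caller.

```python
import json
from statistics import mean
from typing import Callable, Dict

# Commonsense reasoning tasks from the --eval_tasks example above.
CSR_TASKS = {"arc_challenge", "piqa", "hellaswag", "arc_easy", "winogrande", "openbookqa"}


def average_metric(results: Dict, select: Callable[[str], bool], metric: str = "acc") -> float:
    """Average `metric` over tasks in results["results"] whose name passes `select`."""
    scores = [m[metric] for task, m in results["results"].items() if select(task)]
    if not scores:
        raise ValueError("no matching tasks in results")
    return mean(scores)


def load_results(path: str) -> Dict:
    with open(path) as f:
        return json.load(f)


# Usage (paths as in the commands above):
# mmlu = average_metric(load_results("out/alpaca/full/mmlu.json"),
#                       lambda t: t.startswith("hendrycksTest-"))
# csr = average_metric(load_results("out/alpaca/full/csr.json"), lambda t: t in CSR_TASKS)
```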
- @LightningAI for lit-gpt
- @karpathy for nanoGPT
- @EleutherAI for GPT-NeoX and the Evaluation Harness
- @TimDettmers for bitsandbytes
- @IST-DASLab for GPTQ
- @Microsoft for LoRA
- @tridao for Flash Attention 2
Below is Lit-GPT's original README. Note that we integrate DropBP only into `./finetune/lora.py`, `./finetune/full.py`, and `./lit_gpt/model.py`. Additionally, we add some code to `./eval/` for easy evaluation.
Hackable implementation of state-of-the-art open-source large language models released under the Apache 2.0 license.
Supports the following popular model checkpoints:
Model and usage | Model size | Reference |
---|---|---|
EleutherAI Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | Biderman et al. 2023 |
LMSYS LongChat | 7B, 13B | LongChat Team 2023 |
LMSYS Vicuna | 7B, 13B, 33B | Li et al. 2023 |
Meta AI Code Llama | 7B, 13B, 34B | Rozière et al. 2023 |
Meta AI Llama 2 | 7B, 13B, 70B | Touvron et al. 2023 |
Mistral AI Mistral and Mixtral | 7B, 8x7B | Mistral website |
Microsoft Research Phi | 1.3B, 2.7B | Li et al. 2023 |
NousResearch Nous-Hermes | 7B, 13B, 70B | Org page |
OpenLM Research OpenLLaMA | 3B, 7B, 13B | Geng & Liu 2023 |
Platypus | 7B, 13B, 70B | Lee, Hunter, and Ruiz 2023 |
Stability AI StableCode | 3B | Stability AI 2023 |
Stability AI FreeWilly2 (Stable Beluga 2) | 70B | Stability AI 2023 |
Stability AI StableLM | 3B, 7B | Stability AI 2023 |
Stability AI StableLM Zephyr | 3B | Stability AI 2023 |
TII UAE Falcon | 7B, 40B, 180B | TII 2023 |
TinyLlama | 1.1B | Zhang et al. 2023 |
Together RedPajama-INCITE | 3B, 7B | Together 2023 |
Trelis Function Calling Llama 2 | 7B | Trelis et al. 2023 |
This implementation builds on Lit-LLaMA and nanoGPT and is powered by Lightning Fabric ⚡.
🏆 NeurIPS 2023 Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
The Lit-GPT repository is the official starter kit for the NeurIPS 2023 LLM Efficiency Challenge, a competition focused on finetuning an existing non-instruction-tuned LLM for 24 hours on a single GPU. The competition has two tracks: one for A100 GPUs and another for RTX 4090 GPUs.
If you are interested in participating, you can learn more about the NeurIPS LLM Efficiency Challenge on its official website. Also see the Lit-GPT NeurIPS Challenge Quickstart Guide for helpful tips.
The submission deadline is Oct 25th, 2023.
This repository follows the main principle of openness through clarity.
Lit-GPT is:
- Simple: Single-file implementation without boilerplate.
- Correct: Numerically equivalent to the original model.
- Optimized: Runs fast on consumer hardware or at scale.
- Open-source: No strings attached.
Avoiding code duplication is not a goal. Readability and hackability are.
Join our Discord to build high-performance, truly open-source models for the common benefit of the community.
Clone the repo:
git clone https://github.com/Lightning-AI/lit-gpt
cd lit-gpt
Install the minimal dependencies:
pip install -r requirements.txt
Install with all dependencies (including quantization, sentencepiece, tokenizers for Llama models, etc.):
pip install -r requirements-all.txt
(Optional) Use Flash Attention 2
Flash Attention 2 will be used automatically if PyTorch 2.2 (or higher) is installed. Currently, that requires installing PyTorch nightly, which you can get by running:
pip uninstall -y torch torchvision torchaudio torchtext
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
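To check the version condition above, compare the numeric (major, minor) parts of `torch.__version__` rather than the raw string, since `"2.10"` sorts before `"2.2"` as text. The helper below is a hypothetical sketch; it checks only the version requirement stated here, while whether Flash Attention 2 kernels actually run also depends on the GPU, dtype, and inputs.

```python
import re


def meets_flash_attention_2_version(torch_version: str) -> bool:
    """Return True if a PyTorch version string (e.g. '2.2.0.dev20231204+cu121') is 2.2 or newer."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (2, 2)


# Usage: import torch; print(meets_flash_attention_2_version(torch.__version__))
```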
You are all set! 🎉
To generate text predictions, you need to download the model weights. If you don't have them, check out our guide.
Run inference:
python generate/base.py --prompt "Hello, my name is"
This will run the 3B pretrained model and require ~7 GB of GPU memory using the `bfloat16` datatype.
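The ~7 GB figure comes mostly from the weights: `bfloat16` stores each parameter in 2 bytes, and the rest goes to activations and the KV cache during generation. The sketch below estimates both; the layer count, embedding size, and context length are hypothetical values for a ~3B-parameter model, not read from any Lit-GPT config, and the KV-cache formula assumes standard multi-head attention (grouped-query attention needs less).

```python
def weights_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Weight memory in GB (1e9 bytes); bfloat16 uses 2 bytes per parameter."""
    return num_params * bytes_per_param / 1e9


def kv_cache_gb(n_layer: int, n_embd: int, seq_len: int, batch_size: int = 1,
                bytes_per_value: float = 2.0) -> float:
    """KV-cache memory in GB: one key and one value vector of size n_embd
    per token per layer (standard multi-head attention assumed)."""
    return 2 * n_layer * n_embd * seq_len * batch_size * bytes_per_value / 1e9


# Hypothetical ~3B model settings.
print(weights_gb(3e9))                                      # weights alone
print(kv_cache_gb(n_layer=32, n_embd=2560, seq_len=4096))   # cache at full context
```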
Full guide for generating samples from the model.
You can also chat with the model interactively:
python chat/base.py
We support 4-bit quantization (`bnb.nf4`, `bnb.nf4-dq`, `bnb.fp4`, and `bnb.fp4-dq`, as in QLoRA, plus `gptq.int4`) and 8-bit quantization (`bnb.int8`) for inference; see this guide for usage.
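To see why quantization helps, the sketch below computes nominal weight memory per format. It counts only the stored bits per weight and ignores the per-block quantization constants (the `-dq` variants shrink that overhead through double quantization, so they share the 4-bit nominal size). It also assumes every parameter is quantized; real usage is higher because some layers may stay in higher precision and activations still need memory.

```python
# Nominal storage bits per weight; quantization-constant overhead is ignored.
NOMINAL_BITS = {
    "bf16": 16,
    "bnb.int8": 8,
    "bnb.nf4": 4, "bnb.nf4-dq": 4,
    "bnb.fp4": 4, "bnb.fp4-dq": 4,
    "gptq.int4": 4,
}


def quantized_weights_gb(num_params: float, fmt: str) -> float:
    """Nominal weight memory in GB (1e9 bytes) for the given format."""
    return num_params * NOMINAL_BITS[fmt] / 8 / 1e9


for fmt in ("bf16", "bnb.int8", "bnb.nf4"):
    print(fmt, quantized_weights_gb(7e9, fmt))  # a 7B-parameter model
```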
We provide simple training scripts (`finetune/adapter.py`, `finetune/adapter_v2.py`, and `finetune/lora.py`) that instruction-tune a pretrained model on the Alpaca dataset.
- Download the data and generate an instruction tuning dataset:
python scripts/prepare_alpaca.py
- Run the finetuning script
For example, you can use
Adapter (Zhang et al. 2023):
python finetune/adapter.py
or Adapter v2 (Gao et al. 2023):
python finetune/adapter_v2.py
or LoRA (Hu et al. 2021):
python finetune/lora.py
(See tutorials/finetune_adapter for details on the differences between the two adapter methods.)
Finetuning requires at least one GPU with ~12 GB of memory (e.g., an RTX 3060).
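Part of why LoRA (Hu et al. 2021) fits on such GPUs is that it freezes each pretrained weight matrix and trains only a low-rank update `B @ A`, where `B` is `d_out x r` and `A` is `r x d_in`. The arithmetic below counts the trainable parameters this adds for one matrix; the 4096 x 4096 shape and rank 8 are hypothetical illustration values, and Lit-GPT's defaults may differ.

```python
def lora_extra_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


full = 4096 * 4096                      # parameters in one frozen square projection
extra = lora_extra_params(4096, 4096, rank=8)
print(extra, extra / full)              # trainable count and fraction of the full matrix
```

Because optimizer state is kept only for the small `A` and `B` matrices, memory for gradients and optimizer moments shrinks by roughly the same fraction for the adapted layers.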
It is expected that you have downloaded the pretrained weights as described above. More details about each finetuning method and how you can apply it to your own data can be found in our technical how-to guides.
These technical tutorials illustrate how to run the finetuning code.
Looking for conceptual tutorials and explanations? We have some additional articles below:
We provide simple training scripts based on Fabric if you want to venture into pretraining. Conversion scripts for our optimized streaming `PackedDataset` are included.
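Packed datasets avoid padding by concatenating tokenized documents into one stream and cutting it into fixed-length blocks. The sketch below shows that general idea only; it does not model the on-disk format or sampling logic of `PackedDataset`, and the function name and `eos_id` value are hypothetical.

```python
from typing import Iterable, List


def pack_sequences(docs: Iterable[List[int]], block_size: int, eos_id: int) -> List[List[int]]:
    """Concatenate token sequences (each followed by eos_id) and split the
    stream into blocks of exactly block_size tokens, dropping the incomplete tail."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)  # marks the document boundary inside a block
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


print(pack_sequences([[5, 6], [7, 8, 9]], block_size=3, eos_id=0))
```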
Follow this guide to start pretraining on
Lit-GPT includes a variety of dataset preparation scripts for finetuning and pretraining. Additional information about the datasets and dataset preparation is provided in the Preparing Datasets tutorial.
Lightning AI has partnered with Google to add first-class support for Cloud TPUs in Lightning’s frameworks and Lit-GPT, helping democratize AI for millions of developers and researchers worldwide.
Using TPUs with Lightning is as straightforward as changing one line of code.
We provide scripts fully optimized for TPUs in the XLA directory.
We are on a quest towards fully open source AI.
Join us and start contributing, especially on the following areas:
We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.
Unsure about contributing? Check out our How to Contribute to Lit-GPT and Lit-LLaMA guide.
Don't forget to join our Discord!
- @karpathy for nanoGPT
- @EleutherAI for GPT-NeoX and the Evaluation Harness
- @TimDettmers for bitsandbytes
- @IST-DASLab for GPTQ
- @Microsoft for LoRA
- @tridao for Flash Attention 2
Lit-GPT is released under the Apache 2.0 license.