A flexible and efficient framework for training and fine-tuning Large Language Models (LLMs) using PyTorch and Hugging Face Transformers.
- Support for various pre-trained models from Hugging Face
- Efficient data processing and tokenization
- Checkpoint saving and model versioning
- GPU acceleration support
- Customizable training parameters
- Progress tracking with tqdm
```
xl/
├── model.py          # Model architecture definition
├── dataset.py        # Data loading and processing
├── train.py          # Training loop implementation
└── requirements.txt  # Project dependencies
```
- Python 3.8+
- PyTorch 2.1.0+
- Transformers 4.35.0+
- CUDA-capable GPU (recommended)
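These requirements correspond to the project's `requirements.txt`. A minimal version of that file might look like the following; the exact pins in the repository may differ, and `tqdm` is assumed because the trainer uses it for progress tracking:

```
torch>=2.1.0
transformers>=4.35.0
tqdm
```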
- Clone the repository:

```bash
git clone [your-repository-url]
cd xl
```

- Install dependencies:

```bash
pip install -r requirements.txt
```
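After installing, a quick sanity check confirms that PyTorch and Transformers import correctly and whether a CUDA GPU is visible:

```python
import torch
import transformers

print("PyTorch:", torch.__version__)               # expect 2.1.0 or newer
print("Transformers:", transformers.__version__)   # expect 4.35.0 or newer
print("CUDA available:", torch.cuda.is_available())
```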
- Prepare your training data as a list of text strings.
- Run the training script:
```python
from train import train

texts = [
    "Your training text 1",
    "Your training text 2",
    # ... more training texts
]

model = train(
    texts,
    model_name="gpt2",  # or any other model from Hugging Face
    num_epochs=3,
    batch_size=8
)
```
The training function accepts several parameters:
- `model_name`: Name of the pre-trained model (default: `"gpt2"`)
- `output_dir`: Directory for saving checkpoints (default: `"checkpoints"`)
- `num_epochs`: Number of training epochs (default: 3)
- `batch_size`: Batch size for training (default: 8)
- `learning_rate`: Learning rate (default: 5e-5)
- `max_length`: Maximum sequence length (default: 512)
- `device`: Training device (`"cuda"` or `"cpu"`)
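For reference, a call that sets every parameter explicitly might look like the sketch below; the values are simply the documented defaults, with `device` chosen based on GPU availability:

```python
import torch
from train import train

texts = ["Your training text 1", "Your training text 2"]

model = train(
    texts,
    model_name="gpt2",
    output_dir="checkpoints",
    num_epochs=3,
    batch_size=8,
    learning_rate=5e-5,
    max_length=512,
    device="cuda" if torch.cuda.is_available() else "cpu",
)
```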
You can customize the training process by:
- Modifying the model architecture in `model.py`
- Adjusting data processing in `dataset.py` (a sketch of this pattern follows below)
- Changing training parameters in `train.py`
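For example, the data-processing side typically wraps the raw strings in a PyTorch `Dataset` that tokenizes them for causal language modeling. The class below is an illustrative sketch of that pattern, not the exact code in `dataset.py`:

```python
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer

class TextDataset(Dataset):
    """Illustrative dataset: tokenizes a list of strings for causal LM training."""

    def __init__(self, texts, model_name="gpt2", max_length=512):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token  # GPT-2 has no pad token
        self.encodings = self.tokenizer(
            texts,
            truncation=True,
            max_length=max_length,
            padding="max_length",
            return_tensors="pt",
        )

    def __len__(self):
        return self.encodings["input_ids"].size(0)

    def __getitem__(self, idx):
        input_ids = self.encodings["input_ids"][idx]
        return {
            "input_ids": input_ids,
            "attention_mask": self.encodings["attention_mask"][idx],
            "labels": input_ids.clone(),  # causal LM: labels mirror the inputs
        }
```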
During training, the model saves checkpoints at:
- Each epoch: `checkpoints/checkpoint-epoch-{epoch_number}`
- Final model: `checkpoints/final-model`
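Assuming the checkpoints are written in the standard Hugging Face format (via `save_pretrained`), a saved model can be reloaded for inference or further fine-tuning like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes both the model and tokenizer were saved to the checkpoint directory.
model = AutoModelForCausalLM.from_pretrained("checkpoints/final-model")
tokenizer = AutoTokenizer.from_pretrained("checkpoints/final-model")
```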
- Use GPU acceleration when possible
- Adjust batch size based on available memory
- Monitor the training loss to check that the learning rate is well tuned
- Use gradient accumulation for larger effective batch sizes (see the sketch below)
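Gradient accumulation is not exposed as a parameter of `train` above, but the pattern is straightforward to add to a standard PyTorch loop. The function below is a generic sketch of the technique; the names and the `accumulation_steps` value are illustrative:

```python
def train_with_accumulation(model, dataloader, optimizer, accumulation_steps=4):
    """Effective batch size = dataloader batch size * accumulation_steps."""
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        outputs = model(**batch)                  # Hugging Face models return .loss when labels are given
        loss = outputs.loss / accumulation_steps  # scale so accumulated gradients average correctly
        loss.backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```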
Contributions are welcome! Please feel free to submit pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face Transformers library
- PyTorch team
- Open source AI community
[Your Contact Information]