

lighteval library logo

Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team.



This is a forked lighteval repository, extended to include Greek tasks and prompts.

Lighteval Documentation: Lighteval's Wiki


Unlock the Power of LLM Evaluation with Lighteval 🚀

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends—whether it's transformers, tgi, vllm, or nanotron—with ease. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.

Customization at your fingertips: browse all our existing tasks and metrics, or effortlessly create your own, tailored to your needs.

Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.
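With local storage, for instance, each run writes its scores under the directory passed via --output-dir. A minimal sketch for finding them afterwards (assuming the default results/ layout and the ./evals/ output directory used in the quickstart below):

# Locate and pretty-print the score files from a local run
# (assumes --output-dir="./evals/" and the default results/ subfolder)
find ./evals/results -name "results_*.json" -exec python -m json.tool {} \;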

🔑 Key Features

- Evaluate LLMs across multiple backends: transformers, tgi, vllm, and nanotron.
- Save and explore detailed, sample-by-sample results to debug and compare models.
- Browse the existing tasks and metrics, or create your own—including the Greek tasks and prompts added in this fork.
- Store results on the Hugging Face Hub, S3, or locally.

⚡️ Installation

To install the current repo (the Greek extension) with pip, you can either install it directly from GitHub:

pip install "lighteval[accelerate,extended_tasks] @ git+https://github.com/LeonVouk/lighteval.git"

or, for active development, clone the repository and install it locally:

pip install -e ".[accelerate,extended_tasks]"

Lighteval supports many optional extras at install time; see the documentation for the complete list. The extended_tasks extra is only necessary when evaluating OpenAI models.
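Either way, a quick import check confirms that the package and the accelerate extra landed correctly:

# Verify the installation
python -c "import lighteval, accelerate; print('lighteval installed OK')"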

If you want to push results to the Hugging Face Hub, log in with your access token:

huggingface-cli login
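For a non-interactive setup, huggingface_hub also reads the token from the HF_TOKEN environment variable, so exporting it works too:

# Non-interactive alternative to huggingface-cli login
export HF_TOKEN="hf_..."    # your Hugging Face access token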

🚀 Quickstart

Lighteval offers two main entry points for model evaluation: lighteval accelerate, which evaluates models on CPU or one or more GPUs using Hugging Face Accelerate, and lighteval nanotron, which evaluates models in distributed settings using Nanotron.

Here’s a quick command to evaluate using the Accelerate backend:

lighteval accelerate \
    "pretrained=gpt2" \
    "leaderboard|truthfulqa:mc|0|0" \
    --override-batch-size 1 \
    --output-dir="./evals/"
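The quoted task string follows lighteval's suite|task|num_few_shot|truncate_few_shots convention, where the last field (0 or 1) controls whether the few-shot count is automatically reduced when the prompt gets too long:

# Anatomy of a task string: suite|task|num_few_shot|truncate_few_shots
#
#   "leaderboard|truthfulqa:mc|0|0"
#    suite ......... leaderboard
#    task .......... truthfulqa:mc
#    few-shot ...... 0 examples
#    truncate ...... 0 = don't auto-reduce the few-shot count for long prompts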

OpenAI and LiteLLM-proxy requests

To evaluate an OpenAI model, e.g., gpt-3.5-turbo, make sure you've added a working OPENAI_API_KEY to your env and run:

lighteval endpoint openai \
      "gpt-3.5-turbo" \
      "community|mmlu_pro_cot_el|0|0" \
      --output-dir="./evals/" \
      --custom-tasks "./community_tasks/greek_evals.py" \
      --save-details \
      --max-samples 10

The --max-samples 10 flag is optional and handy for quick testing: it limits the run to only 10 benchmark rows.
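Since the command also passes --save-details, per-sample records land under the output directory. A minimal sketch for peeking at them, assuming the details are saved as parquet files (pandas and pyarrow required):

# Inspect per-sample details saved by --save-details (assumed parquet layout)
python -c "
import glob
import pandas as pd  # requires pandas and pyarrow

# assumed layout: <output-dir>/details/**/*.parquet
files = glob.glob('./evals/details/**/*.parquet', recursive=True)
print(pd.read_parquet(files[0]).head())
"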

To evaluate a non-GPT API, e.g., Meltemi:

export OPENAI_API_KEY="<Meltemi-API-key>"

lighteval endpoint openai \
  "meltemi" \
  "community|mmlu_pro_cot_el|0|0" \
  --base-url="http://ec2-3-19-37-251.us-east-2.compute.amazonaws.com:4000" \
  --tokenizer="ilsp/Meltemi-7B-Instruct-v1.5" \
  --max-samples 10 \
  --output-dir="./evals/" \
  --custom-tasks "./community_tasks/greek_evals.py" \
  --use-chat-template \
  --save-details
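Before launching a long run against a proxy like this, it can help to confirm that the endpoint answers OpenAI-style requests at all. A quick sanity check, assuming the LiteLLM proxy exposes the standard OpenAI-compatible /v1/models route:

# Sanity-check the endpoint before a long evaluation run
curl -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  "http://ec2-3-19-37-251.us-east-2.compute.amazonaws.com:4000/v1/models"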

HF model requests

If you don't have an existing LLM deployment, you can simply provide the Hugging Face model ID (e.g., ilsp/Meltemi-7B-Instruct-v1.5) and run the evaluation locally:

export ID="ilsp/Meltemi-7B-Instruct-v1.5"
export EVAL_OUTPUTS_PATH="/path/to/eval/outputs"

accelerate launch --multi_gpu --num_processes=4 run_evals_accelerate.py \
      --model-args="pretrained=${ID},model_parallel=True" \
      --tasks examples/tasks/extended_eval_greek_tasks.txt \
      --custom-tasks "community_tasks/greek_evals.py" \
      --override-batch-size 1 \
      --output-dir="${EVAL_OUTPUTS_PATH}" \
      --save-details
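On a single-GPU machine the same run works with the multi-GPU flags dropped; a sketch (model_parallel is unnecessary on one device):

# Single-GPU variant of the command above
accelerate launch --num_processes=1 run_evals_accelerate.py \
      --model-args="pretrained=${ID}" \
      --tasks examples/tasks/extended_eval_greek_tasks.txt \
      --custom-tasks "community_tasks/greek_evals.py" \
      --override-batch-size 1 \
      --output-dir="${EVAL_OUTPUTS_PATH}" \
      --save-details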

🙏 Acknowledgements

Lighteval started as an extension of the fantastic Eleuther AI Harness (which powers the Open LLM Leaderboard) and draws inspiration from the amazing HELM framework.

While evolving Lighteval into its own standalone tool, we are grateful to the Harness and HELM teams for their pioneering work on LLM evaluations.

🌟 Contributions Welcome 💙💚💛💜🧡

Got ideas? Found a bug? Want to add a task or metric? Contributions are warmly welcomed!

If you're adding a new feature, please open an issue first.

If you open a PR, don't forget to run the styling!

pip install -e .[dev]
pre-commit install
pre-commit run --all-files

📜 Citation

@misc{lighteval,
  author = {Fourrier, Clémentine and Habib, Nathan and Wolf, Thomas and Tunstall, Lewis},
  title = {LightEval: A lightweight framework for LLM evaluation},
  year = {2023},
  version = {0.5.0},
  url = {https://github.com/huggingface/lighteval}
}

About

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
