
Commit

update README + BERGEN pic
Stephane Clinchant authored and committed Oct 24, 2024
1 parent 39cfa0e commit d261ebe
Showing 10 changed files with 10 additions and 11 deletions.
21 changes: 10 additions & 11 deletions README.md
@@ -1,23 +1,22 @@
-<img src="https://upload.wikimedia.org/wikipedia/commons/6/62/Night_view_from_Mount_Floyen_-_Bergen%2C_Norway.jpg" width="500">
+<img src="documentation/images/BERGEN.png" width="500">

# BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

[![arXiv](https://img.shields.io/badge/arXiv-2407.01102-b31b1b.svg)](https://arxiv.org/abs/2407.01102)
[![arXiv](https://img.shields.io/badge/arXiv-2407.01463-b31b1b.svg)](https://arxiv.org/abs/2407.01463)
[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

-BERGEN (BEnchmarking Retrieval-augmented GENeration) is a library designed to benchmark RAG systems, with a focus on question-answering (QA). It addresses the challenge of inconsistent benchmarking in comparing approaches and understanding the impact of each component in a RAG pipeline.
+BERGEN (BEnchmarking Retrieval-augmented GENeration) is a library designed to benchmark RAG systems with a focus on question-answering (QA). It addresses the challenge of inconsistent benchmarking in comparing approaches and understanding the impact of each component in a RAG pipeline.

## Key Features

- Easy reproducibility and integration of new datasets and models
-- Support for various retrievers (20+), rerankers(4) , and large language models (20+)
+- Support for various retrievers (20+), rerankers(4) and large language models (20+)
- Flexible configuration system using YAML files
- Comprehensive evaluation metrics (*Match, EM, LLMEval*, ... )
- Support for multilingual experiments

-![](documentations/images/teaser_bergen.jpg)
-
+![](documentation/images/teaser_bergen.jpg)

For more information and experimental findings, please see:
- The initial BERGEN paper: https://arxiv.org/abs/2407.01102
@@ -37,7 +36,7 @@ python3 bergen.py retriever="bm25" reranker="minilm6" generator='tinyllama-chat'

## Installation

-Check the [installation guide](documentations/INSTALL.md) for detailed instructions.
+Check the [installation guide](documentation/INSTALL.md) for detailed instructions.


## Usage
@@ -55,11 +54,11 @@ done
```


-To fully configure BERGEN, please read our [configuration guide](documentations/config.md)
+To fully configure BERGEN, please read our [configuration guide](documentation/config.md)

## Evaluation

Run the evaluation script to calculate LLMEval metrics and print the results:

```bash
python3 eval.py --experiments_folder experiments/ --llm_batch_size 16 --split 'dev' --llm vllm_SOLAR-107B
@@ -68,7 +67,7 @@ python3 eval.py --experiments_folder experiments/ --llm_batch_size 16 --split 'd
python print_results.py --folder experiments/ --format=tiny
```

-For more evaluation options and details, refer to the [Evaluation section](documentations/evaluations.md) in the full documentation.
+For more evaluation options and details, refer to the [Evaluation section](documentation/evaluations.md) in the complete documentation.

## RAG Baselines
Bergen provides results for several models and many datasets aiming to **provide strong baselines**. On the important datasets for RAG, the match metric is given by this table (see more in our paper):
@@ -83,7 +82,7 @@ Solar-10.7B | 76.2 | 70.2 | 92.8 | 71.2 | 53.9|

## Multilingual Experiments

-Refer to our [multilingual RAG guide](documentations/multilingual.md) for running experiments with multilingual user queries and/or multilingual Wikipedia as a datastore.
+Refer to our [multilingual RAG guide](documentation/multilingual.md) for running experiments with multilingual user queries and/or multilingual Wikipedia as a datastore.


## Training
@@ -96,7 +95,7 @@ python3 bergen.py retriever="bm25" reranker="minilm6" generator='tinyllama-chat'

## Extensions

-For adding new datasets, models, or configuring prompts, see our [reference guide](documentations/extensions.md).
+To add new datasets and models, or configure prompts, see our [reference guide](/extensions.md).


## Cite
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Binary file added documentation/images/BERGEN.png
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
