We develop ELQM, a RAG-based question-answering system for European energy law, built on documents acquired from EUR-Lex. ELQM comprises a full end-to-end pipeline, including data scraping, preprocessing, splitting, vectorization, storage, retrieval, and answer generation with chat-based LLMs. Our work also focuses on usability, providing three access points and linking to source documents for transparency.
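The pipeline stages named above can be illustrated with a minimal retrieval-augmented-generation sketch. This is a toy, assuming a bag-of-words similarity in place of ELQM's actual embedding model and vector store; none of the names below are part of the ELQM API:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a learned embedding model)."""
    return Counter(text.lower().replace(".", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Pre-split document chunks (in ELQM these would come from scraped EUR-Lex documents)
docs = [
    "Directive on energy efficiency labelling of displays.",
    "Regulation on cross-border electricity trading.",
    "Guidelines on renewable energy subsidies.",
]

# Retrieval-augmented generation: the retrieved chunks are inserted into the
# prompt sent to the chat LLM (the generation step itself is omitted here).
context = retrieve("energy efficiency of displays", docs, k=1)
prompt = f"Answer using only this context:\n{context[0]}\nQuestion: ..."
print(prompt)
```

Citing the retrieved chunk in the prompt is also what makes linking answers back to their source documents possible.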
- 16 GB RAM
- 12 GB VRAM (CUDA is used by default)
- 25 GB storage space
  - 6 GB cache for all configurations
  - 7 GB environment
  - ~5 GB each for the `llama2` and `mistral` models
- Python 3.10
- pip >= 24.0
- Ollama
- Ubuntu >= 22.04 (optional, for `GPT4AllEmbeddings`, which requires glibc)
- For SparkNLP: Java OpenJDK or similar (see https://pypi.org/project/spark-nlp/)
```sh
git clone https://github.com/psaegert/elqm-INLPT-WS2023
cd elqm-INLPT-WS2023
```
Optional: Create a virtual environment:

conda:

```sh
conda create -n elqm python=3.10 [ipykernel]
conda activate elqm
```

Optional: Install `ipykernel` to use the environment in Jupyter Notebook.

venv:

```sh
python3 -m venv elqmVenv
source elqmVenv/bin/activate
```
Then, install the package via

```sh
pip install --upgrade pip
pip install -e .
```
Scrape the EUR-Lex data with

```sh
elqm scrape-data
```

Alternatively, you can download the scraped data from our Hugging Face dataset and move its contents into `/data`.
- Run the Ollama backend via

  ```sh
  ollama serve
  ```

- Pull the desired Ollama model, e.g. `mistral`:

  ```sh
  ollama pull mistral
  ```

To generate the oracle dataset, we use the `llama2` model:

```sh
ollama pull llama2
```
Gradio Frontend:

```sh
elqm gui -c configs/prompts/256_5_5_nlc_bge_fn_mistral_h2.yaml
```
CLI:

```sh
elqm run -c configs/prompts/256_5_5_nlc_bge_fn_mistral_h2.yaml
```
Python API:

```python
import os

from dynaconf import Dynaconf

from elqm import ELQMPipeline
from elqm.utils import get_dir

config = Dynaconf(settings_files=os.path.join(get_dir("configs", "prompts"), "256_5_5_nlc_bge_fn_mistral_h2.yaml"))
elqm = ELQMPipeline(config)

print(elqm.answer("Which CIE LUV does a model supporting greater than 99 % of the sRGB colour space translate to?"))
```
To set up the development environment, run the following commands:

```sh
pip install -e .[dev]
pre-commit install
```
To run the tests locally, run the following commands:

```sh
ollama serve
pytest tests --cov src
```
If you use ELQM: Energy-Law Query-Master for your research, please cite it using the following BibTeX entry:
```bibtex
@software{elqm_2024,
  author = {Daniel Knorr and Paul Saegert and Nikita Tatsch},
  title = {ELQM: Energy-Law Query-Master},
  month = mar,
  year = 2024,
  publisher = {GitHub},
  version = {1.0.0},
  url = {https://github.com/psaegert/elqm}
}
```