Energy-Law Query-Master: A highly modular end-to-end RAG-based question answering system for legal documents from EUR-Lex.

Illustration icon: A modern light bulb design, with its filament shaped as a balance scale representing law. Encapsulating the bulb is a speech bubble, with a question mark and an answer tick, symbolizing the Q&A aspect.

ELQM: Energy-Law Query-Master

Natural Language Processing with Transformers

Introduction

We develop ELQM, a RAG-based question answering system for European energy law acquired from EUR-Lex. ELQM comprises a full end-to-end pipeline, including data scraping, preprocessing, splitting, vectorization, storage, retrieval, and answer generation with chat-based LLMs. Our work also focuses on usability, providing three access points and linking to source documents for transparency.

Requirements

Hardware

  • 16 GB RAM
  • 12 GB VRAM
    • by default CUDA is used
  • 25 GB storage space
    • 6 GB cache for all configurations
    • 7 GB environment
    • ~5 GB for each llama2 and mistral model
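The storage requirement above can be checked before installing anything. The following is a minimal sketch using only the Python standard library; the helper name `has_free_space` is hypothetical and not part of ELQM, and the 25 GB threshold is simply the figure from the list above. It covers disk space only, not RAM or VRAM.

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


# Example: check the current directory against the 25 GB listed above.
print("Enough storage for ELQM:", has_free_space(".", 25))
```

Run it from the directory where you plan to clone the repository, since the check applies to the filesystem that contains the given path.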

Software

Getting Started

1. Clone the repository

git clone https://github.com/psaegert/elqm-INLPT-WS2023
cd elqm-INLPT-WS2023

2. Install the package

Optional: Create a virtual environment:

conda:

conda create -n elqm python=3.10
conda activate elqm

Optional: To use the environment in Jupyter Notebook, also install ipykernel, e.g. by appending ipykernel to the conda create command above.

venv:

python3 -m venv elqmVenv
source elqmVenv/bin/activate

Then, install the package via

pip install --upgrade pip
pip install -e .

3. Scrape the data

Scrape the EUR-Lex data with

elqm scrape-data

Alternatively, you can download the scraped data from our Hugging Face dataset and move its contents into /data.

4. Install Ollama models

  1. Run the Ollama backend via
ollama serve
  2. Pull the desired Ollama model, e.g. mistral
ollama pull mistral

To generate the oracle dataset, we use the llama2 model:

ollama pull llama2
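To confirm that the models were pulled, you can query the running Ollama server. This is a sketch under assumptions, not part of ELQM: it assumes Ollama listens on its default address `http://localhost:11434` and uses its `/api/tags` endpoint, which lists locally available models. The function names `installed_models` and `fetch_tags` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: Ollama's default address; adjust if your server is configured differently.
OLLAMA_URL = "http://localhost:11434"


def installed_models(tags_payload: dict) -> set[str]:
    """Extract base model names (without the ':tag' suffix) from an /api/tags response."""
    return {model["name"].split(":")[0] for model in tags_payload.get("models", [])}


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With `ollama serve` running, `{"mistral", "llama2"} - installed_models(fetch_tags())` gives the set of models that still need to be pulled.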

Usage

Gradio Frontend

elqm gui -c configs/prompts/256_5_5_nlc_bge_fn_mistral_h2.yaml

CLI

elqm run -c configs/prompts/256_5_5_nlc_bge_fn_mistral_h2.yaml

Python API

from dynaconf import Dynaconf
import os

from elqm import ELQMPipeline
from elqm.utils import get_dir

config = Dynaconf(settings_files=os.path.join(get_dir("configs", "prompts"), "256_5_5_nlc_bge_fn_mistral_h2.yaml"))
elqm = ELQMPipeline(config)

print(elqm.answer("Which CIE LUV does a model supporting greater than 99 % of the sRGB colour space translate to?"))
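To ask several questions in one go, the `answer` call shown above can be wrapped in a small helper. This is a minimal sketch: `answer_all` and the `Answerer` protocol are hypothetical names, not part of the ELQM API, and the sketch assumes that `answer` takes a question string and returns the answer as a string.

```python
from typing import Iterable, Protocol


class Answerer(Protocol):
    """Anything exposing an `answer(question)` method, such as the pipeline above."""

    def answer(self, question: str) -> str: ...


def answer_all(pipeline: Answerer, questions: Iterable[str]) -> dict[str, str]:
    """Ask each question in turn and map it to the pipeline's answer."""
    return {question: pipeline.answer(question) for question in questions}
```

For example, `answer_all(elqm, ["first question", "second question"])` would reuse the already loaded pipeline for both questions instead of constructing it twice.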

Development

Setup

To set up the development environment, run the following commands:

pip install -e ".[dev]"
pre-commit install

Tests

To run the tests locally, start the Ollama backend and then, in a second terminal, run pytest:

ollama serve
pytest tests --cov src

Citation

If you use ELQM: Energy-Law Query-Master for your research, please cite it using the following BibTeX entry:

@software{elqm_2024,
    author = {Daniel Knorr and Paul Saegert and Nikita Tatsch},
    title = {ELQM: Energy-Law Query-Master},
    month = mar,
    year = 2024,
    publisher = {GitHub},
    version = {1.0.0},
    url = {https://github.com/psaegert/elqm}
}
