This repository contains a project to annotate linguistic data (understatements) using a Large Language Model (LLM), to evaluate the quality of the annotations, and to compare them with human annotations to measure Inter-Annotator Agreement. Finally, the project lets you create a linguistic dataset from the annotated samples and their metadata. The final dataset of understatements is also available on Kaggle.
The project uses the following libraries, among others:

- `openai` to interact with the OpenAI API.
- `pydantic` to define a data model for the annotations.
- `instructor` to validate the LLM responses against that data model (see the sketch below).
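As an illustration of how these pieces fit together, here is a minimal sketch of structured annotation with `instructor`. The model name and the `Annotation` fields are assumptions for this example, not the repository's actual schema:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class Annotation(BaseModel):
    """Hypothetical annotation schema; the real data model is defined in the notebook."""
    is_understatement: bool
    explanation: str


# instructor wraps the OpenAI client so responses are validated against the pydantic model.
client = instructor.from_openai(OpenAI())

annotation = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any OpenAI chat model works here
    response_model=Annotation,
    messages=[
        {"role": "user", "content": "Annotate: 'The flood caused a bit of a mess.'"},
    ],
)
print(annotation.model_dump())
```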
The annotation guidelines used in this project are included in the repository.
The code for this project is organized in a tutorial-style interactive Python notebook, named `evaluate-annotations.ipynb`.
The notebook shows how you can:
- Annotate data with an LLM, in a structured way.
- Load human-annotated data and normalize its structure.
- Calculate different metrics for Inter-Annotator Agreement (see the sketch after this list).
- Create a JSON dataset with annotated samples and metadata.
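For example, one widely used IAA metric is Cohen's kappa, which can be computed with scikit-learn. The label lists below are invented for illustration, and this is not necessarily the exact metric set used in the notebook:

```python
from sklearn.metrics import cohen_kappa_score

# Invented binary labels (1 = understatement) from two annotators on the same samples.
annotator_a = [1, 0, 1, 1, 0, 1]
annotator_b = [1, 0, 0, 1, 0, 1]

# Raw percent agreement and chance-corrected agreement.
agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```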
To use the notebook, create a `.env` file at the root of the repository and add your OpenAI API key:

```
OPENAI_API_KEY=<YOUR_OPENAI_KEY>
```
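Inside the notebook, the key is then typically read from the environment; a minimal sketch, assuming `python-dotenv` is used to load the file:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # reads the .env file from the current working directory
api_key = os.environ["OPENAI_API_KEY"]
```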
The installation of the required Python packages is managed by the `evaluate-annotations.ipynb` notebook. After you add your `.env` file, you should be able to run all cells in the notebook sequentially.
Note: running all cells in the notebook will make API calls to OpenAI, meaning you will incur a cost. For this reason, an LLM-annotated dataset is also provided, which you can load from file instead. The code showing how to do this is in the notebook, under Part 1 — Step 6.
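A minimal sketch of what loading such a dataset from file looks like (the filename here is hypothetical; the notebook uses its own path):

```python
import json

# Hypothetical filename; Part 1 — Step 6 of the notebook shows the actual path.
with open("llm_annotations.json", encoding="utf-8") as f:
    llm_annotations = json.load(f)

print(f"Loaded {len(llm_annotations)} LLM-annotated samples")
```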
Some ideas to extend the project further:

- Create JSON Lines datasets with the annotated samples for fine-tuning an LLM (see the sketch after this list).
- Pass the annotation guidelines to the LLM as a system message and compare the agreement between the LLM and the average human annotator.
- Use the LLM to generate new samples and have the human annotators evaluate them.
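For the fine-tuning idea above, a JSON Lines file contains one training example per line. Here is a sketch under the assumption that the chat fine-tuning format of the OpenAI API is the target; the samples and field names are invented:

```python
import json

# Invented annotated samples; the real ones would come from the dataset built in the notebook.
samples = [
    {"text": "The flood caused a bit of a mess.", "is_understatement": True},
    {"text": "The meeting starts at nine.", "is_understatement": False},
]

with open("finetune.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        # One chat-format training example per line, as expected by OpenAI fine-tuning.
        record = {
            "messages": [
                {"role": "user", "content": f"Is this an understatement? {sample['text']}"},
                {"role": "assistant", "content": "yes" if sample["is_understatement"] else "no"},
            ]
        }
        f.write(json.dumps(record) + "\n")
```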