This repository contains a project to annotate linguistic data (understatements) using a Large Language Model (LLM), to evaluate the quality of the annotations, and to compare them with human annotations to measure Inter-Annotator Agreement. Finally, the project lets you create a linguistic dataset from the annotated samples and their metadata. The final dataset of understatements is also available on Kaggle.
The project uses the following libraries, among others:

- `openai` to interact with the OpenAI API.
- `pydantic` to define a data model for the annotations.
- `instructor` to validate the LLM responses against that data model (see the sketch below).
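As an illustration of how these pieces fit together, here is a minimal sketch of structured annotation with `instructor`. The model name and the `Annotation` fields are assumptions for this example, not the repository's actual schema:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class Annotation(BaseModel):
    """Hypothetical annotation schema; the real data model is defined in the notebook."""
    is_understatement: bool
    explanation: str


# instructor wraps the OpenAI client so responses are validated against the pydantic model.
client = instructor.from_openai(OpenAI())

annotation = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any OpenAI chat model works here
    response_model=Annotation,
    messages=[
        {"role": "user", "content": "Annotate: 'The flood caused a bit of a mess.'"},
    ],
)
print(annotation.model_dump())
```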
The annotation guidelines used in this project are included in the repository.
The code for this project is organized in a tutorial-style interactive Python notebook, named `evaluate-annotations.ipynb`.
The notebook shows how you can:
- Annotate data with an LLM, in a structured way.
- Load human-annotated data and normalize its structure.
- Calculate different metrics for Inter-Annotator Agreement (see the sketch after this list).
- Create a JSON dataset with annotated samples and metadata.
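For example, one widely used IAA metric is Cohen's kappa, which can be computed with scikit-learn. The label lists below are invented for illustration, and this is not necessarily the exact metric set used in the notebook:

```python
from sklearn.metrics import cohen_kappa_score

# Invented binary labels (1 = understatement) from two annotators on the same samples.
annotator_a = [1, 0, 1, 1, 0, 1]
annotator_b = [1, 0, 0, 1, 0, 1]

# Raw percent agreement and chance-corrected agreement.
agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```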
To use the notebook, create a `.env` file at the root of the repository and add your OpenAI API key:

```
OPENAI_API_KEY=<YOUR_OPENAI_KEY>
```
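Inside the notebook, the key is then typically read from the environment; a minimal sketch, assuming `python-dotenv` is used to load the file:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # reads the .env file from the current working directory
api_key = os.environ["OPENAI_API_KEY"]
```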
The installation of the required Python packages is managed by the `evaluate-annotations.ipynb` notebook. After you add your `.env` file, you should be able to run all cells in the notebook sequentially.
Note: running all cells in the notebook will make API calls to OpenAI, meaning you will incur a cost. For this reason, an LLM-annotated dataset is also provided, which you can load from file instead. The code showing how to do this is in the notebook, under Part 1 — Step 6.
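A minimal sketch of what loading such a dataset from file looks like (the filename here is hypothetical; the notebook uses its own path):

```python
import json

# Hypothetical filename; Part 1 — Step 6 of the notebook shows the actual path.
with open("llm_annotations.json", encoding="utf-8") as f:
    llm_annotations = json.load(f)

print(f"Loaded {len(llm_annotations)} LLM-annotated samples")
```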
Some ideas to extend the project further:

- Create JSON Lines datasets with the annotated samples for fine-tuning an LLM (see the sketch after this list).
- Pass the annotation guidelines to the LLM as a system message and compare the agreement between the LLM and the average human annotator.
- Use the LLM to generate new samples and have the human annotators evaluate them.
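For the fine-tuning idea above, a JSON Lines file contains one training example per line. Here is a sketch under the assumption that the chat fine-tuning format of the OpenAI API is the target; the samples and field names are invented:

```python
import json

# Invented annotated samples; the real ones would come from the dataset built in the notebook.
samples = [
    {"text": "The flood caused a bit of a mess.", "is_understatement": True},
    {"text": "The meeting starts at nine.", "is_understatement": False},
]

with open("finetune.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        # One chat-format training example per line, as expected by OpenAI fine-tuning.
        record = {
            "messages": [
                {"role": "user", "content": f"Is this an understatement? {sample['text']}"},
                {"role": "assistant", "content": "yes" if sample["is_understatement"] else "no"},
            ]
        }
        f.write(json.dumps(record) + "\n")
```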