Skip to content

Accurate medical document summarization using GoLLIE and Llama3 for improved recall.

License

Notifications You must be signed in to change notification settings

Neilus03/recsum

Repository files navigation

Towards Improved Recall in Medical Document Summarization: The GoLLIE Approach

Summarization Pipeline

This repository contains the code and resources for the Medical Document Summarizer, which uses the GoLLIE approach to improve recall in medical document summarization. The summarizer extracts key entities and details from medical texts and generates structured summaries using a few-shot learning approach on Llama3.

Table of Contents

  1. Introduction
  2. Features
  3. Installation
  4. Usage
  5. Experiments and Results
  6. Acknowledgements

Introduction

Maintaining the accuracy of extracted information in medical document summarization is crucial due to the potential consequences of errors. This project leverages GoLLIE, a Guideline-following Large Language Model for Information Extraction, to enhance recall by identifying key entities and essential details in medical texts. The extracted entities and details are used to generate structured summaries using a few-shot learning approach on Llama3.

For more details, refer to our paper.

Features

  • Accurate Information Extraction: Uses GoLLIE to extract key entities and essential details from medical texts.
  • Structured Summarization: Generates concise and informative summaries using Llama3.
  • Few-shot Learning: Eliminates the need for extensive retraining of the summarizing LLM.
  • Multilingual Support: Supports extraction of information from Spanish medical reports as well.

Installation

To install and run the Medical Document Summarizer, follow these steps:

  1. Clone the repository
git clone https://github.com/Neilus03/recsum.git
cd recsum
  1. Set up a conda environment
conda create -n Gollie
conda activate Gollie
  1. Install the required packages
pip install -r requirements.txt

Usage

To run the web application for the Medical Document Summarizer, follow these steps:

  1. Get and set Groq API KEY Groq allows us to run Llama3-70b faster, and it's free. Get your API KEY from here and once you have it run the following command, substituting <API KEY> with your actual API KEY:
export GROQ_API_KEY='<API KEY>'
  1. Navigate to the web-app directory
cd web-app/main
  1. Run the application

Start the application by executing the following script:

./run_app.sh
  1. Reproduce Results

To reproduce results you have to run the following file located in the main folder of the web-app.:

sh compute_results.sh

This results will be saved in the route ../web-app/main/results/results_output

To see the individual files used to compute results check the folder ../web-app/main/results where you will find all the .py files necessary to reproduce the results presented in the paper.

  1. Access the web interface Once the server is running, open your web browser and go to:
http://localhost:5000

You will see the interface for the Medical Document Summarizer where you can paste text, upload a file, or use voice input to summarize medical documents:

web-interface

Acknowledgements

This project was developed by Neil De La Fuente, Joan Samper, and Daniel Vidal at the Computer Vision Center and Universitat Autònoma de Barcelona. Special thanks to "HiTZ zentroa" and the creators of GoLLIE: Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

About

Accurate medical document summarization using GoLLIE and Llama3 for improved recall.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published