A benchmark/dataset for few-shot evaluation of foundation models for electronic health records (EHRs). You can read the paper here.
Whereas most prior EHR benchmarks are limited to the ICU setting, EHRSHOT contains the full longitudinal health records of 6,739 patients from Stanford Medicine and a diverse set of 15 classification tasks tailored towards few-shot evaluation of pre-trained models.
Note: EHRSHOT can now be downloaded as a MEDS-compatible dataset. Please visit this Redivis link and download the file called EHRSHOT_MEDS.zip.
Use the following steps to run the EHRSHOT benchmark.
1) Install EHRSHOT:
```bash
conda create -n EHRSHOT_ENV python=3.10 -y
conda activate EHRSHOT_ENV
git clone https://github.com/som-shahlab/ehrshot-benchmark.git
cd ehrshot-benchmark
pip install -r requirements.txt
```
2) Install FEMR:
For our data preprocessing pipeline we use FEMR (Framework for Electronic Medical Records), a Python package for building deep learning models with EHR data.
You must also have CUDA/cuDNN installed (we recommend CUDA 11.8 and cuDNN 8.7.0).
Note that this currently only works on Linux machines.
```bash
pip install femr-cuda==0.0.20 dm-haiku==0.0.9 optax==0.1.4
pip install --upgrade "jax[cuda]==0.4.8" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
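Before moving on, it can help to confirm that the machine meets the requirements above. The sketch below is our own hypothetical helper (not part of EHRSHOT or FEMR): it checks that the OS is Linux and whether JAX can see a GPU via `jax.devices()`, and it reports a missing JAX install instead of failing.

```python
import platform


def check_femr_environment() -> dict:
    """Report whether this machine meets the stated FEMR requirements."""
    report = {"is_linux": platform.system() == "Linux", "jax_gpu": False}
    try:
        import jax  # installed by the jax[cuda] command above
    except ImportError:
        # JAX is not installed yet, so no GPU backend can be visible.
        return report
    # jax.devices() lists devices for the default backend; CUDA GPUs
    # report their platform as "gpu".
    report["jax_gpu"] = any(d.platform == "gpu" for d in jax.devices())
    return report


if __name__ == "__main__":
    print(check_femr_environment())
```

If `jax_gpu` is False after installing `jax[cuda]`, the CUDA/cuDNN versions are the first thing to double-check.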
3) Download the dataset and model from Redivis here and place them in a directory called EHRSHOT_ASSETS/.
4) Run the benchmark end-to-end with:
```bash
bash run_all.sh
```
Your final folder structure should look like this:
- ehrshot-benchmark/
  - EHRSHOT_ASSETS/
    - athena_download/: We do NOT provide this asset. You will have to follow the instructions in the section "Downloading the Athena Ontology" below. However, you can skip this entirely by using the FEMR extract included in our Redivis download.
    - benchmark/: We provide this asset from Redivis. It contains labels and few-shot samples for all our tasks.
    - data/: We provide this asset from Redivis. It contains a CSV file with the entire dataset.
    - features/: We provide this asset from Redivis. It contains preprocessed count-based and CLMBR-based featurizations.
    - femr/: We provide this asset from Redivis. It contains the deidentified EHR data as a FEMR extract.
    - figures/: We provide this asset from Redivis. It contains figures summarizing the expected results of running our benchmark.
    - models/: We provide this asset from Redivis. It contains the weights of our pretrained foundation model (CLMBR).
    - results/: We provide this asset from Redivis. It contains the raw results of running our benchmark on the baseline models.
    - splits/: We provide this asset from Redivis. It determines which split each patient belongs to.
  - ehrshot/: We provide the scripts to run the benchmark here.
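A quick way to catch an incomplete download is to check for the folders listed above. This is a hypothetical helper, assuming you run it from the repo root and that every folder we provide via Redivis should be present; athena_download/ is treated as optional because it is only needed to rebuild the FEMR extract.

```python
from pathlib import Path

# Folders provided via Redivis (see the folder structure above).
# athena_download/ is deliberately excluded: it is not provided and is optional.
REDIVIS_ASSETS = ["benchmark", "data", "features", "femr",
                  "figures", "models", "results", "splits"]


def missing_assets(assets_dir: str = "EHRSHOT_ASSETS") -> list:
    """Return the Redivis-provided subfolders that are absent from assets_dir."""
    root = Path(assets_dir)
    return [name for name in REDIVIS_ASSETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_assets()
    print("All Redivis assets found." if not missing else f"Missing: {missing}")
```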
Access: The model is on HuggingFace here and requires signing a research usage agreement.
We publish the model weights of a 141-million-parameter clinical foundation model pre-trained on the deidentified structured EHR data of 2.57M patients from Stanford Medicine.
We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR.
We use Clinical Language-Model-Based Representations (CLMBR) as our model. CLMBR is an autoregressive model designed to predict the next medical code in a patient's timeline given previous codes. CLMBR employs causally masked local attention, which ensures that information flows only forward in time. This is vital for prediction tasks and contrasts with BERT-based models, which are bidirectional. Our base model is a transformer with 141 million trainable parameters trained with a next-code prediction objective, and it operates at minute-level EHR resolution rather than the day-level aggregation used in the original CLMBR formulation.
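To make "causally masked local attention" and the next-code objective concrete, the NumPy sketch below builds a mask in which each position may attend only to itself and up to `window - 1` earlier codes, applies it to toy attention scores, and forms next-code targets by shifting the sequence by one. The window size and function names are illustrative assumptions; CLMBR's actual implementation lives in FEMR and is not reproduced here.

```python
import numpy as np


def causal_local_mask(seq_len: int, window: int) -> np.ndarray:
    """mask[i, j] is True when position i may attend to position j.

    Causal: j <= i, so no information flows back from later codes.
    Local:  i - j < window, so each position sees at most `window` codes
    (itself included).
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)


def masked_attention_weights(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Row-wise softmax restricted to the allowed positions."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) == 0 for masked positions
    return weights / weights.sum(axis=-1, keepdims=True)


def next_code_pairs(codes: list) -> tuple:
    """Inputs are codes[:-1]; the target at each step is the following code."""
    return codes[:-1], codes[1:]


if __name__ == "__main__":
    mask = causal_local_mask(seq_len=4, window=2)
    print(mask.astype(int))
    print(masked_attention_weights(np.zeros((4, 4)), mask))
    print(next_code_pairs(["A", "B", "C", "D"]))
```

Because the diagonal is always allowed, every row has at least one finite score, so the softmax never divides by zero.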
Access: We provide two versions of EHRSHOT (both contain identical data, just in different formats). Access requires signing a research usage agreement.
- Original: Link. The original EHRSHOT dataset from the paper (and the version compatible with this repo) is stored in EHRSHOT_ASSETS.zip at this link.
- MEDS: Link. The EHRSHOT dataset in a MEDS-compatible format is stored in EHRSHOT_MEDS.zip at this link.
EHRSHOT contains:
- 6,739 patients
- 41.6 million clinical events
- 921,499 visits
- 15 prediction tasks
Each patient consists of an ordered timeline of clinical events taken from the structured data of their EHR (e.g. diagnoses, procedures, prescriptions, etc.).
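One way to picture such a timeline in code is a time-sorted list of (time, code, value) events per patient. The dataclasses below are a hypothetical illustration, loosely in the spirit of MEDS-style event rows; they are not the FEMR or MEDS API, and all names are our own.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    time: datetime
    code: str                       # e.g. a diagnosis, procedure, or prescription code
    value: Optional[float] = None   # optional numeric result, e.g. a lab value


@dataclass
class Patient:
    patient_id: int
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        """Insert an event and keep the timeline sorted by time."""
        self.events.append(event)
        self.events.sort(key=lambda e: e.time)

    def events_before(self, cutoff: datetime) -> List[Event]:
        """Events recorded strictly before `cutoff` (what a model may see)."""
        return [e for e in self.events if e.time < cutoff]
```

Restricting a model to `events_before(prediction_time)` is what keeps a prediction from peeking at the outcome it is asked to predict.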
Each task is a predictive classification task, and includes a canonical train/val/test split. The tasks are defined as follows:
Task | Type | Prediction Time | Time Horizon | Possible Label Values in Dataset |
---|---|---|---|---|
Long Length of Stay | Binary | 11:59pm on day of admission | Admission duration | {0,1} aka {<7 days, >=7 days} |
30-day Readmission | Binary | 11:59pm on day of discharge | 30-days post discharge | {0,1} aka {no readmission, readmission} |
ICU Transfer | Binary | 11:59pm on day of admission | Admission duration | {0,1} aka {no transfer, transfer} |
Thrombocytopenia | 4-way Multiclass | Immediately before result is recorded | Next result | {0,1,2,3} aka {low, medium, high, abnormal} |
Hyperkalemia | 4-way Multiclass | Immediately before result is recorded | Next result | {0,1,2,3} aka {low, medium, high, abnormal} |
Hypoglycemia | 4-way Multiclass | Immediately before result is recorded | Next result | {0,1,2,3} aka {low, medium, high, abnormal} |
Hyponatremia | 4-way Multiclass | Immediately before result is recorded | Next result | {0,1,2,3} aka {low, medium, high, abnormal} |
Anemia | 4-way Multiclass | Immediately before result is recorded | Next result | {0,1,2,3} aka {low, medium, high, abnormal} |
Hypertension | Binary | 11:59pm on day of discharge | 1 year post-discharge | {0,1} aka {no diagnosis, diagnosis} |
Hyperlipidemia | Binary | 11:59pm on day of discharge | 1 year post-discharge | {0,1} aka {no diagnosis, diagnosis} |
Pancreatic Cancer | Binary | 11:59pm on day of discharge | 1 year post-discharge | {0,1} aka {no diagnosis, diagnosis} |
Celiac | Binary | 11:59pm on day of discharge | 1 year post-discharge | {0,1} aka {no diagnosis, diagnosis} |
Lupus | Binary | 11:59pm on day of discharge | 1 year post-discharge | {0,1} aka {no diagnosis, diagnosis} |
Acute MI | Binary | 11:59pm on day of discharge | 1 year post-discharge | {0,1} aka {no diagnosis, diagnosis} |
Chest X-Ray Findings | 14-way Multilabel | 24hrs before report is recorded | Next report | {0,1,...,8192} aka binary string where a 1 at location idx means that the label at CHEXPERT_LABELS[idx] is True, per this array |
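Several columns above translate directly into simple computations. The helpers below are a hedged sketch, not the labelers in ehrshot/: they compute an 11:59pm prediction time, apply the 7-day Long Length of Stay threshold, and decode the Chest X-Ray integer into positive findings. The decoder assumes the integer is rendered as a zero-padded binary string with one character per label and that idx counts from the left; verify this against the CHEXPERT_LABELS array before relying on it. The label names passed in are placeholders.

```python
from datetime import datetime, time, timedelta
from typing import List


def prediction_time_11_59pm(day: datetime) -> datetime:
    """11:59pm on the calendar day of `day` (e.g. admission or discharge)."""
    return datetime.combine(day.date(), time(23, 59))


def long_los_label(admit: datetime, discharge: datetime) -> int:
    """Long Length of Stay: 1 if the admission lasted >= 7 days, else 0."""
    return int(discharge - admit >= timedelta(days=7))


def decode_chest_xray(label: int, label_names: List[str]) -> List[str]:
    """Names of positive findings encoded in the multilabel integer.

    ASSUMPTION: `label` is written as a zero-padded binary string with one
    character per entry of `label_names`, and idx counts from the left.
    """
    n = len(label_names)
    if not 0 <= label < 2 ** n:
        raise ValueError(f"label must be in [0, {2 ** n - 1}]")
    bits = format(label, f"0{n}b")
    return [name for name, bit in zip(label_names, bits) if bit == "1"]
```

For example, with 14 placeholder names `L0` ... `L13`, `decode_chest_xray(1, names)` returns `["L13"]` under this left-to-right assumption.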
Most prior benchmarks are (1) limited to the ICU setting and (2) not tailored towards few-shot evaluation of pre-trained models.
In contrast, EHRSHOT contains (1) the full breadth of longitudinal data that a health system would expect to have on the patients it treats and (2) a broad range of tasks designed to evaluate models' task adaptation and few-shot capabilities:
Benchmark | Source Dataset | ICU/ED Visits | Non-ICU/ED Visits | # of Patients | # of Tasks | Few-Shot | Dataset via DUA | Preprocessing Code | Model Weights |
---|---|---|---|---|---|---|---|---|---|
EHRSHOT | Stanford Medicine | ✓ | ✓ | 7k | 15 | ✓ | ✓ | ✓ | ✓ |
MIMIC-Extract | MIMIC-III | ✓ | -- | 34k | 5 | -- | ✓ | ✓ | -- |
Purushotham 2018 | MIMIC-III | ✓ | -- | 35k | 3 | -- | ✓ | ✓ | -- |
Harutyunyan 2019 | MIMIC-III | ✓ | -- | 33k | 4 | -- | ✓ | ✓ | -- |
Gupta 2022 | MIMIC-IV | ✓ | * | 257k | 4 | -- | ✓ | ✓ | -- |
COP-E-CAT | MIMIC-IV | ✓ | * | 257k | 4 | -- | ✓ | ✓ | -- |
Xie 2022 | MIMIC-IV | ✓ | * | 216k | 3 | -- | ✓ | ✓ | -- |
eICU | eICU | ✓ | -- | 73k | 4 | -- | ✓ | ✓ | -- |
EHR PT | MIMIC-III / eICU | ✓ | -- | 86k | 11 | ✓ | ✓ | ✓ | -- |
FIDDLE | MIMIC-III / eICU | ✓ | -- | 157k | 3 | -- | ✓ | ✓ | -- |
HiRID-ICU | HiRID | ✓ | -- | 33k | 6 | -- | ✓ | ✓ | -- |
Solares 2020 | CPRD | ✓ | ✓ | 4M | 2 | -- | -- | -- | -- |
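Few-shot evaluation means training each task head on only k labeled examples per class. The few-shot samples used in the paper ship in benchmark/, so use those to reproduce our results; the sketch below only illustrates the general idea of seeded, per-class k-shot sampling from a train split and will not reproduce EHRSHOT's exact samples. The function name and the `{example_id: label}` input format are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict


def sample_k_shot(labels: Dict[int, int], k: int, seed: int = 0) -> Dict[int, int]:
    """Draw up to k examples per class from {example_id: label}.

    Deterministic for a given seed, so repeated runs pick the same examples.
    """
    by_class = defaultdict(list)
    for example_id, y in sorted(labels.items()):
        by_class[y].append(example_id)
    rng = random.Random(seed)
    chosen = {}
    for y, ids in sorted(by_class.items()):
        # Classes with fewer than k examples contribute all of them.
        for example_id in rng.sample(ids, min(k, len(ids))):
            chosen[example_id] = y
    return chosen
```

Sorting before sampling makes the result independent of dictionary insertion order, which matters when comparing models on the same k-shot subset.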
The FEMR extract provided in the Redivis download contains all the necessary concepts, so you can skip downloading the Athena ontology as long as you do not run the bash script 1_create_femr_database.sh.
If you want to recreate the FEMR extract from scratch, however, then you'll need to download the Athena ontology yourself:
- Go to the Athena website at this link. You may need to create an account.
- Click the green "Download" button at the top right of the website
- Click the purple "Download Vocabularies" button below the green "Download" button
- Name the bundle "athena_download" and select version 5.x
- Scroll to the bottom of the list, and click the blue "Download" button
- It will take some time for the download to be ready. Please refresh the webpage here to check whether your download is ready. Once the download is ready, click "Download"
- After the download is complete, unzip the file and move all the files into the EHRSHOT_ASSETS/athena_download/ folder in your repo.
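The unzip-and-move step can also be scripted. The helper below is a hypothetical sketch that extracts the bundle straight into EHRSHOT_ASSETS/athena_download/ and reports which of a few files we expect in a standard Athena bundle are missing. The expected-file list and the assumption that the zip stores its files at the top level (no enclosing folder) are ours; adjust both if your bundle differs.

```python
import zipfile
from pathlib import Path
from typing import List

# Files we expect in a standard Athena vocabulary bundle (an assumption;
# adjust if your download differs).
EXPECTED_FILES = ["CONCEPT.csv", "CONCEPT_RELATIONSHIP.csv", "VOCABULARY.csv", "cpt.sh"]


def install_athena(zip_path: str, dest: str = "EHRSHOT_ASSETS/athena_download") -> List[str]:
    """Extract the Athena zip into `dest` and return expected files still missing.

    Assumes the zip stores its files at the top level (no enclosing folder).
    """
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)
    return [name for name in EXPECTED_FILES if not (out / name).is_file()]
```

Call it with the path of the zip Athena gave you, e.g. `install_athena("path/to/your_athena_bundle.zip")` (a placeholder path); an empty list means the expected files are in place.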
After downloading the Athena OHDSI Ontology, you will have to separately download the CPT subset of the ontology. You can follow the instructions in the readme.txt in your Athena download, or follow the steps below:
- Create a UMLS account here
- Get your UMLS API key here
- From the EHRSHOT_ASSETS/athena_download/ folder, run this command: bash cpt.sh <YOUR UMLS API KEY>
Your ontology will then be ready to go!
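If you prefer not to paste your UMLS API key on the command line each time, the CPT step can be wrapped in a small script. This is a hypothetical sketch: it runs the same `bash cpt.sh <YOUR UMLS API KEY>` command from inside the Athena folder, taking the key from an argument or from a `UMLS_API_KEY` environment variable, a name we chose for this example and not one Athena or EHRSHOT reads.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional


def run_cpt_script(athena_dir: str = "EHRSHOT_ASSETS/athena_download",
                   api_key: Optional[str] = None) -> None:
    """Run `bash cpt.sh <API KEY>` from inside athena_dir.

    The key comes from `api_key` or, failing that, from the UMLS_API_KEY
    environment variable (a name chosen for this sketch).
    """
    key = api_key or os.environ.get("UMLS_API_KEY")
    if not key:
        raise ValueError("Pass api_key or set UMLS_API_KEY")
    if not (Path(athena_dir) / "cpt.sh").is_file():
        raise FileNotFoundError(f"cpt.sh not found in {athena_dir}")
    # Passing an argument list (no shell=True) keeps the key from being
    # interpreted by the shell; check=True raises if cpt.sh fails.
    subprocess.run(["bash", "cpt.sh", key], cwd=athena_dir, check=True)
```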
If you find this project helpful, please cite our paper:
```bibtex
@article{wornow2023ehrshot,
    title={EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models},
    author={Michael Wornow and Rahul Thapa and Ethan Steinberg and Jason Fries and Nigam Shah},
    year={2023},
    eprint={2307.02028},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
The source code of this repo is released under the Apache License 2.0. The model license and dataset license are listed on their corresponding webpages.