Unifying Heterogeneous Electronic Health Record Systems via Clinical Text-Based Code Embedding (KDD 2021 Under Review)
This repository provides PyTorch code to implement DescEmb, a code-agnostic EHR predictive model.
main.py : To train a model
datasize_dependent.py : To train a model while varying the size of the training dataset (corresponds to Sec 4.5 in the paper)
few_shot.py : To transfer a model using differing ratios of the target dataset (corresponds to Sec 4.6 in the paper)
divide_and_conquer.py : To train models separately and test them on the pooled data (corresponds to Sec 4.7 - Divide & Conquer in the paper)
./preprocessing : code for preprocessing both MIMIC-III and eICU
./visualize_results : code for visualizing results in the paper
# Pass --DescEmb to use DescEmb; omit it to use CodeEmb
python main.py \
    --DescEmb \
    --source_file='eicu' \
    --target='readmission' \
    --item='all' \
    --time_window='12' \
    --batch_size=512 \
    --embedding_dim=128 \
    --hidden_dim=128 \
    --n_epochs=100 \
    --input_path='./data_folder/' \
    --path='./output'
input
└─ all
   ├─ eicu_12_all_150_2020.pkl
   ├─ eicu_12_all_150_2021.pkl
   ├─ ...
   └─ mimic_12_all_150_2029.pkl
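The input file names above appear to be underscore-separated, and the first three fields line up with the --source_file, --time_window and --item values used in the command. The following is a minimal Python sketch for splitting such a name, assuming that pattern holds for every file. `parse_input_name` is a hypothetical helper and not part of this repository; the last two fields are left unnamed because their meaning is not stated here.

```python
from pathlib import Path


def parse_input_name(filename):
    """Split e.g. 'eicu_12_all_150_2020.pkl' into its underscore-separated fields.

    Hypothetical helper: the key names are assumptions inferred from the
    command-line flags, not taken from the repository code.
    """
    parts = Path(filename).stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"unexpected file name: {filename}")
    source, time_window, item, field_4, field_5 = parts
    return {
        "source_file": source,       # assumed to match --source_file
        "time_window": time_window,  # assumed to match --time_window
        "item": item,                # assumed to match --item
        "field_4": field_4,          # meaning not documented here
        "field_5": field_5,          # meaning not documented here
    }


if __name__ == "__main__":
    print(parse_input_name("eicu_12_all_150_2020.pkl"))
```

A name that does not split into exactly five fields raises a ValueError rather than being silently misread.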
output
└─ all
   ├─ singleRNN
   │  ├─ mimic
   │  │  ├─ readmission
   │  │  ├─ mortality
   │  │  ├─ los_3days
   │  │  ├─ los_7days
   │  │  └─ dx_depth1_unique
   │  └─ eicu
   │     ├─ readmission
   │     ├─ mortality
   │     ├─ los_3days
   │     ├─ los_7days
   │     └─ dx_depth1_unique
   │
   └─ cls_learnable
      ├─ mimic
      │  ├─ readmission
      │  ├─ mortality
      │  ├─ los_3days
      │  ├─ los_7days
      │  └─ dx_depth1_unique
      ├─ eicu
      │  ├─ readmission
      │  ├─ mortality
      │  ├─ los_3days
      │  ├─ los_7days
      │  └─ dx_depth1_unique
      └─ both
         ├─ readmission
         ├─ mortality
         ├─ los_3days
         ├─ los_7days
         └─ dx_depth1_unique
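The output layout above nests three directory levels under output/all. As a rough sketch, the expected directories can be enumerated in Python, assuming exactly the names shown in the tree. `expected_output_dirs`, `LAYOUT` and `TASKS` are hypothetical names introduced for illustration, and reading the levels as model, dataset and task is an interpretation of the tree rather than something the repository code confirms.

```python
from pathlib import Path

# Leaf directories shown under every dataset in the tree.
TASKS = ["readmission", "mortality", "los_3days", "los_7days", "dx_depth1_unique"]

# Second-level directory -> third-level directories, copied from the tree.
LAYOUT = {
    "singleRNN": ["mimic", "eicu"],
    "cls_learnable": ["mimic", "eicu", "both"],
}


def expected_output_dirs(root="./output", item="all"):
    """Return every leaf directory of the layout as a list of Path objects."""
    return [
        Path(root) / item / model / dataset / task
        for model, datasets in LAYOUT.items()
        for dataset in datasets
        for task in TASKS
    ]


if __name__ == "__main__":
    for directory in expected_output_dirs():
        print(directory.as_posix())
```

Calling `directory.mkdir(parents=True, exist_ok=True)` on each returned path would create the layout ahead of a run, if the training scripts do not already do so.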