This is the repository for the paper:
Michael A. Alcorn and Anh Nguyen. The DEformer: An Order-Agnostic Distribution Estimating Transformer. ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF+). 2021.
Samples generated by the DEformer. Each sample was generated using a random pixel order.
If you use this code for your own research, please cite:
@article{alcorn2021deformer,
title={The DEformer: An Order-Agnostic Distribution Estimating Transformer},
author={Alcorn, Michael A. and Nguyen, Anh},
journal={ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF+)},
year={2021}
}
After you've cloned the repository to your desired location, create a file called .deformer_profile in your home directory:
nano ~/.deformer_profile
and copy and paste in the contents of the repository's .deformer_profile, replacing each of the variable values with paths relevant to your environment.
Next, add the following line to the end of your ~/.bashrc:
source ~/.deformer_profile
and either log out and log back in again or run:
source ~/.bashrc
You should now be able to copy and paste all of the commands in the various instructions sections. For example:
echo ${DEFORMER_PROJECT_DIR}
should print the path you set for DEFORMER_PROJECT_DIR in .deformer_profile.
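As a quick check before launching any training job, the sketch below reports which profile variables are still unset. It only looks at DEFORMER_PROJECT_DIR and DEFORMER_EXPERIMENTS_DIR, the two variables that appear in the commands in this README; the profile may define others, and the helper name missing_variables is hypothetical rather than part of the repository.

```python
import os

# The two variables referenced by the commands in this README.
REQUIRED = ("DEFORMER_PROJECT_DIR", "DEFORMER_EXPERIMENTS_DIR")


def missing_variables(environ=os.environ, required=REQUIRED):
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


missing = missing_variables()
if missing:
    print("Unset variables: " + ", ".join(missing))
else:
    print("All profile variables are set.")
```

If any names are reported, re-run source ~/.bashrc in the current shell and try again.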
To train the DEformer on image data (MNIST or CIFAR-10), run (or copy and paste) the following script, editing the variables as appropriate.
#!/usr/bin/env bash
JOB=$(date +%Y%m%d%H%M%S)
echo "train:" >> ${JOB}.yaml
echo " dataset: mnist" >> ${JOB}.yaml # "mnist" or "cifar10".
echo " train_prop: 0.98" >> ${JOB}.yaml
echo " workers: 10" >> ${JOB}.yaml
echo " learning_rate: 1.0e-5" >> ${JOB}.yaml
echo " patience: 5" >> ${JOB}.yaml
echo "model:" >> ${JOB}.yaml
echo " mlp_layers: [128, 256, 512]" >> ${JOB}.yaml
echo " nhead: 8" >> ${JOB}.yaml
echo " dim_feedforward: 2048" >> ${JOB}.yaml
echo " num_layers: 6" >> ${JOB}.yaml
echo " dropout: 0.0" >> ${JOB}.yaml
# Save experiment settings.
mkdir -p ${DEFORMER_EXPERIMENTS_DIR}/${JOB}
mv ${JOB}.yaml ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/
gpu=0
cd ${DEFORMER_PROJECT_DIR}
nohup python3 train_deformer.py ${JOB} ${gpu} > ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/train.log &
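The settings file that the script above assembles one echo at a time can also be rendered in a single step. The following is a minimal sketch using only the standard library; render_settings is a hypothetical helper, not something the repository provides. The learning rate is kept as the literal text 1.0e-5 on the assumption that the file is read by a YAML 1.1 parser such as PyYAML, which treats an exponent without a decimal point (for example 1e-05, which is what Python prints for this float) as a string.

```python
def render_settings(settings):
    """Render a two-level dict as block-style YAML text."""
    lines = []
    for section, options in settings.items():
        lines.append(f"{section}:")
        for key, value in options.items():
            # Python's list repr, e.g. [128, 256, 512], is a valid YAML flow sequence.
            lines.append(f"  {key}: {value}")
    return "\n".join(lines) + "\n"


settings = {
    "train": {
        "dataset": "mnist",  # "mnist" or "cifar10"
        "train_prop": 0.98,
        "workers": 10,
        "learning_rate": "1.0e-5",  # kept as text to preserve the decimal point
        "patience": 5,
    },
    "model": {
        "mlp_layers": [128, 256, 512],
        "nhead": 8,
        "dim_feedforward": 2048,
        "num_layers": 6,
        "dropout": 0.0,
    },
}

text = render_settings(settings)
print(text)
```

Writing text to ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/${JOB}.yaml would reproduce what the shell script saves before it starts training.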
To train the DEformer on tabular data (gas or power), run (or copy and paste) the following script, editing the variables as appropriate.
#!/usr/bin/env bash
JOB=$(date +%Y%m%d%H%M%S)
echo "train:" >> ${JOB}.yaml
echo " dataset: power" >> ${JOB}.yaml # "gas" or "power".
echo " batch_size: 128" >> ${JOB}.yaml
echo " workers: 10" >> ${JOB}.yaml
echo " learning_rate: 1.0e-5" >> ${JOB}.yaml
echo " patience: 20" >> ${JOB}.yaml
echo "model:" >> ${JOB}.yaml
echo " idx_embed_dim: 20" >> ${JOB}.yaml
echo " mix_comps: 150" >> ${JOB}.yaml
echo " mlp_layers: [128, 256, 512]" >> ${JOB}.yaml
echo " nhead: 8" >> ${JOB}.yaml
echo " dim_feedforward: 2048" >> ${JOB}.yaml
echo " num_layers: 6" >> ${JOB}.yaml
echo " dropout: 0.2" >> ${JOB}.yaml
# Save experiment settings.
mkdir -p ${DEFORMER_EXPERIMENTS_DIR}/${JOB}
mv ${JOB}.yaml ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/
gpu=0
cd ${DEFORMER_PROJECT_DIR}
nohup python3 train_deformer_tabular.py ${JOB} ${gpu} > ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/train.log &
Run (or copy and paste) the following script, editing the variables as appropriate. This script trains an order-agnostic DEformer similar to the order-agnostic Transformer described in Appendix D of "Autoregressive Diffusion Models". The only difference between this model and the original DEformer is that each input in the sequence is the concatenation of the column embedding for the value being predicted with the column embedding and value of the previous column in the shuffled sequence, so the input sequence is no longer twice as long as the number of columns. This model achieves a negative log-likelihood of -0.62 (compared to -0.68 for the original DEformer).
#!/usr/bin/env bash
JOB=$(date +%Y%m%d%H%M%S)
echo "train:" >> ${JOB}.yaml
echo " dataset: power" >> ${JOB}.yaml # "gas" or "power".
echo " batch_size: 128" >> ${JOB}.yaml
echo " workers: 10" >> ${JOB}.yaml
echo " learning_rate: 1.0e-5" >> ${JOB}.yaml
echo " patience: 20" >> ${JOB}.yaml
echo "model:" >> ${JOB}.yaml
echo " idx_embed_dim: 20" >> ${JOB}.yaml
echo " mix_comps: 150" >> ${JOB}.yaml
echo " mlp_layers: [128, 256, 512]" >> ${JOB}.yaml
echo " nhead: 8" >> ${JOB}.yaml
echo " dim_feedforward: 2048" >> ${JOB}.yaml
echo " num_layers: 6" >> ${JOB}.yaml
echo " dropout: 0.2" >> ${JOB}.yaml
# Save experiment settings.
mkdir -p ${DEFORMER_EXPERIMENTS_DIR}/${JOB}
mv ${JOB}.yaml ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/
gpu=0
cd ${DEFORMER_PROJECT_DIR}
nohup python3 train_deformer_tabular_ardm.py ${JOB} ${gpu} > ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/train.log &
Run (or copy and paste) the following script. This script trains a DEformer-like model (hereafter "DEformer-CSDI") on the imputation task described in "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation", using a variation of the 10% missing healthcare dataset described in that paper. While the test set is identical to the one in CSDI (because I used the paper's code), I changed the training/validation split to 95%/5% and I used an online strategy to generate missing values for each training sample: every time a training sample was encountered, I randomly selected 10% of the observed values to serve as the missing values.
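The online strategy for generating missing values can be sketched as follows. This is a minimal illustration, not the code in train_deformer_csdi.py: the function name split_observed is hypothetical, and rounding the target count and keeping at least one target are assumptions.

```python
import random


def split_observed(observed, missing_rate=0.1, rng=random):
    """Split observed (time, feature) points into context and targets.

    Called each time a training sample is drawn, so a different
    subset of observed values is hidden on every pass.
    """
    if not observed:
        return [], []
    n_targets = max(1, round(missing_rate * len(observed)))  # assumed rounding rule
    targets = set(rng.sample(observed, n_targets))
    context = [point for point in observed if point not in targets]
    return context, sorted(targets)


# Toy sample: 5 time steps x 4 features, all observed.
observed = [(t, k) for t in range(5) for k in range(4)]
context, targets = split_observed(observed, rng=random.Random(0))
print(len(context), len(targets))
```

The model is then trained to predict the values at the target points given the context points.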
Like the DEformer, the input for DEformer-CSDI consists of a mix of identity feature vectors and identity/value feature vectors. The difference in this case is that DEformer-CSDI is not learning the joint distribution, so only the identity feature vectors are included for the missing values and the attention mask is now full instead of lower triangular (i.e., every input can attend to every other input). Identity was encoded as f(t, k) = [t, embed(k)] where t and k are the time and feature indices, respectively, for a data point. One interesting difference between DEformer-CSDI and CSDI is that DEformer-CSDI simply ignores missing values that are not being predicted, while CSDI "fills in" missing values with zeros to fix the size of the input.
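The identity encoding and the two kinds of input can be sketched in plain Python. The sketch assumes a fixed lookup table in place of the learned embed(k), and all function names are hypothetical; it also contrasts the full attention mask with the lower triangular one.

```python
def identity_features(t, k, feature_embeddings):
    """f(t, k) = [t, embed(k)] for time index t and feature index k."""
    return [float(t)] + list(feature_embeddings[k])


def build_inputs(context, targets, feature_embeddings):
    """Build identity/value vectors for context points and identity-only
    vectors for the points being predicted.

    context: dict mapping (t, k) to an observed value.
    targets: list of (t, k) pairs whose values are hidden.
    Missing values that are not targets simply do not appear.
    """
    id_value = [identity_features(t, k, feature_embeddings) + [v]
                for (t, k), v in context.items()]
    id_only = [identity_features(t, k, feature_embeddings) for (t, k) in targets]
    return id_value, id_only


def lower_triangular_mask(n):
    """Input i may attend to inputs 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(n):
    """Every input may attend to every other input."""
    return [[True] * n for _ in range(n)]


embeddings = {0: [0.1, 0.2], 1: [0.3, 0.4]}  # stand-in for a learned table
id_value, id_only = build_inputs({(0, 0): 1.5}, [(3, 1)], embeddings)
print(id_value, id_only)
```

Because the sequence contains only context and target points, its length varies from sample to sample instead of being padded to a fixed size.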
With no hyperparameter tuning, DEformer-CSDI achieves a mean absolute error of 0.216 on the 10% missing healthcare dataset compared to 0.217 for CSDI (see Table 3 in the CSDI paper). Notably, DEformer-CSDI vastly outperforms the flattened Transformer baseline discussed in Appendix F, which achieved a mean absolute error of 0.383 (see Table 7).
#!/usr/bin/env bash
cd ${DEFORMER_PROJECT_DIR}
nohup python3 train_deformer_csdi.py > csdi.log &