```bash
# training with default parameters (weight_decay=1e-4 and alpha=1)
python easy_mixup.py --sess my_session_1 --seed 11111
```
The above command downloads the CIFAR data and trains the network with weight decay 1e-4 and mixup parameter alpha=1.0. Alternatively, we can experiment with other weight decay and alpha values using the corresponding options:
```bash
# training with weight_decay=5e-4 and alpha=0 (no mixup)
python easy_mixup.py --sess my_session_2 --seed 22222 --decay 5e-4 --alpha 0.
```
The other choices (network architecture, number of epochs, learning rate schedule, momentum, data augmentation, etc.) are hard-coded, but modifications should be straightforward.
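For reference, the core mixup operation is simple: each training batch is replaced by convex combinations of randomly paired examples, with the mixing coefficient drawn from a Beta(alpha, alpha) distribution. Below is a minimal PyTorch sketch of the idea; the function names are illustrative and not necessarily the repository's exact API:

```python
import numpy as np
import torch

def mixup_data(x, y, alpha=1.0):
    """Return mixed inputs, the two sets of targets, and the mixing weight."""
    # alpha > 0 draws lambda from Beta(alpha, alpha); alpha = 0 recovers
    # plain ERM training (lambda = 1, no mixing).
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0))           # random pairing within the batch
    mixed_x = lam * x + (1 - lam) * x[index]    # convex combination of inputs
    return mixed_x, y, y[index], lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    # The loss is the same convex combination applied to the two labels.
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```

In a training loop, `mixup_data` would replace the raw batch before the forward pass and `mixup_criterion` would replace the usual cross-entropy call; setting `--alpha 0.` therefore reduces training to the ERM baseline.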
By default, the trained model with the best validation accuracy resides in the ./checkpoint folder, and the training log (including training loss/accuracy and validation loss/accuracy for each epoch) is saved in the ./results folder as a .csv file.
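To inspect a finished run, the per-epoch log can be read back with the standard library. Here is a small sketch, assuming a log file in ./results with per-epoch columns named `epoch` and `val_acc`; the exact filename and column names depend on the session and are assumptions here, not the repository's documented schema:

```python
import csv
import glob

# Pick the most recent log in ./results (the glob pattern is illustrative;
# the actual filename is derived from the session name).
log_path = sorted(glob.glob("./results/*.csv"))[-1]

with open(log_path, newline="") as f:
    rows = list(csv.DictReader(f))

# Report the epoch with the highest validation accuracy. The column names
# "epoch" and "val_acc" are assumed for illustration.
best = max(rows, key=lambda r: float(r["val_acc"]))
print(f"best epoch: {best['epoch']}  val acc: {best['val_acc']}%")
```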
mixup reduces overfitting and improves generalization. The following table shows the best test errors of typical training sessions using the PreAct ResNet-18 architecture (the default; the architecture can be changed in the code). Note that compared with the ERM baseline, mixup prefers a smaller weight decay (1e-4 vs. 5e-4), indicating its regularization effect.
Model | weight decay = 1e-4 | weight decay = 5e-4 |
---|---|---|
ERM | 5.53% | 5.18% |
mixup | 4.24% | 4.68% |
- A TensorFlow implementation of mixup that reproduces our results is available in tensorpack.
This reimplementation is adapted from the pytorch-cifar repository by kuangliu.