The code in this repository demonstrates that the defense proposed in "Deflecting Adversarial Attacks with Pixel Deflection" (Prakash et al., 2018) is ineffective in the white-box threat model.
With an L-infinity perturbation of 4/255, we generate targeted adversarial examples with a 97% success rate and reduce classifier accuracy to 0%.
See our note for more context and details.
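For concreteness, below is a minimal sketch of the style of white-box attack that defeats this kind of defense: targeted L-infinity PGD using BPDA (treating the non-differentiable pixel-deflection transform as the identity on the backward pass) combined with EOT (averaging gradients over the defense's randomness). This is illustrative only, not the attack code in this repository: `model`, `pixel_deflection`, and all hyperparameters are placeholders.

```python
# Illustrative sketch only. `model` (a differentiable classifier returning
# logits) and `pixel_deflection` (the randomized, non-differentiable defense)
# are placeholders, not the code in this repository.
import torch
import torch.nn.functional as F

def bpda_eot_pgd(model, pixel_deflection, x, target,
                 eps=4 / 255, step=1 / 255, iters=100, eot_samples=8):
    """Targeted L-infinity PGD through a non-differentiable defense.

    BPDA: the defense runs on the forward pass, but gradients flow through
    it as if it were the identity. EOT: gradients are averaged over several
    random draws of the defense before each step.
    """
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x)
        for _ in range(eot_samples):
            with torch.no_grad():
                deflected = pixel_deflection(x_adv)  # non-differentiable step
            # Straight-through estimator: forward pass uses the deflected
            # image, backward pass treats the defense as the identity.
            deflected = x_adv + (deflected - x_adv).detach()
            loss = F.cross_entropy(model(deflected), target)
            grad += torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - step * grad.sign()        # descend toward the target class
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep a valid image
    return x_adv.detach()
```

Here `x` is a batch of images scaled to [0, 1] and `target` is a tensor of target class indices; see the attack code in this repository for the exact procedure and hyperparameters we used.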
Obligatory picture of a sample of adversarial examples generated against this defense.
To cite this work:

```bibtex
@unpublished{cvpr2018breaks,
    author = {},
    title = {},
    year = {2018},
    url = {https://arxiv.org/abs/TODO},
}
```