Unsupervised Adversarial Perturbation Eliminating Via Disentangled Representations.

Lingyun Jiang, Kai Qiao, Ruoxi Qin, Jian Chen, Haibing Bu, Bin Yan
DOI: https://doi.org/10.1145/3351917.3351987
2019-01-01
Abstract: Although deep neural networks (DNNs) achieve state-of-the-art performance in image recognition, they are often vulnerable to adversarial examples, in which small-magnitude perturbations deliberately added to the input mislead them into incorrect results. Given the potential security threats, defending against adversarial examples is worth researching. In this paper, we propose an unsupervised method for eliminating adversarial perturbations based on disentangled representations. To achieve adversarial defense, we extract the content and perturbation features of adversarial examples with content encoders and perturbation encoders. Meanwhile, to handle unpaired training data, we introduce a cross-cycle consistency loss based on disentangled representations and a perturbation branch. We also add an adversarial loss on the recovered images so that DNNs classify them correctly. Qualitative results show that our method can eliminate adversarial perturbations without paired training data. Extensive experiments on two public datasets, MNIST and CIFAR10, demonstrate its effectiveness in resisting adversarial examples.
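
The abstract gives no architectural details, so the following is only a minimal sketch of how a cross-cycle consistency loss on unpaired data is commonly structured, not the authors' implementation. It assumes that an image can be split into a content code and a perturbation code, that the two codes are swapped between an adversarial image and an unrelated clean image, and that swapping them back should reproduce the inputs. The hand-written `content_encoder`, `perturbation_encoder`, and `generator` functions, the grid step of 0.25, and the four-pixel "images" are all hypothetical stand-ins for learned networks and real data.

```python
# Toy sketch of a cross-cycle consistency loss on unpaired data.
# Every function here is a hypothetical, hand-written stand-in for a
# learned network; nothing below is taken from the paper itself.

def content_encoder(x, step=0.25):
    """Toy E_c: snap each pixel to a coarse grid, treated as 'content'."""
    return [round(v / step) * step for v in x]

def perturbation_encoder(x, step=0.25):
    """Toy E_p: the residual left after removing the content."""
    return [v - c for v, c in zip(x, content_encoder(x, step))]

def generator(content, perturbation):
    """Toy G: recombine a content code and a perturbation code."""
    return [c + p for c, p in zip(content, perturbation)]

def l1(a, b):
    """Mean absolute difference between two equally sized images."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

def cross_cycle_loss(x_adv, x_clean):
    # Encode both (unpaired) images into content and perturbation codes.
    c_adv, p_adv = content_encoder(x_adv), perturbation_encoder(x_adv)
    c_cln, p_cln = content_encoder(x_clean), perturbation_encoder(x_clean)

    # First translation: swap the perturbation codes.
    recovered = generator(c_adv, p_cln)  # adversarial content, clean perturbation
    perturbed = generator(c_cln, p_adv)  # clean content, adversarial perturbation

    # Second translation: re-encode and swap back to rebuild the inputs.
    x_adv_cycle = generator(content_encoder(recovered),
                            perturbation_encoder(perturbed))
    x_clean_cycle = generator(content_encoder(perturbed),
                              perturbation_encoder(recovered))

    loss = l1(x_adv, x_adv_cycle) + l1(x_clean, x_clean_cycle)
    return loss, recovered

# Unpaired inputs: the clean image is not the clean version of x_adv.
x_adv = [0.27, 0.48, 0.77, 1.02]
x_clean = [0.0, 0.25, 0.5, 1.0]
loss, recovered = cross_cycle_loss(x_adv, x_clean)
print(loss, recovered)
```

Because the toy encoders separate content and perturbation perfectly on these inputs, the cycle reconstructs both images and the loss is numerically zero. With learned encoders the loss would instead be minimized during training, alongside the adversarial loss on recovered images that the abstract mentions.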