Toward Adversarial Robustness via Semi-supervised Robust Training

Yiming Li, Baoyuan Wu, Yan Feng, Yanbo Fan, Yong Jiang, Zhifeng Li, Shutao Xia
DOI: https://doi.org/10.48550/arXiv.2003.06974
2020-06-16
Abstract: Adversarial examples have been shown to be a severe threat to deep neural networks (DNNs). One of the most effective adversarial defense methods is adversarial training (AT), which minimizes the adversarial risk $R_{adv}$ and thereby encourages both the benign example $x$ and its adversarially perturbed neighbors within the $\ell_{p}$-ball to be predicted as the ground-truth label. In this work, we propose a novel defense method, robust training (RT), which jointly minimizes two separate risks, $R_{stand}$ and $R_{rob}$, defined with respect to the benign example and its neighborhood, respectively. The motivation is to explicitly and jointly enhance accuracy and adversarial robustness. We prove that $R_{adv}$ is upper-bounded by $R_{stand} + R_{rob}$, which implies that RT has a similar effect to AT. Intuitively, minimizing the standard risk forces the benign example to be correctly predicted, and minimizing the robust risk encourages the predictions on neighboring examples to be consistent with the prediction on the benign example. Moreover, since $R_{rob}$ is independent of the ground-truth label, RT extends naturally to the semi-supervised mode (i.e., SRT), which further enhances adversarial robustness. We also extend the $\ell_{p}$-bounded neighborhood to a general case that covers different types of perturbations, such as pixel-wise perturbation (i.e., $x + \delta$) or spatial perturbation (i.e., $AX + b$). Extensive experiments on benchmark datasets not only verify the superiority of the proposed SRT method over state-of-the-art methods in defending against pixel-wise or spatial perturbations separately, but also demonstrate its robustness to both perturbations simultaneously. The code for reproducing the main results is available at https://github.com/THUYimingLi/Semi-supervised_Robust_Training.
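
The bound stated in the abstract can be sketched in a few lines. The 0-1 definitions of the three risks below are assumptions made for this sketch (the abstract does not spell them out), as are the symbols $f$ for the classifier, $N(x)$ for the perturbation neighborhood, and $D$ for the data distribution. The paper's own proof may differ in its details.

```latex
% Assumed 0-1 definitions (notation chosen for this sketch):
%   f: classifier, N(x): perturbation neighborhood of x, D: data distribution
\begin{align*}
R_{stand} &= \mathbb{E}_{(x,y)\sim D}\,\mathbb{1}\{f(x) \neq y\},\\
R_{rob}   &= \mathbb{E}_{(x,y)\sim D}\,\mathbb{1}\{\exists\, x' \in N(x):\ f(x') \neq f(x)\},\\
R_{adv}   &= \mathbb{E}_{(x,y)\sim D}\,\mathbb{1}\{\exists\, x' \in N(x):\ f(x') \neq y\}.
\end{align*}
% Pointwise step: if some x' in N(x) has f(x') != y, then either f(x) != y,
% or f(x) = y and hence f(x') != f(x). Therefore
\begin{align*}
\mathbb{1}\{\exists\, x' \in N(x):\ f(x') \neq y\}
  \;\le\; \mathbb{1}\{f(x) \neq y\}
  + \mathbb{1}\{\exists\, x' \in N(x):\ f(x') \neq f(x)\},
\end{align*}
% and taking expectations over (x, y) ~ D gives
\begin{align*}
R_{adv} \;\le\; R_{stand} + R_{rob}.
\end{align*}
```

The argument uses nothing about the shape of $N(x)$, so under these assumed definitions it applies equally to the $\ell_{p}$-ball and to the generalized neighborhood.
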
Machine Learning, Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to improve the robustness of deep neural networks (DNNs) against adversarial examples. Specifically, it proposes a new defense method, Robust Training (RT), which enhances both accuracy and adversarial robustness by jointly minimizing the standard risk \(R_{\text{stand}}\) and the robust risk \(R_{\text{rob}}\). The paper also extends this method to the semi-supervised mode (Semi-supervised Robust Training, SRT), which uses unlabeled data to further improve adversarial robustness.

### Main contributions of the paper:

1. **Proposing the robust training method**: RT jointly minimizes the standard risk and the robust risk. Because the robust risk does not depend on the ground-truth label, the method extends naturally to the semi-supervised mode.
2. **Expanding the perturbation neighborhood definition**: The traditional \(\ell_p\)-bounded neighborhood is extended to a more general form that covers different types of perturbations (such as pixel-level and spatial-level perturbations), which allows joint robustness against multiple perturbation types.
3. **Experimental verification**: Extensive experiments on benchmark datasets show that the proposed SRT method outperforms existing adversarial training methods and is robust to pixel-level and spatial-level perturbations simultaneously.

### Key technical points of the paper:

- **Standard risk \(R_{\text{stand}}\)**: Measures whether the prediction on a benign sample \(x\) agrees with the true label \(y\).
- **Robust risk \(R_{\text{rob}}\)**: Measures whether the predictions on samples in the perturbation neighborhood of \(x\) agree with the prediction on \(x\) itself, without relying on the true label.
- **Robust training objective**: Minimize \(R_{\text{stand}}+\lambda\cdot R_{\text{rob}}\), where \(\lambda\) is a balancing parameter.
- **Semi-supervised robust training**: Because \(R_{\text{rob}}\) needs no labels, it can also be evaluated on unlabeled data, which further improves the model's robustness.
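
The technical points above can be illustrated with a minimal sketch, which is not the authors' implementation. It evaluates empirical 0-1 versions of the three risks on a toy one-dimensional problem and forms the objective \(R_{\text{stand}}+\lambda\cdot R_{\text{rob}}\), with unlabeled inputs contributing only to the robust term. All names (`threshold_classifier`, `grid_neighborhood`, `srt_objective`), the three-point neighborhood, and the 0-1 risk definitions are assumptions for illustration. Actual training would need a differentiable surrogate loss and a search for the worst-case neighbor.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[float], int]
Neighborhood = Callable[[float], Sequence[float]]


def threshold_classifier(t: float) -> Classifier:
    """Hypothetical toy classifier: predicts 1 if x > t, else 0."""
    return lambda x: 1 if x > t else 0


def grid_neighborhood(eps: float) -> Neighborhood:
    """Assumed finite stand-in for a perturbation ball: {x - eps, x, x + eps}."""
    return lambda x: [x - eps, x, x + eps]


def standard_risk(f: Classifier, labeled: List[Tuple[float, int]]) -> float:
    """Fraction of labeled points whose prediction differs from the label."""
    return sum(f(x) != y for x, y in labeled) / len(labeled)


def robust_risk(f: Classifier, inputs: Sequence[float], nbhd: Neighborhood) -> float:
    """Fraction of inputs with a neighbor predicted differently from x itself.

    No labels are used, so unlabeled inputs can be included.
    """
    return sum(any(f(xp) != f(x) for xp in nbhd(x)) for x in inputs) / len(inputs)


def adversarial_risk(
    f: Classifier, labeled: List[Tuple[float, int]], nbhd: Neighborhood
) -> float:
    """Fraction of labeled points with a neighbor predicted differently from y."""
    return sum(any(f(xp) != y for xp in nbhd(x)) for x, y in labeled) / len(labeled)


def srt_objective(
    f: Classifier,
    labeled: List[Tuple[float, int]],
    unlabeled: Sequence[float],
    nbhd: Neighborhood,
    lam: float,
) -> float:
    """R_stand on labeled data plus lam * R_rob on labeled and unlabeled inputs."""
    inputs = [x for x, _ in labeled] + list(unlabeled)
    return standard_risk(f, labeled) + lam * robust_risk(f, inputs, nbhd)


if __name__ == "__main__":
    f = threshold_classifier(0.0)
    nbhd = grid_neighborhood(0.5)
    labeled = [(-2.0, 0), (-0.3, 0), (0.2, 1), (1.5, 1), (0.8, 0)]
    unlabeled = [-0.1, 3.0, 0.4]

    r_stand = standard_risk(f, labeled)
    r_rob = robust_risk(f, [x for x, _ in labeled], nbhd)
    r_adv = adversarial_risk(f, labeled, nbhd)
    print("R_stand:", r_stand, "R_rob:", r_rob, "R_adv:", r_adv)
    print("bound holds:", r_adv <= r_stand + r_rob + 1e-12)
    print("SRT objective:", srt_objective(f, labeled, unlabeled, nbhd, lam=1.0))
```

On this toy data the adversarial risk equals the sum of the standard and robust risks, so the upper bound is tight here. In general it is only an inequality.
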
### Experimental results:

- **Spatial adversarial defense**: SRT significantly outperforms existing methods on the CIFAR-10 and MNIST datasets, with particularly large gains in accuracy under adversarial attack.
- **Pixel-level adversarial defense**: In the semi-supervised setting, SRT significantly improves both the clean accuracy and the adversarial robustness of the model.
- **Composite adversarial attacks and defenses**: SRT effectively defends against composite attacks that combine multiple types of perturbations, and it performs better than defenses designed for a single perturbation type.

### Conclusion:

The proposed method not only improves significantly on standard adversarial training, but also further enhances the model's robustness by using unlabeled data. This offers a useful basis for developing a general framework that can resist multiple types of adversarial attacks.