Robust Classification via a Single Diffusion Model

Huanran Chen, Yinpeng Dong, Zhengyi Wang, Xiao Yang, Chengqi Duan, Hang Su, Jun Zhu
2024-05-21
Abstract: Diffusion models have been applied to improve adversarial robustness of image classifiers by purifying the adversarial noises or generating realistic data for adversarial training. However, diffusion-based purification can be evaded by stronger adaptive attacks while adversarial training does not perform well under unseen threats, exhibiting inevitable limitations of these methods. To better harness the expressive power of diffusion models, this paper proposes Robust Diffusion Classifier (RDC), a generative classifier that is constructed from a pre-trained diffusion model to be adversarially robust. RDC first maximizes the data likelihood of a given input and then predicts the class probabilities of the optimized input using the conditional likelihood estimated by the diffusion model through Bayes' theorem. To further reduce the computational cost, we propose a new diffusion backbone called multi-head diffusion and develop efficient sampling strategies. As RDC does not require training on particular adversarial attacks, we demonstrate that it is more generalizable to defend against multiple unseen threats. In particular, RDC achieves $75.67\%$ robust accuracy against various $\ell_\infty$ norm-bounded adaptive attacks with $\epsilon_\infty=8/255$ on CIFAR-10, surpassing the previous state-of-the-art adversarial training models by $+4.77\%$. The results highlight the potential of generative classifiers by employing pre-trained diffusion models for adversarial robustness compared with the commonly studied discriminative classifiers. Code is available at \url{
Computer Vision and Pattern Recognition, Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is the vulnerability of deep-learning image classifiers to adversarial examples. Existing defenses, namely diffusion-based purification and adversarial training, show limitations against stronger adaptive attacks or unseen threats. To overcome these limitations, the paper proposes a new generative classifier, the Robust Diffusion Classifier (RDC), which is constructed from a pre-trained diffusion model and aims to improve adversarial robustness.

### Background and Problems of the Paper

**1. The Problem of Adversarial Examples**

Adversarial examples are malicious samples generated by adding small perturbations, imperceptible to humans, to natural samples; they can cause deep-learning models to make wrong predictions. Their existence threatens the security of practical applications in fields such as face recognition, autonomous driving, and healthcare.

**2. Existing Methods and Their Limitations**

- **Adversarial Training**: The model is made more robust by training the neural network on adversarially augmented data. However, this method is usually effective only against the specific attacks seen in training and generalizes poorly to unseen threats.
- **Purification Method Based on Diffusion Models**: Adversarial samples are purified through the forward and reverse processes of a diffusion model and then fed to the classifier. However, this method is vulnerable to more powerful adaptive attacks.

**3. Advantages of Diffusion Models**

Diffusion models are powerful generative models. A forward diffusion process gradually adds Gaussian noise to the data, and the model learns to remove that noise in the reverse generation process.
Some studies have used diffusion models to improve adversarial robustness, but the existing methods have limitations, such as high randomness and vulnerability to attack.

### Solutions Proposed in the Paper

**1. Robust Diffusion Classifier (RDC)**

- **Generative Classifier**: RDC is a generative classifier constructed from a pre-trained diffusion model. It first maximizes the likelihood of the input data, and then predicts the class probabilities of the optimized input from the conditional likelihood estimated by the diffusion model, using Bayes' theorem.
- **Multi-Head Diffusion Model**: To reduce the computational cost, the paper proposes a new diffusion backbone, the multi-head diffusion model, which modifies the last convolutional layer of the UNet so that it predicts the noise for all classes simultaneously.

**2. Likelihood Maximization**

- **Pre-optimization Step**: To further improve robustness, the paper proposes likelihood maximization as a pre-optimization step that moves the input to a high-likelihood region before it is passed to the diffusion classifier. This step mitigates cases where the diffusion model's density estimate is inaccurate in some regions, or where the gap between the likelihood and the diffusion loss is large.

### Experimental Results

- **Robustness Evaluation**: RDC is robust to multiple adaptive attacks under the ℓ∞ and ℓ2 norms on the CIFAR-10 dataset. Against ℓ∞ norm-bounded attacks with ε∞ = 8/255 it achieves 75.67% robust accuracy, 4.77 percentage points above the previous state-of-the-art adversarial training models.
- **Generalization Ability**: RDC improves significantly on unseen threat models, especially under the StAdv attack, where performance improves by more than 30%.
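
As a minimal sketch of the Bayes' theorem step described under "Generative Classifier" above: if each class yields a scalar diffusion loss that stands in for the negative conditional log-likelihood of the input, and the class prior is assumed uniform, the class probabilities reduce to a softmax over the negated losses. The function name `class_posterior`, the uniform prior, and the loss values below are illustrative assumptions, not the paper's implementation.

```python
import math


def class_posterior(per_class_loss):
    """Convert per-class diffusion losses into class probabilities.

    Assumption: each loss approximates -log p(x | y) and the prior p(y)
    is uniform, so Bayes' theorem gives p(y | x) = softmax(-loss).
    """
    logits = [-loss for loss in per_class_loss]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical losses for three classes; class 1 has the lowest loss,
# i.e. the class-conditional model explains the input best.
losses = [4.2, 1.3, 3.7]
probs = class_posterior(losses)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, probs)
```

In this toy setting the predicted label is simply the class with the lowest loss; the softmax only matters when calibrated probabilities are needed.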
### Conclusion

The paper addresses the limitations of existing adversarial defenses by proposing the Robust Diffusion Classifier (RDC), demonstrating the potential of generative models for improving adversarial robustness. RDC performs well under known threats and also generalizes well, maintaining high robustness under unseen threat models.