Abstract: Diffusion models have been applied to improve adversarial robustness of image classifiers by purifying the adversarial noises or generating realistic data for adversarial training. However, diffusion-based purification can be evaded by stronger adaptive attacks while adversarial training does not perform well under unseen threats, exhibiting inevitable limitations of these methods. To better harness the expressive power of diffusion models, this paper proposes Robust Diffusion Classifier (RDC), a generative classifier that is constructed from a pre-trained diffusion model to be adversarially robust. RDC first maximizes the data likelihood of a given input and then predicts the class probabilities of the optimized input using the conditional likelihood estimated by the diffusion model through Bayes' theorem. To further reduce the computational cost, we propose a new diffusion backbone called multi-head diffusion and develop efficient sampling strategies. As RDC does not require training on particular adversarial attacks, we demonstrate that it is more generalizable to defend against multiple unseen threats. In particular, RDC achieves $75.67\%$ robust accuracy against various $\ell_\infty$ norm-bounded adaptive attacks with $\epsilon_\infty=8/255$ on CIFAR-10, surpassing the previous state-of-the-art adversarial training models by $+4.77\%$. The results highlight the potential of generative classifiers by employing pre-trained diffusion models for adversarial robustness compared with the commonly studied discriminative classifiers. Code is available at \url{

What problem does this paper attempt to address?

The paper aims to improve the robustness of deep-learning image classifiers against adversarial examples. Existing defenses, namely diffusion-based purification and adversarial training, show limitations when facing stronger adaptive attacks or unseen threats. To overcome these limitations, the paper proposes a new generative classifier, the Robust Diffusion Classifier (RDC), which is constructed from a pre-trained diffusion model and designed to be adversarially robust.

### Background and Problems of the Paper

**1. The Problem of Adversarial Examples**

Adversarial examples are malicious samples created by adding small perturbations, imperceptible to humans, to natural samples. These samples can cause deep-learning models to make wrong predictions. Their existence threatens the security of practical applications in fields such as face recognition, autonomous driving, and healthcare.

**2. Existing Methods and Their Limitations**

- **Adversarial Training**: The robustness of the model is improved by training the neural network on adversarially augmented data. However, this method is usually effective only against the specific attacks seen in training and generalizes poorly to unseen threats.
- **Purification Method Based on Diffusion Models**: Adversarial samples are purified through the forward and reverse processes of the diffusion model and then fed into the classifier. However, this method is vulnerable to more powerful adaptive attacks.

**3. Advantages of Diffusion Models**

Diffusion models are powerful generative models. The forward diffusion process gradually adds Gaussian noise to the data, and the reverse generation process learns to remove that noise.
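
As a rough illustration of the forward process described above, the following sketch noises a toy vector in closed form, using x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε. The linear beta schedule, the step count, and the names `alpha_bar` and `forward_diffuse` are assumptions borrowed from the standard DDPM formulation, not details taken from the paper.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t.

    Assumes a linear beta schedule (a common DDPM default, not from the paper).
    """
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def forward_diffuse(x0, t, rng):
    """Sample x_t from x_0 in one shot and return (x_t, noise)."""
    a = alpha_bar(t)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
x0 = [0.5, -0.2, 0.9]  # toy "image" with three pixels
for t in (1, 500, 1000):
    xt, _ = forward_diffuse(x0, t, rng)
    # As t grows, alpha_bar(t) shrinks and x_t approaches pure Gaussian noise.
    print(t, round(alpha_bar(t), 4), [round(v, 3) for v in xt])
```

The reverse process trains a network to predict `noise` from `xt` and `t`; that prediction error is the diffusion loss the classifier builds on.
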
Some studies have attempted to use diffusion models to improve adversarial robustness, but existing methods have limitations, such as high randomness and vulnerability to attack.

### Solutions Proposed in the Paper

**1. Robust Diffusion Classifier (RDC)**

- **Generative Classifier**: RDC is a generative classifier constructed from a pre-trained diffusion model. It first maximizes the likelihood of the input data, and then predicts the class probabilities of the optimized input using the conditional likelihood estimated by the diffusion model through Bayes' theorem.
- **Multi-Head Diffusion Model**: To reduce the computational cost, the paper proposes a new diffusion backbone, the multi-head diffusion model, which modifies the last convolutional layer of the UNet to predict the noise for all classes simultaneously.

**2. Likelihood Maximization**

- **Pre-optimization Step**: To further improve robustness, the paper proposes likelihood maximization as a pre-optimization step that moves the input to a high-likelihood region before it is passed to the diffusion classifier. This step mitigates cases where the diffusion model's density estimate is inaccurate in some regions, or where the gap between the likelihood and the diffusion loss is large.

### Experimental Results

- **Robustness Evaluation**: RDC is robust against multiple adaptive attacks under the ℓ∞ and ℓ2 norms on the CIFAR-10 dataset. Against ℓ∞ attacks with ε∞ = 8/255, it achieves a robust accuracy of 75.67%, which is 4.77 percentage points higher than the previous state-of-the-art adversarial training models.
- **Generalization Ability**: RDC shows a significant improvement when facing unseen threat models, especially under the StAdv attack, with a performance improvement of more than 30%.

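
The Bayes' theorem step listed under the solutions can be sketched as follows. This is a minimal illustration, assuming that log p(x | y) is approximated by the negative conditional diffusion loss and that the class prior is uniform unless one is supplied. The function name `class_posteriors` and the loss values are hypothetical; in RDC the per-class losses would come from the diffusion model after the likelihood-maximization step.

```python
import math


def class_posteriors(diffusion_losses, log_priors=None):
    """Convert per-class conditional diffusion losses into p(y | x).

    Assumption: log p(x | y) ~ -loss_y, so by Bayes' theorem
    p(y | x) is proportional to exp(-loss_y) * p(y).
    """
    n = len(diffusion_losses)
    if log_priors is None:
        log_priors = [-math.log(n)] * n  # uniform prior over classes
    logits = [-loss + lp for loss, lp in zip(diffusion_losses, log_priors)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical losses for three classes; a lower loss means a better fit.
losses = [4.2, 1.3, 3.9]
probs = class_posteriors(losses)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, [round(p, 3) for p in probs])
```

Computing one loss per class is what makes the plain approach expensive; the multi-head backbone described above is the paper's way of obtaining all class-conditional noise predictions in a single forward pass.
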
### Conclusion

The paper addresses the limitations of existing adversarial-robustness methods by proposing the Robust Diffusion Classifier (RDC), demonstrating the potential of generative models for improving adversarial robustness. RDC performs well under known threats and also generalizes well, maintaining high robustness under unseen threat models.

Robust Classification via a Single Diffusion Model

Diffusion Models are Certifiably Robust Classifiers

Your Diffusion Model is Secretly a Certifiably Robust Classifier

Struggle with Adversarial Defense? Try Diffusion

Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness

Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

Robust Diffusion Models for Adversarial Purification

Raising the Bar for Certified Adversarial Robustness with Diffusion Models

DensePure: Understanding Diffusion Models for Adversarial Robustness

RCDM: Enabling Robustness for Conditional Diffusion Model

Adversarial Robustification via Text-to-Image Diffusion Models

Better Diffusion Models Further Improve Adversarial Training

DiffuseDef: Improved Robustness to Adversarial Attacks

Improving Adversarial Robustness by Contrastive Guided Diffusion Process

Adv-BDPM: Adversarial Attack Based on Boundary Diffusion Probability Model

Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective

DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

Toward Transferable Attack via Adversarial Diffusion in Face Recognition

DiffDefense: Defending against Adversarial Attacks via Diffusion Models

DiffSmooth: Certifiably Robust Learning via Diffusion Models and Local Smoothing

ROIC-DM: Robust Text Inference and Classification via Diffusion Model