Abstract: Adversarial purification is a promising approach to defending neural networks against adversarial attacks. Recently, methods utilizing diffusion probabilistic models have achieved great success for adversarial purification in image classification tasks. However, such methods face the dilemma of balancing noise removal against information preservation. This paper points out that existing adversarial purification methods based on diffusion models gradually lose sample information during the core denoising process, causing occasional label shift in subsequent classification tasks. As a remedy, we propose suppressing such information loss by introducing guidance from the classifier confidence. Specifically, we propose the Classifier-cOnfidence gUided Purification (COUP) algorithm, which purifies adversarial examples while keeping them away from the classifier decision boundary. Experimental results show that COUP can achieve better adversarial robustness under strong attack methods.

What problem does this paper attempt to address?

This paper addresses how to preserve sample information while removing noise during adversarial purification, so that the purified sample does not suffer label shift. Specifically, existing adversarial purification methods based on diffusion models gradually lose sample information during the core denoising process, which occasionally causes label shift in the subsequent classification task. To solve this problem, the authors propose a new method, Classifier-cOnfidence gUided Purification (COUP). With classifier confidence as guidance, adversarial samples are kept away from the decision boundary during purification, which better preserves predictive information and improves adversarial robustness.

### Background of the Paper

- **Adversarial Attacks and Defenses**: Neural networks are vulnerable to adversarial samples, which are generated by adding small perturbations to the original data and can mislead the model into making wrong predictions. Adversarial defense methods fall roughly into two categories: enhanced model training and input data pre-processing.
- **Adversarial Purification Based on Diffusion Models**: In recent years, methods that use diffusion probabilistic models for adversarial purification have achieved remarkable success in image classification tasks. These methods pre-process the input image with an auxiliary diffusion model before it enters the downstream classifier, in order to remove adversarial noise. However, they must strike a difficult balance between denoising and information preservation.

### Contributions of the Paper

- **Proposing the COUP Algorithm**: The COUP algorithm uses the classifier's confidence in the current class label to guide the purification process of the diffusion model, so that the key information of the sample is preserved while the adversarial noise is removed.
- **Theoretical and Empirical Analysis**: The paper provides a theoretical analysis showing that maintaining high classifier confidence helps to avoid label shift, and verifies the effectiveness of the COUP algorithm through experiments.
- **Experimental Results**: The experimental results show that COUP has higher adversarial robustness against strong attack methods (such as AutoAttack) on the CIFAR-10 and CIFAR-100 datasets.

### Method Overview

- **Diffusion Model**: COUP builds on a score-based diffusion model and performs denoising through the reverse-time SDE (Stochastic Differential Equation).
- **Classifier Confidence Guidance**: The classifier confidence is introduced as a regularization term in the reverse-time SDE, so that the purification process maximizes not only the likelihood of the sample but also the classifier's confidence, which keeps the sample from drifting toward the decision boundary.
- **Algorithm Steps**:
  1. Set the drift function and diffusion coefficient.
  2. Use the SDE process to purify the initial adversarial sample \( x_{\text{adv}} \) to \( x_{\text{ben}} \).
  3. Use the trained classifier to predict the label of the purified sample.

### Experimental Setup and Results

- **Datasets and Models**: The experiments are carried out on the CIFAR-10 and CIFAR-100 datasets, using WideResNet-28-10 and WideResNet-70-16 as classifiers.
- **Baseline Methods**: COUP is compared with a variety of advanced adversarial defense methods, including robust optimization methods based on discriminative models and adversarial purification methods based on generative models.
- **Evaluation Methods**: AutoAttack (including white-box and black-box attacks) and BPDA + EOT are used to test robustness under different threat models.
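
As a schematic reading of the method overview, the guidance can be written as an extra term in the reverse-time SDE. The first equation is the standard reverse-time SDE of a score-based diffusion model with forward drift \( f \) and diffusion coefficient \( g \). The second is one plausible guided form and is only a sketch: the guidance weight \( \lambda \), the classifier posterior \( \hat{p}(y \mid x) \), and the exact placement of the confidence term are assumptions for illustration and may differ from the paper's formulation.

\[
\mathrm{d}x = \left[ f(x, t) - g(t)^2 \, \nabla_x \log p_t(x) \right] \mathrm{d}t + g(t) \, \mathrm{d}\bar{w}
\]

\[
\mathrm{d}x = \left[ f(x, t) - g(t)^2 \left( \nabla_x \log p_t(x) + \lambda \, \nabla_x \log \max_y \hat{p}(y \mid x) \right) \right] \mathrm{d}t + g(t) \, \mathrm{d}\bar{w}
\]

Here time runs backward from a chosen purification time to 0 and \( \bar{w} \) is a reverse-time Wiener process. Setting \( \lambda = 0 \) recovers unguided diffusion purification.
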
### Conclusion

By introducing classifier-confidence guidance, the COUP algorithm addresses the information loss and label shift that occur during adversarial purification and improves the model's adversarial robustness. The experimental results show that COUP performs well under a variety of strong attack methods and outperforms existing adversarial purification methods.
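
The three algorithm steps in the method overview (set drift and diffusion coefficient, run the guided SDE, classify the result) can be illustrated on a toy problem. The sketch below is not the paper's implementation: it replaces images with a 1-D two-class Gaussian mixture whose score and Bayes classifier are known in closed form, and every name and constant in it (`BETA`, `MU0`, `score`, `confidence_grad`, `purify`, `predict`, the guidance weight `lam`, the purification time `t_star`) is a hypothetical choice made for illustration. A real setup would use a trained score network and a trained classifier instead.

```python
import numpy as np

BETA = 1.0  # constant noise rate of the toy VP-SDE (assumption)
MU0 = 2.0   # class means are -MU0 (label 0) and +MU0 (label 1), unit variance


def score(x, t):
    """Exact score d/dx log p_t(x) of the diffused two-component mixture."""
    # Under the forward VP-SDE the means shrink; the unit variance is preserved.
    mu_t = MU0 * np.exp(-0.5 * BETA * t)
    return -x + mu_t * np.tanh(mu_t * x)


def confidence_grad(x):
    """d/dx log max_y p(y|x) for the Bayes classifier p(y=1|x) = sigmoid(2*MU0*x)."""
    k = 2.0 * MU0
    conf = 1.0 / (1.0 + np.exp(-k * np.abs(x)))  # confidence of the predicted label
    return k * np.sign(x) * (1.0 - conf)


def purify(x_adv, lam, t_star=0.3, n_steps=100, seed=0):
    """Euler-Maruyama integration of the guided reverse-time SDE from t_star to 0."""
    rng = np.random.default_rng(seed)
    x = np.array(x_adv, dtype=float)
    dt = t_star / n_steps
    for i in range(n_steps):
        t = t_star - i * dt
        # Step 1: drift and diffusion coefficient of the reverse-time VP-SDE,
        # with the score augmented by the confidence gradient (lam = 0: no guidance).
        guided_score = score(x, t) + lam * confidence_grad(x)
        drift = 0.5 * BETA * x + BETA * guided_score
        # Step 2: one backward step of the SDE.
        x = x + drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(x.shape)
    return x


def predict(x):
    """Step 3: label of the toy classifier (decision boundary at x = 0)."""
    return (x > 0).astype(int)


if __name__ == "__main__":
    # Many copies of one input that sits close to the boundary on the label-1 side.
    x_adv = np.full(20000, 0.2)
    plain = purify(x_adv, lam=0.0)
    guided = purify(x_adv, lam=1.0)
    print("label kept without guidance:", predict(plain).mean())
    print("label kept with guidance:   ", predict(guided).mean())
```

In this toy setting the unguided run lets a sizeable share of samples diffuse across the boundary and change label, while the confidence term pushes samples away from the boundary early on, so a larger share keeps the original label. The numbers say nothing about COUP's performance on CIFAR-10 or CIFAR-100; they only show the mechanism the summary describes.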

Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information

Guided Diffusion Model for Adversarial Purification

Guided Diffusion-based Adversarial Purification Model with Denoised Prior Constraint

Purify++: Improving Diffusion-Purification with Advanced Diffusion Models and Control of Randomness

Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

Robust Diffusion Models for Adversarial Purification

Instant Adversarial Purification with Adversarial Consistency Distillation

Struggle with Adversarial Defense? Try Diffusion

Adversarial Purification of Information Masking

NCIS: Neural Contextual Iterative Smoothing for Purifying Adversarial Perturbations

Robust Classification via a Single Diffusion Model

AID-Purifier: A Light Auxiliary Network for Boosting Adversarial Defense

Purifier: Defending Data Inference Attacks via Transforming Confidence Scores

Mitigating Adversarial Attacks in Object Detection through Conditional Diffusion Models

Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective

Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge

Random Sampling for Diffusion-based Adversarial Purification

Your Diffusion Model is Secretly a Certifiably Robust Classifier

Diffusion-based Adversarial Purification for Intrusion Detection

ADBM: Adversarial diffusion bridge model for reliable adversarial purification

Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds