Abstract: Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, which can lead to incorrect and dangerous predictions. Adversarial purification is an effective way to improve DNN robustness by removing these perturbations before the data are fed into the model. However, it struggles to preserve key structural and semantic information: because adversarial perturbations are imperceptible, it is hard to avoid over-correcting, which can destroy important information and degrade model performance. In this paper, we break away from traditional adversarial purification methods by focusing on the clean data manifold. To this end, we reveal that samples generated by a well-trained generative model are close to clean ones but far from adversarial ones. Leveraging this insight, we propose Consistency Model-based Adversarial Purification (CMAP), which optimizes vectors within the latent space of a pre-trained consistency model to generate samples that restore clean data. Specifically, 1) we propose a \textit{Perceptual consistency restoration} mechanism that minimizes the discrepancy between generated samples and input samples in both pixel and perceptual spaces. 2) To keep the optimized latent vectors within the valid data manifold, we introduce a \textit{Latent distribution consistency constraint} strategy that aligns generated samples with the clean data distribution. 3) We also apply a \textit{Latent vector consistency prediction} scheme that uses an ensemble approach to enhance prediction reliability. CMAP fundamentally addresses adversarial perturbations at their source, providing robust purification. Extensive experiments on CIFAR-10 and ImageNet-100 show that CMAP significantly enhances robustness against strong adversarial attacks while preserving high natural accuracy.
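
The three components named in the abstract can be illustrated with a deliberately small toy, which is not the authors' implementation. Everything in it is an assumption made for illustration: a fixed linear map `W` stands in for the pre-trained consistency model, a fixed linear map `F` stands in for a perceptual feature extractor, an L2 penalty on the latent vector stands in for the latent distribution consistency constraint, and a majority vote over random restarts stands in for the ensemble prediction. The abstract does not specify the actual loss forms, weights, or optimizer.

```python
import random

# Hypothetical stand-ins; none of these names or values come from the paper.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy linear "generator": 2-D latent -> 3-D sample
F = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]  # toy linear "perceptual" feature map


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def generate(z):
    """Map a latent vector to a sample (stand-in for the generative model)."""
    return matvec(W, z)


def loss_and_grad(z, x, alpha=0.5, lam=0.01):
    """Pixel loss + alpha * feature loss + lam * ||z||^2, and its gradient in z."""
    r_pix = [gi - xi for gi, xi in zip(generate(z), x)]
    r_feat = matvec(F, r_pix)  # F is linear, so F(g) - F(x) = F(g - x)
    loss = (sum(r * r for r in r_pix)
            + alpha * sum(r * r for r in r_feat)
            + lam * sum(zi * zi for zi in z))
    # Chain rule: back through F, then back through W.
    back = [2 * rp + 2 * alpha * bf
            for rp, bf in zip(r_pix, matvec(transpose(F), r_feat))]
    grad = [gw + 2 * lam * zi
            for gw, zi in zip(matvec(transpose(W), back), z)]
    return loss, grad


def purify(x, steps=300, lr=0.05, seed=0):
    """Gradient descent on the latent vector from a random start; returns G(z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(steps):
        _, grad = loss_and_grad(z, x)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return generate(z)


def classify(sample):
    """Toy classifier: class 1 if the first coordinate exceeds the second."""
    return 1 if sample[0] > sample[1] else 0


def ensemble_predict(x, n_restarts=5):
    """Majority vote over purified samples from several random latent starts."""
    votes = [classify(purify(x, seed=s)) for s in range(n_restarts)]
    return max(set(votes), key=votes.count)
```

In this toy, an input pushed off the range of `W` is pulled back toward it, because only samples of the form `generate(z)` can be returned. With a real nonlinear generator the optimization is non-convex, which is one reason several restarts followed by a vote could be useful.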

Instant Adversarial Purification with Adversarial Consistency Distillation

Guided Diffusion Model for Adversarial Purification

Redeem Myself: Purifying Backdoors in Deep Learning Models Using Self Attention Distillation

Purify++: Improving Diffusion-Purification with Advanced Diffusion Models and Control of Randomness

NCIS: Neural Contextual Iterative Smoothing for Purifying Adversarial Perturbations

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds

LightPure: Realtime Adversarial Image Purification for Mobile Devices Using Diffusion Models

LoRID: Low-Rank Iterative Diffusion for Adversarial Purification

Guided Diffusion-based Adversarial Purification Model with Denoised Prior Constraint

Classifier Guidance Enhances Diffusion-based Adversarial Purification by Preserving Predictive Information

Adversarial Purification of Information Masking

Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

Robust Diffusion Models for Adversarial Purification

Language Guided Adversarial Purification

ZeroPur: Succinct Training-Free Adversarial Purification

Test-time Adversarial Defense with Opposite Adversarial Path and High Attack Time Cost

Adversarial Training on Purification (AToP): Advancing Both Robustness and Generalization

ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models

Rethinking and Defending Protective Perturbation in Personalized Diffusion Models

Randomized Purifier Based on Low Adversarial Transferability for Adversarial Defense