Abstract: Diffusion models have recently been employed to improve certified robustness through the process of denoising. However, a theoretical understanding of why diffusion models are able to improve certified robustness is still lacking, which hinders further improvement. In this study, we close this gap by analyzing the fundamental properties of diffusion models and establishing the conditions under which they can enhance certified robustness. This deeper understanding allows us to propose a new method, DensePure, designed to improve the certified robustness of a pretrained model (i.e., a classifier). Given an (adversarial) input, DensePure performs multiple runs of denoising via the reverse process of the diffusion model (with different random seeds) to obtain multiple reversed samples, which are then passed through the classifier, followed by majority voting over the inferred labels to make the final prediction. This design of using multiple runs of denoising is informed by our theoretical analysis of the conditional distribution of the reversed sample. Specifically, when the data density of a clean sample is high, its conditional density under the reverse process in a diffusion model is also high; thus, sampling from the latter conditional distribution can purify the adversarial example and return the corresponding clean sample with high probability. By using the highest-density point in the conditional distribution as the reversed sample, we identify the robust region of a given instance under the diffusion model's reverse process. We show that this robust region is a union of multiple convex sets and is potentially much larger than the robust regions identified in previous works. In practice, DensePure can approximate the label of the high-density region in the conditional distribution, which allows it to enhance certified robustness.
We conduct extensive experiments to demonstrate the effectiveness of DensePure by evaluating its certified robustness given a standard model via randomized smoothing. We show that DensePure is consistently better than existing methods on ImageNet, with a 7% improvement on average.
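The prediction procedure described in the abstract (several independent denoising runs, one classification per reversed sample, then a majority vote) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `majority_vote_purify`, its parameters, and the toy `toy_denoise` and `toy_classify` stand-ins are all hypothetical, and a real system would replace them with the reverse process of a trained diffusion model and a pretrained classifier.

```python
import random
from collections import Counter
from typing import Callable, Hashable


def majority_vote_purify(
    x: float,
    denoise: Callable[[float, random.Random], float],
    classify: Callable[[float], Hashable],
    num_runs: int = 10,
    seed: int = 0,
) -> Hashable:
    """Denoise `x` several times with different seeds and majority-vote the labels.

    `denoise` and `classify` are hypothetical placeholders for a diffusion
    model's reverse process and a pretrained classifier.
    """
    votes = Counter()
    for run in range(num_runs):
        rng = random.Random(seed + run)     # a different random seed per run
        reversed_sample = denoise(x, rng)   # one stochastic reversed sample
        votes[classify(reversed_sample)] += 1
    label, _count = votes.most_common(1)[0]  # most frequent inferred label
    return label


# Toy stand-ins (assumptions for illustration only): the "data distribution"
# has two high-density modes at -1 and +1, and the toy denoiser pulls the
# input to the nearer mode plus a small bounded random offset.
def toy_denoise(x: float, rng: random.Random) -> float:
    mode = 1.0 if x >= 0 else -1.0
    return mode + rng.uniform(-0.3, 0.3)


def toy_classify(x: float) -> int:
    return int(x >= 0)


prediction = majority_vote_purify(0.2, toy_denoise, toy_classify, num_runs=11)
print(prediction)
```

Using an odd `num_runs` avoids ties in the binary toy case; with more classes, ties are still possible, and this sketch simply returns whichever tied label `Counter.most_common` lists first.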

DensePure: Understanding Diffusion Models for Adversarial Robustness
