Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think

Haotian Xue, Yongxin Chen

2024-05-02

Abstract: Adversarial examples for diffusion models are widely used as a solution to safety concerns: by adding adversarial perturbations to personal images, we can prevent attackers from easily editing or imitating them. However, all of these protections target latent diffusion models (LDMs), while adversarial examples for diffusion models in the pixel space (PDMs) are largely overlooked. This may mislead us into thinking that diffusion models are vulnerable to adversarial attacks like most deep models. In this paper, we present a novel finding: even though gradient-based white-box attacks can be used to attack LDMs, they fail to attack PDMs. This finding is supported by extensive experiments covering a wide range of attack methods on various PDMs and LDMs with different model structures, which means diffusion models are indeed much more robust against adversarial attacks than commonly thought. We also find that PDMs can be used as an off-the-shelf purifier to effectively remove the adversarial patterns that were generated on LDMs to protect images, which means that most current protection methods, to some extent, cannot protect our images from malicious attacks. We hope that our insights will inspire the community to rethink adversarial samples for diffusion models as protection methods and move toward more effective protection. Code is available in

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The main question this paper addresses is how vulnerable diffusion models really are to adversarial attacks. Existing research focuses mainly on Latent Diffusion Models (LDMs) and largely ignores adversarial attacks on Pixel-Space Diffusion Models (PDMs). Through extensive experiments, the authors find that existing adversarial attack methods are almost entirely ineffective against PDMs, which indicates that PDMs are more adversarially robust than LDMs. Based on this finding, the paper proposes a new framework named PDM-Pure, which uses PDMs as a general purifier to remove the protective perturbations generated on LDMs, thus challenging current assumptions about the security protection of generative diffusion models.

The main contributions of the paper include:

1. Pointing out that existing research on adversarial examples focuses mainly on LDMs, while adversarial attacks on PDMs have been seriously neglected.
2. Showing through extensive experiments that existing adversarial attack methods are ineffective against PDMs, which reveals that PDMs are more adversarially robust.
3. Proposing the PDM-Pure framework, which uses powerful PDMs as a general purifier to effectively remove protective perturbations at different scales and significantly outperforms existing purification methods.

These findings and contributions are significant for re-evaluating adversarial examples for diffusion models and their use in protecting against unauthorized image editing.
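To make "gradient-based white-box attack" concrete, here is a minimal sketch of a PGD-style perturbation under an L-infinity budget, the general family that protective perturbations for LDMs belong to. It is an illustration only and not the paper's code: the linear map `W` is a hypothetical stand-in for a model component such as a latent encoder, and the function names, the budget `eps`, the step size, and the iteration count are all assumptions chosen for the example.

```python
import numpy as np


def pgd_linf(x, grad_fn, eps=8 / 255, step=2 / 255, iters=10, seed=0):
    """Signed-gradient ascent on a loss, projected onto an L-inf ball around x."""
    rng = np.random.default_rng(seed)
    # Random start inside the budget, so the first gradient is not zero.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(iters):
        g = grad_fn(x_adv)                         # white-box: exact gradient
        x_adv = x_adv + step * np.sign(g)          # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay within the budget
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv


# Hypothetical stand-in for an encoder: a fixed random linear map (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
x = rng.uniform(0.2, 0.8, size=16)  # toy "image" with 16 pixels in [0, 1]


def latent_shift_grad(x_adv):
    """Gradient of ||W x_adv - W x||^2 with respect to x_adv."""
    return 2.0 * W.T @ (W @ x_adv - W @ x)


x_adv = pgd_linf(x, latent_shift_grad)
shift = float(np.linalg.norm(W @ (x_adv - x)))  # how far the "latent" moved
print("max pixel change:", float(np.max(np.abs(x_adv - x))))
print("latent shift:", shift)
```

In this toy setting the perturbation stays within `eps` per pixel while moving the stand-in latent. The paper's claim concerns whether perturbations of this kind transfer to real models: according to the summary above, they disrupt LDMs but are largely ineffective against PDMs.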

Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models

Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation

DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Adversarial Examples are Misaligned in Diffusion Model Manifolds

Targeted Attack Improves Protection against Unauthorized Diffusion Customization

Diffusion Models for Imperceptible and Transferable Adversarial Attack

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing

Toward effective protection against diffusion based mimicry through score distillation

Rethinking and Defending Protective Perturbation in Personalized Diffusion Models

Robust Diffusion Models for Adversarial Purification

To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now

Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies

Struggle with Adversarial Defense? Try Diffusion

AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models

Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks

Mist: Towards Improved Adversarial Examples for Diffusion Models

Mitigating Adversarial Attacks in Object Detection through Conditional Diffusion Models