Abstract: Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them. However, the ease of text-based image editing introduces significant risks, such as malicious editing for scams or intellectual property infringement. Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations. These methods are costly and specifically target prevalent Latent Diffusion Models (LDMs), while Pixel-domain Diffusion Models (PDMs) remain largely unexplored and robust against such attacks. Our work addresses this gap by proposing a novel attacking framework with a feature representation attack loss that exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of protected images. Extensive experiments demonstrate the effectiveness of our approach in attacking dominant PDM-based editing methods (e.g., SDEdit) while maintaining reasonable protection fidelity and robustness against common defense methods. Additionally, our framework is extensible to LDMs, achieving comparable performance to existing approaches.

What problem does this paper attempt to address?

This paper addresses the lack of effective attack methods for Pixel-domain Diffusion Models (PDMs). Previous research has proposed protection methods for Latent Diffusion Models (LDMs), but these rely mainly on attacking the latent encoder. PDMs have no such encoder, so these methods cannot be applied to them directly. In addition, PDMs are highly robust to pixel-domain perturbations, which makes traditional attack methods ineffective against them. The main goal of the paper is to design a new attack framework that can effectively attack PDMs while keeping the adversarial images natural-looking and robust against traditional defense methods. With this framework, the authors aim to distort the editing results generated by PDMs, or make them unrelated to the original input, without significantly reducing image fidelity, thereby protecting the image from unauthorized editing.

### Main Contributions

1. **A new attack framework for PDMs**: The framework achieves state-of-the-art attack performance and effectively protects images, especially against SDEdit-based editing.
2. **A new feature attack loss function**: This loss disrupts the feature representation inside the UNet, so that the model cannot correctly recognize the semantics of the image.
3. **A latent-space optimization strategy based on a model-agnostic VAE**: This strategy further improves the naturalness of adversarial images by optimizing perturbations in the latent space, keeping them closer to the original image.

### Method Overview

- **Threat model and problem setting**: The authors define a scenario in which a malicious user applies SDEdit to perform unauthorized editing on an image. They propose generating adversarial images by adding imperceptible perturbations that disrupt SDEdit's reverse diffusion process.
- **Attack loss and fidelity constraint**: Two loss functions are introduced: an attack loss, which disrupts the feature representation in the UNet, and a fidelity loss, which controls the quality of the adversarial images.
- **Alternating optimization**: The adversarial image is updated gradually by optimizing in the latent space, so that it both disrupts editing and retains high fidelity.
- **Latent-space optimization strategy**: A pre-trained Variational Auto-Encoder (VAE) maps the image into the latent space for optimization, and the result is decoded back to pixel space to produce the final protected image.

### Experimental Results

- **Attack effectiveness**: The experiments show that the method outperforms existing PGD-based methods in both adversarial image quality and attack effectiveness.
- **Robustness**: The method is robust against common defense methods such as cropping and scaling and JPEG compression. The attack remains effective even after these defenses are applied.

Overall, the paper fills a gap in attacks on PDMs and provides an effective, practical way to protect images from unauthorized editing.
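The interplay between the attack loss, the fidelity constraint, and latent-space optimization described in the method overview can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration, not the paper's implementation: `encode`/`decode` (an `atanh`/`tanh` pair) stand in for a pre-trained VAE, `features` (a tiny circular filter) stands in for denoising-UNet features, the attack loss is a plain squared feature distance, and the names `protect`, `lr_attack`, `lr_fid`, and `budget` are hypothetical. The loop takes one attack step in the latent space and then takes fidelity steps until the decoded image is back within the budget.

```python
import math
import random

# Toy stand-ins (assumptions), not the paper's models or losses.

def encode(x):
    """Hypothetical VAE encoder: pixels in (-1, 1) -> latent."""
    return [math.atanh(v) for v in x]

def decode(z):
    """Hypothetical VAE decoder: latent -> pixels in (-1, 1)."""
    return [math.tanh(v) for v in z]

def features(x):
    """Toy linear stand-in for UNet features: a small circular filter."""
    n = len(x)
    return [x[i] + 0.25 * x[(i + 1) % n] for i in range(n)]

def attack_loss(x_adv, x_clean):
    """Squared feature distance; the protector wants this to be large."""
    fa, fc = features(x_adv), features(x_clean)
    return sum((a - b) ** 2 for a, b in zip(fa, fc))

def fidelity_loss(x_adv, x_clean):
    """Mean squared pixel error; the protector wants this to stay small."""
    return sum((a - b) ** 2 for a, b in zip(x_adv, x_clean)) / len(x_clean)

def attack_grad(z, x_clean):
    """Analytic gradient of attack_loss(decode(z), x_clean) w.r.t. z."""
    x = decode(z)
    n = len(x)
    u = features([a - b for a, b in zip(x, x_clean)])  # features() is linear
    return [2.0 * (u[j] + 0.25 * u[(j - 1) % n]) * (1.0 - x[j] ** 2)
            for j in range(n)]

def fidelity_grad(z, x_clean):
    """Analytic gradient of fidelity_loss(decode(z), x_clean) w.r.t. z."""
    x = decode(z)
    n = len(x)
    return [2.0 / n * (x[j] - x_clean[j]) * (1.0 - x[j] ** 2)
            for j in range(n)]

def sign(v):
    return (v > 0) - (v < 0)

def protect(x_clean, steps=20, lr_attack=0.05, lr_fid=2.0, budget=1e-3, seed=0):
    rng = random.Random(seed)
    # Start slightly off the clean latent so the attack gradient is non-zero.
    z = [v + rng.uniform(-1e-3, 1e-3) for v in encode(x_clean)]
    for _ in range(steps):
        # Attack step: signed gradient ascent on the feature distance.
        g = attack_grad(z, x_clean)
        z = [v + lr_attack * sign(gv) for v, gv in zip(z, g)]
        # Fidelity steps: descend until the image is back within the budget.
        for _ in range(1000):
            if fidelity_loss(decode(z), x_clean) <= budget:
                break
            g = fidelity_grad(z, x_clean)
            z = [v - lr_fid * gv for v, gv in zip(z, g)]
    return decode(z)

x_clean = [0.1, -0.4, 0.6, 0.3, -0.2, 0.5, -0.6, 0.0]
x_protected = protect(x_clean)
print("fidelity loss:", fidelity_loss(x_protected, x_clean))
print("attack loss:  ", attack_loss(x_protected, x_clean))
```

In a realistic setting the toy pieces would presumably be replaced by a real VAE, feature activations taken from the denoising UNet on noised inputs, and automatic differentiation instead of hand-written gradients. The sketch only shows the alternating structure: the attack step moves the features away from those of the clean image, and the fidelity steps keep the decoded image close to the original.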

Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models
