Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models

Chun-Yen Shih, Li-Xuan Peng, Jia-Wei Liao, Ernie Chu, Cheng-Fu Chou, Jun-Cheng Chen
2024-08-22
Abstract: Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them. However, the ease of text-based image editing introduces significant risks, such as malicious editing for scams or intellectual property infringement. Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations. These methods are costly and specifically target prevalent Latent Diffusion Models (LDMs), while Pixel-domain Diffusion Models (PDMs) remain largely unexplored and robust against such attacks. Our work addresses this gap by proposing a novel attacking framework with a feature representation attack loss that exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of protected images. Extensive experiments demonstrate the effectiveness of our approach in attacking dominant PDM-based editing methods (e.g., SDEdit) while maintaining reasonable protection fidelity and robustness against common defense methods. Additionally, our framework is extensible to LDMs, achieving comparable performance to existing approaches.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the lack of effective attack methods for Pixel-domain Diffusion Models (PDMs). Previous research has proposed protection methods for Latent Diffusion Models (LDMs), but these methods mainly rely on attacking the latent encoder. PDMs have no such encoder, so the methods cannot be applied to them directly. In addition, PDMs are themselves highly robust to pixel-domain perturbations, which makes traditional attack methods ineffective against them.

The main goal of the paper is to design a new attack framework that attacks PDMs effectively while keeping the adversarial images natural-looking and robust against traditional defense methods. With this framework, the authors aim to distort the editing results produced by PDMs, or make them unrelated to the original input, without significantly reducing image fidelity, thereby protecting the image from unauthorized editing.

### Main Contributions

1. **A new attack framework for PDMs**: The framework achieves state-of-the-art attack performance, particularly in protecting images against SDEdit-based editing.
2. **A new feature attack loss function**: The loss disrupts the feature representations inside the UNet so that the model can no longer correctly recognize the semantics of the image.
3. **A latent-space optimization strategy based on a model-agnostic VAE**: Optimizing the perturbation in the latent space further improves the naturalness of the adversarial images, keeping them closer to the original image.

### Method Overview

- **Threat model and problem setting**: The authors consider a scenario in which a malicious user applies SDEdit to edit an image without authorization. They generate an adversarial image by adding imperceptible perturbations that disrupt the reverse diffusion process of SDEdit.
- **Attack loss and fidelity constraint**: Two loss functions are introduced: an attack loss that disrupts the feature representations in the UNet, and a fidelity loss that controls the quality of the adversarial image.
- **Alternating optimization**: The adversarial image is updated iteratively in the latent space, alternating between the attack objective and the fidelity constraint, so that it is both effective as an attack and faithful to the original.
- **Latent-space optimization strategy**: A pre-trained Variational Autoencoder (VAE) maps the image into a latent space for optimization, and the result is decoded back to pixel space to produce the final protected image.

### Experimental Results

- **Attack effectiveness**: The method outperforms existing PGD-based methods in both adversarial image quality and attack effectiveness.
- **Robustness**: The attack remains effective under common defense methods such as cropping and scaling, and JPEG compression.

Overall, the paper fills a gap in attacks on PDMs and offers an effective, practical way to protect images from unauthorized editing.
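
To make the alternating optimization in the Method Overview more concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation, and every name in it is hypothetical. It assumes a random linear map `W_DEC` as a stand-in for the VAE decoder, a single `tanh` layer `W_FEAT` as a stand-in for the UNet features, and squared L2 distances for both the attack loss and the fidelity loss. It also assumes the clean image is exactly decodable from `z_clean`. The loop alternates a gradient-ascent step on the feature distance with gradient-descent steps that pull the decoded image back inside a fidelity budget.

```python
import math
import random

rng = random.Random(0)
D_PIX, D_LAT, D_FEAT = 16, 8, 12  # toy sizes, chosen arbitrarily

def rand_matrix(rows, cols):
    # Gaussian entries scaled by 1/sqrt(cols), a common toy initialisation.
    return [[rng.gauss(0.0, 1.0) / math.sqrt(cols) for _ in range(cols)]
            for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matvec_t(m, v):
    # Computes m^T v without building the transpose.
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]

# Hypothetical stand-ins, NOT the paper's networks.
W_DEC = rand_matrix(D_PIX, D_LAT)    # linear "VAE decoder": latent -> pixels
W_FEAT = rand_matrix(D_FEAT, D_PIX)  # one-layer "UNet feature" map

def decode(z):
    return matvec(W_DEC, z)

def features(x):
    return [math.tanh(p) for p in matvec(W_FEAT, x)]

def attack_loss_and_grad(z, feat_clean):
    """Squared feature distance to the clean image, and its gradient w.r.t. z."""
    f = features(decode(z))
    diff = [a - b for a, b in zip(f, feat_clean)]
    grad_pre = [2.0 * d * (1.0 - a * a) for d, a in zip(diff, f)]  # through tanh
    grad_z = matvec_t(W_DEC, matvec_t(W_FEAT, grad_pre))
    return sum(d * d for d in diff), grad_z

def fidelity_loss_and_grad(z, x_clean):
    """Squared pixel distance to the clean image, and its gradient w.r.t. z."""
    r = [a - b for a, b in zip(decode(z), x_clean)]
    return sum(v * v for v in r), [2.0 * g for g in matvec_t(W_DEC, r)]

def protect(z_clean, steps=100, lr_attack=0.05, budget=0.5):
    x_clean = decode(z_clean)
    feat_clean = features(x_clean)
    # Step size 1 / (2 * ||W_DEC||_F^2) keeps the fidelity descent stable.
    lr_fid = 0.5 / sum(w * w for row in W_DEC for w in row)
    # Small random start so the attack gradient is non-zero.
    z = [v + 0.01 * rng.gauss(0.0, 1.0) for v in z_clean]
    atk_start, _ = attack_loss_and_grad(z, feat_clean)
    for _ in range(steps):
        # (1) Attack step: normalised gradient ascent on the feature distance.
        _, g = attack_loss_and_grad(z, feat_clean)
        norm = math.sqrt(sum(v * v for v in g)) + 1e-12
        z = [v + lr_attack * gv / norm for v, gv in zip(z, g)]
        # (2) Fidelity step: descend until the pixel distance is within budget.
        for _ in range(2000):
            fid, g_f = fidelity_loss_and_grad(z, x_clean)
            if fid <= budget:
                break
            z = [v - lr_fid * gv for v, gv in zip(z, g_f)]
    atk_end, _ = attack_loss_and_grad(z, feat_clean)
    fid_end, _ = fidelity_loss_and_grad(z, x_clean)
    return decode(z), atk_start, atk_end, fid_end

z_clean = [rng.gauss(0.0, 1.0) for _ in range(D_LAT)]
x_adv, atk_start, atk_end, fid_end = protect(z_clean)
print(f"feature distance: {atk_start:.4f} -> {atk_end:.4f}, "
      f"pixel distance: {fid_end:.4f}")
```

In a realistic setting the two stand-in maps would be replaced by a pre-trained VAE and a diffusion UNet, with gradients obtained by automatic differentiation. The loss definitions, step sizes, and budget used here are illustrative choices and are not taken from the paper.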