Invisible Backdoor Attacks on Diffusion Models

Sen Li, Junchi Ma, Minhao Cheng

2024-06-03

Abstract: In recent years, diffusion models have achieved remarkable success in high-quality image generation, garnering increased attention. This surge in interest is paralleled by growing concern over the security threats associated with diffusion models, largely attributed to their susceptibility to malicious exploitation. Notably, recent research has shown that diffusion models are vulnerable to backdoor attacks, which enable the generation of specific target images through corresponding triggers. However, prevailing backdoor attack methods rely on manually crafted trigger generation functions, which often manifest as discernible patterns in the input noise and are therefore susceptible to human detection. In this paper, we present an innovative and versatile optimization framework designed to learn invisible triggers, enhancing the stealthiness and resilience of inserted backdoors. Our framework is applicable to both unconditional and conditional diffusion models, and we are the first to demonstrate backdooring of diffusion models in text-guided image editing and inpainting pipelines. Moreover, we show that backdoors in conditional generation can be directly applied to model watermarking for model ownership verification, which further boosts the significance of the proposed framework. Extensive experiments on various commonly used samplers and datasets verify the efficacy and stealthiness of the proposed framework. Our code is publicly available at https://github.com/invisibleTriggerDiffusion/invisible_triggers_for_diffusion.
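The abstract does not spell out how invisible triggers are optimized, so the following is only a minimal sketch, assuming one common way to keep a perturbation imperceptible: projected gradient descent (PGD) under an L-infinity budget `eps`. The functions `project_linf` and `optimize_trigger`, the toy quadratic loss, and the step sizes are hypothetical stand-ins, not the authors' method. In a real attack the loss would depend on the diffusion model, and the model would also be trained to map triggered inputs to the target image.

```python
import random


def project_linf(delta, eps):
    """Clip every trigger coordinate into [-eps, eps] (the L-infinity ball)."""
    return [max(-eps, min(eps, d)) for d in delta]


def optimize_trigger(x, target, eps=0.05, lr=0.1, steps=200):
    """Hypothetical PGD on a toy loss sum_i (x_i + delta_i - target_i)^2.

    The toy loss stands in for the real attack objective (which would be
    computed through the diffusion model); the point of the sketch is the
    projection step that keeps the trigger small and hence hard to see.
    """
    delta = [0.0] * len(x)
    for _ in range(steps):
        # Gradient of the toy loss with respect to each delta_i.
        grad = [2.0 * (xi + di - ti) for xi, di, ti in zip(x, delta, target)]
        delta = [di - lr * gi for di, gi in zip(delta, grad)]
        delta = project_linf(delta, eps)  # enforce the invisibility budget
    return delta


# Usage: a random "noise" input pulled toward an all-zero target.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(16)]
target = [0.0] * 16
delta = optimize_trigger(x, target, eps=0.05)
print(max(abs(d) for d in delta))  # never exceeds eps = 0.05
```

The projection is what makes the trigger "invisible" in this sketch: however strongly the loss pulls, no coordinate of `delta` can move more than `eps` from zero. For the toy loss, each coordinate converges to `target_i - x_i` clipped to `[-eps, eps]`.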

Machine Learning, Cryptography and Security, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper primarily addresses the following issues:

1. **Invisible backdoor attacks**: Existing backdoor attacks on diffusion models rely on manually designed trigger-generation functions, which typically embed conspicuous patterns into the input noise and are therefore easy for humans to spot. The paper instead proposes an optimization framework that learns invisible triggers, making the inserted backdoors stealthier and more robust.
2. **Applicability to different types of diffusion models**: The framework applies to both unconditional and conditional diffusion models, and the paper is the first to demonstrate backdoor attacks on diffusion models within text-guided image editing and inpainting pipelines.
3. **Model watermarking**: The authors further show that backdoors in conditional generation can serve as model watermarks for ownership verification. By embedding an invisible backdoor as a watermark, the owner can check whether a suspect model derives from the protected model without exposing its internal information.

In summary, the paper develops a more covert and powerful backdoor attack that exposes security vulnerabilities in diffusion models, and it proposes using invisible backdoors for model copyright protection.
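Ownership verification is described above only at a high level. Below is a minimal sketch, assuming a black-box protocol in which the owner adds a secret trigger to sampled inputs, queries the suspect model, and claims ownership if most outputs land close to a secret target image under mean squared error. `verify_ownership`, its thresholds, and the toy models are hypothetical illustrations, not the authors' procedure. Adding the trigger to Gaussian starting noise is also a simplifying assumption; in conditional pipelines the trigger may instead be carried by the conditioning input.

```python
import random


def mse(a, b):
    """Mean squared error between two flattened images."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def verify_ownership(sample_fn, trigger, target, n_queries=16,
                     mse_threshold=0.01, min_hit_rate=0.9, seed=0):
    """Hypothetical black-box watermark check.

    sample_fn maps an input (flattened list of floats) to a generated image
    of the same length. Returns True if at least min_hit_rate of the
    triggered queries reproduce the secret target within mse_threshold.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_queries):
        noise = [rng.gauss(0.0, 1.0) for _ in trigger]
        triggered = [n + t for n, t in zip(noise, trigger)]
        if mse(sample_fn(triggered), target) < mse_threshold:
            hits += 1
    return hits / n_queries >= min_hit_rate


# Usage with toy stand-ins for real samplers.
target = [0.5] * 64    # secret watermark image (flattened)
trigger = [0.03] * 64  # small, hard-to-see offset


def watermarked_model(x):
    return list(target)  # toy: always reproduces the watermark


def clean_model(x):
    return list(x)  # toy: just echoes its input


print(verify_ownership(watermarked_model, trigger, target))  # True
print(verify_ownership(clean_model, trigger, target))        # False
```

Both toy models ignore the diffusion process entirely. A real check would call an actual sampler, and the thresholds would need calibration so that an independently trained model is very unlikely to pass by chance.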

Related papers

TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models

Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning

Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks

How to Backdoor Diffusion Models?

Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

Diffusion Models for Imperceptible and Transferable Adversarial Attack

Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models

T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

A stealthy and robust backdoor attack via frequency domain transform

TrojanEdit: Backdooring Text-Based Image Editing Models

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

Targeted Attack Improves Protection against Unauthorized Diffusion Customization

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation

StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model