Invisible Backdoor Attacks on Diffusion Models

Sen Li, Junchi Ma, Minhao Cheng
2024-06-03
Abstract: In recent years, diffusion models have achieved remarkable success in the realm of high-quality image generation, garnering increased attention. This surge in interest is paralleled by a growing concern over the security threats associated with diffusion models, largely attributed to their susceptibility to malicious exploitation. Notably, recent research has brought to light the vulnerability of diffusion models to backdoor attacks, enabling the generation of specific target images through corresponding triggers. However, prevailing backdoor attack methods rely on manually crafted trigger generation functions, often manifesting as discernible patterns incorporated into input noise, thus rendering them susceptible to human detection. In this paper, we present an innovative and versatile optimization framework designed to acquire invisible triggers, enhancing the stealthiness and resilience of inserted backdoors. Our proposed framework is applicable to both unconditional and conditional diffusion models, and notably, we are the pioneers in demonstrating the backdooring of diffusion models within the context of text-guided image editing and inpainting pipelines. Moreover, we also show that the backdoors in the conditional generation can be directly applied to model watermarking for model ownership verification, which further boosts the significance of the proposed framework. Extensive experiments on various commonly used samplers and datasets verify the efficacy and stealthiness of the proposed framework. Our code is publicly available at https://github.com/invisibleTriggerDiffusion/invisible_triggers_for_diffusion.
Machine Learning, Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the following issues:

1. **Invisible Backdoor Attack**: Existing backdoor attacks on diffusion models rely on manually designed trigger generation functions, which typically embed conspicuous patterns into the input noise and are therefore easy for humans to detect. The paper proposes an optimization framework that learns invisible triggers, improving the stealth and robustness of the inserted backdoors.
2. **Applicable to Different Types of Diffusion Models**: The framework applies to both unconditional and conditional diffusion models. The authors also demonstrate backdoor attacks on diffusion models within text-guided image editing and inpainting pipelines, which they report as the first such demonstration.
3. **Model Watermarking**: The authors further show that backdoors in conditional generation can be applied directly to model watermarking for ownership verification. With an invisible backdoor inserted as a watermark, it is possible to check whether a model originates from a protected model without exposing internal information.

In summary, the paper develops a more covert and more robust backdoor attack to expose security vulnerabilities in diffusion models, and it proposes using invisible backdoors for model copyright protection.
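To make the contrast between a conspicuous trigger and an invisible one concrete, the following is a minimal sketch and not the paper's method. It assumes that "invisible" can be approximated by bounding each element of the trigger by a small `EPS` (an L-infinity bound); the summary above does not state how the paper enforces invisibility. All names here (`add_patch_trigger`, `add_bounded_trigger`, `EPS`) are hypothetical, and the random `delta` is only a placeholder for a trigger that the paper would obtain by optimization.

```python
import random


def gaussian_noise(n, seed):
    """Draw n standard-normal samples, standing in for a flattened input-noise tensor."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def add_patch_trigger(noise, start, length, value=4.0):
    """Hand-crafted trigger: overwrite a contiguous block with a constant value.

    This mimics a conspicuous pattern stamped onto the input noise.
    """
    out = list(noise)
    for i in range(start, start + length):
        out[i] = value
    return out


def add_bounded_trigger(noise, delta, eps):
    """Assumed stand-in for an invisible trigger: clip delta to [-eps, eps], then add it."""
    clipped = [max(-eps, min(eps, d)) for d in delta]
    return [x + d for x, d in zip(noise, clipped)]


def max_abs_change(before, after):
    """Largest per-element deviation introduced by a trigger."""
    return max(abs(a - b) for a, b in zip(before, after))


EPS = 0.05  # hypothetical perturbation budget, not a value from the paper
noise = gaussian_noise(64, seed=0)
patched = add_patch_trigger(noise, start=0, length=8)
delta = gaussian_noise(64, seed=1)  # placeholder for an optimized trigger
stealthy = add_bounded_trigger(noise, delta, EPS)

print("patch trigger max change:  ", max_abs_change(noise, patched))
print("bounded trigger max change:", max_abs_change(noise, stealthy))
```

The patch trigger changes a few elements by a large amount, which is what makes such patterns easy to spot, while the bounded trigger changes every element by at most `EPS`. Whether a given bound is actually imperceptible, and how the trigger is trained jointly with the backdoored model, are questions this sketch does not address.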