EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second

Hao Wang, Shangwei Guo, Jialing He, Kangjie Chen, Shudong Zhang, Tianwei Zhang, Tao Xiang
DOI: https://doi.org/10.1145/3664647.3680689
2024-01-01
Abstract: Text-to-image (T2I) diffusion models enjoy great popularity, and many individuals and companies build their applications on publicly released T2I diffusion models. Previous studies have demonstrated that backdoor attacks can induce T2I diffusion models to generate unsafe target images through textual triggers. However, existing backdoor attacks typically demand substantial tuning data for poisoning, which limits their practicality and can degrade the overall performance of T2I diffusion models. To address these issues, we propose EvilEdit, a training-free and data-free backdoor attack against T2I diffusion models. EvilEdit directly edits the projection matrices in the cross-attention layers to achieve projection alignment between a trigger and the corresponding backdoor target. We preserve the functionality of the backdoored model using a protected whitelist, which ensures that the semantics of non-trigger words are not accidentally altered by the backdoor. We also propose a visual target attack, EvilEdit VTA, which enables adversaries to use specific images as backdoor targets. We conduct empirical experiments on Stable Diffusion, and the results demonstrate that EvilEdit can backdoor T2I diffusion models within one second with a success rate of up to 100%. Furthermore, EvilEdit modifies only 2.2% of the parameters and maintains the model's performance on benign prompts. Our code is available at https://github.com/haowang-cqu/EvilEdit.
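The abstract describes projection alignment only in words, so the following is a minimal sketch of one way such an edit could look, not the authors' implementation. It assumes a ridge-regularized least-squares objective: find a new projection matrix that maps the trigger embedding to where the original matrix maps the target embedding, keeps the outputs for whitelisted embeddings unchanged, and stays close to the original weights. The function name `align_projection`, the regularization weight `lam`, and the use of single embedding vectors are all hypothetical choices made for illustration.

```python
import numpy as np


def align_projection(W, c_trigger, c_target, protected=(), lam=1e-3):
    """Closed-form edit of one projection matrix (hypothetical sketch).

    Minimizes, over W_new:
        ||W_new @ c_trigger - W @ c_target||^2
      + sum_p ||W_new @ c_p - W @ c_p||^2      (protected whitelist)
      + lam * ||W_new - W||_F^2                (stay near the original)

    W has shape (d_out, d_in); every embedding has shape (d_in,).
    """
    d_in = W.shape[1]

    # Setting the gradient to zero gives the linear system W_new @ A = B.
    A = lam * np.eye(d_in) + np.outer(c_trigger, c_trigger)
    B = lam * W + np.outer(W @ c_target, c_trigger)

    # Each protected embedding adds a term asking for an unchanged output.
    for c in protected:
        A += np.outer(c, c)
        B += np.outer(W @ c, c)

    # A is symmetric positive definite, so W_new = B @ inv(A).
    # Solve A @ X = B.T and transpose instead of forming the inverse.
    return np.linalg.solve(A, B.T).T
```

Because the solution is a single small linear solve per matrix, with no training loop and no image data, a sketch like this is at least consistent with the abstract's training-free, data-free, sub-second framing. In a real model the same edit would presumably be applied to each key and value projection in the cross-attention layers, which is also an assumption here.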