StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model

Ziyin Zhou, Ke Sun, Zhongxi Chen, Huafeng Kuang, Xiaoshuai Sun, Rongrong Ji
2024-08-11
Abstract: The rapid progress in generative models has given rise to the critical task of AI-Generated Content Stealth (AIGC-S), which aims to create AI-generated images that can evade both forensic detectors and human inspection. This task is crucial for understanding the vulnerabilities of existing detection methods and developing more robust techniques. However, current adversarial attacks often introduce visible noise, have poor transferability, and fail to address spectral differences between AI-generated and genuine images. To address this, we propose StealthDiffusion, a framework based on stable diffusion that modifies AI-generated images into high-quality, imperceptible adversarial examples capable of evading state-of-the-art forensic detectors. StealthDiffusion comprises two main components: Latent Adversarial Optimization, which generates adversarial perturbations in the latent space of stable diffusion, and Control-VAE, a module that reduces spectral differences between the generated adversarial images and genuine images without affecting the original diffusion model's generation process. Extensive experiments show that StealthDiffusion is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries with frequency spectra similar to genuine images. These forgeries are classified as genuine by advanced forensic classifiers and are difficult for humans to distinguish.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to create AI-generated images that evade existing detection, both forensic detectors and human visual inspection, given the rapid development of Generative Adversarial Networks (GANs) and other image-generation techniques. Specifically, it improves on existing adversarial attack methods to produce high-quality forged images that are difficult to detect, thereby revealing the vulnerabilities of current detection methods and promoting the development of more robust detection techniques.

The proposed method, StealthDiffusion, is based on the Stable Diffusion model. By optimizing in the model's latent space, it generates high-quality, imperceptible adversarial examples that can evade state-of-the-art forensic detectors. StealthDiffusion consists of two main parts:

1. **Latent Adversarial Optimization (LAO)**: This part generates adversarial perturbations in the latent space of the Stable Diffusion model, making the generated images more realistic while maintaining their visual quality.
2. **Control-Variational Auto-Encoder (Control-VAE)**: This module aims to reduce the spectral differences between the generated adversarial images and real images. It reconstructs both real and generated images and integrates this knowledge into the Stable Diffusion decoder through a skip-connection method similar to a control network. This reduces spectral aliasing and makes the generated images harder to distinguish in the spectral domain.

Extensive experiments show that StealthDiffusion can convert AI-generated images into high-quality adversarial forgeries in both white-box and black-box settings. The frequency spectra of these forgeries are similar to those of real images. The forgeries are classified as real by advanced forensic classifiers and are also difficult for the human eye to distinguish. This indicates that StealthDiffusion has significant advantages in improving both the transferability of image stealth adversarial attacks and the authenticity of generated images.
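
The idea of perturbing a latent code rather than pixels can be illustrated with a toy sketch. It is not the authors' implementation: the linear "decoder", the logistic "detector", the signed-gradient (PGD-style) update, and every name and constant below (`latent_attack`, `decode_matrix`, `det_w`, `det_b`, `EPS`) are assumptions made for illustration. In the paper, the decoder would be Stable Diffusion's and the detector a trained forensic classifier.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def latent_attack(z0, decode_matrix, det_w, det_b, eps, step=0.02, n_steps=20):
    """Toy PGD-style search in latent space (illustrative assumption only).

    Lowers the detector's "fake" probability for the decoded image while
    keeping the latent within an L-infinity ball of radius eps around z0.
    """
    z = z0.copy()
    for _ in range(n_steps):
        image = decode_matrix @ z                    # toy linear decoder
        p_fake = sigmoid(det_w @ image + det_b)      # toy logistic detector
        # Gradient of the loss -log(1 - p_fake) with respect to z.
        grad = p_fake * (decode_matrix.T @ det_w)
        z = z - step * np.sign(grad)                 # signed-gradient step
        z = np.clip(z, z0 - eps, z0 + eps)           # project back into the ball
    return z


# Hypothetical sizes and random stand-ins for the decoder and detector.
EPS = 0.1
rng = np.random.default_rng(0)
latent_dim, image_dim = 16, 64
decode_matrix = rng.normal(size=(image_dim, latent_dim)) / np.sqrt(latent_dim)
det_w = rng.normal(size=image_dim) / np.sqrt(image_dim)
det_b = 1.0
z0 = rng.normal(size=latent_dim)

z_adv = latent_attack(z0, decode_matrix, det_w, det_b, eps=EPS)

p_before = float(sigmoid(det_w @ (decode_matrix @ z0) + det_b))
p_after = float(sigmoid(det_w @ (decode_matrix @ z_adv) + det_b))
print(f"P(fake) before: {p_before:.3f}, after: {p_after:.3f}")
```

Because the perturbation is bounded in latent space and then passed through the decoder, the change to the image is shaped by the decoder rather than added as raw pixel noise, which is the intuition behind optimizing latents instead of pixels. This sketch does not model the spectral correction performed by Control-VAE.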