Abstract: The rapid progress in generative models has given rise to the critical task of AI-Generated Content Stealth (AIGC-S), which aims to create AI-generated images that can evade both forensic detectors and human inspection. This task is crucial for understanding the vulnerabilities of existing detection methods and developing more robust techniques. However, current adversarial attacks often introduce visible noise, have poor transferability, and fail to address spectral differences between AI-generated and genuine images. To address this, we propose StealthDiffusion, a framework based on stable diffusion that modifies AI-generated images into high-quality, imperceptible adversarial examples capable of evading state-of-the-art forensic detectors. StealthDiffusion comprises two main components: Latent Adversarial Optimization, which generates adversarial perturbations in the latent space of stable diffusion, and Control-VAE, a module that reduces spectral differences between the generated adversarial images and genuine images without affecting the original diffusion model's generation process. Extensive experiments show that StealthDiffusion is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries with frequency spectra similar to genuine images. These forgeries are classified as genuine by advanced forensic classifiers and are difficult for humans to distinguish.

What problem does this paper attempt to address?

The paper asks how to create AI-generated images that evade existing detection, both forensic detectors and human visual inspection, given the rapid progress of generative models. It focuses on producing high-quality forgeries that are hard to detect by improving on existing adversarial attack methods, thereby exposing the vulnerabilities of current detectors and motivating more robust detection techniques.

The proposed method, StealthDiffusion, builds on the Stable Diffusion model. By optimizing in the model's latent space, it generates high-quality, imperceptible adversarial examples that evade state-of-the-art forensic detectors. It consists of two parts:

1. **Latent Adversarial Optimization (LAO)**: Generates adversarial perturbations in the latent space of Stable Diffusion, pushing the generated images toward being classified as genuine while preserving their visual quality.
2. **Control-Variational Auto-Encoder (Control-VAE)**: Reduces the spectral differences between the generated adversarial images and real images. The module learns by reconstructing both real and generated images, and this knowledge is injected into the Stable Diffusion decoder through skip connections, in a manner similar to a control network. This reduces spectral aliasing and makes the generated images harder to distinguish in the frequency domain.

Extensive experiments show that StealthDiffusion converts AI-generated images into high-quality adversarial forgeries in both white-box and black-box settings. The frequency spectra of these forgeries are similar to those of real images, so they are classified as real by advanced forensic classifiers and are also difficult for humans to tell apart from real images. The summary takes this as evidence that StealthDiffusion improves both the transferability of stealth adversarial attacks and the realism of the generated images.
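To make the idea behind Latent Adversarial Optimization concrete, here is a minimal sketch of a bounded perturbation search in a latent space. It is not the authors' implementation. Everything in it is an assumption for illustration: a toy linear `decode` stands in for the Stable Diffusion decoder, a toy logistic `fake_probability` stands in for a forensic detector, and the signed-gradient update with an L-infinity bound `eps` is a generic attack recipe rather than the paper's stated optimizer.

```python
import math

def sign(g):
    # Returns -1, 0, or 1 depending on the sign of g.
    return (g > 0) - (g < 0)

def decode(W, z):
    # Hypothetical linear "decoder": maps a latent vector z to a tiny image x = W z.
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def fake_probability(w, b, x):
    # Hypothetical logistic "detector": probability that image x is AI-generated.
    logit = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-logit))

def latent_attack(z, W, w, b, eps=1.0, step=0.1, iters=20):
    # Gradient of the detector logit with respect to the latent is W^T w,
    # which is constant for this linear toy model.
    grad = [sum(W[i][j] * w[i] for i in range(len(W))) for j in range(len(z))]
    delta = [0.0] * len(z)
    for _ in range(iters):
        # Signed descent step that lowers the "fake" logit.
        delta = [d - step * sign(g) for d, g in zip(delta, grad)]
        # Project back onto the L-infinity ball so the latent change stays bounded.
        delta = [max(-eps, min(eps, d)) for d in delta]
    return [z_j + d_j for z_j, d_j in zip(z, delta)]

# Toy values chosen only for illustration (2-D latent, 3-pixel "image").
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w, b = [0.5, -0.25, 1.0], 0.2
z = [1.0, 0.5]

p_before = fake_probability(w, b, decode(W, z))
z_adv = latent_attack(z, W, w, b)
p_after = fake_probability(w, b, decode(W, z_adv))
print(f"fake probability before: {p_before:.3f}, after: {p_after:.3f}")
```

In this toy setting the perturbed latent decodes to an image that the detector scores below 0.5, while the latent never moves by more than `eps` in any coordinate. A real pipeline would replace the closed-form gradient with automatic differentiation through the actual decoder and detector.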

StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model

Warfare: Breaking the Watermark Protection of AI-Generated Content

Unveiling Universal Forensics of Diffusion Models with Adversarial Perturbations

Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model

Deceptive Diffusion: Generating Synthetic Adversarial Examples

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing

Diffusion Models for Imperceptible and Transferable Adversarial Attack

Adversarial Examples for Preventing Diffusion Models from Malicious Image Edition

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Take Fake as Real: Realistic-like Robust Black-box Adversarial Attack to Evade AIGC Detection

Toward effective protection against diffusion based mimicry through score distillation

Data Forensics in Diffusion Models: A Systematic Analysis of Membership Privacy

Mist: Towards Improved Adversarial Examples for Diffusion Models

Diffusion-Generated Fake Face Detection by Exploring Wavelet Domain Forgery Clues

Invisible Backdoor Attacks on Diffusion Models

AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

Exposing the Fake: Effective Diffusion-Generated Images Detection

Diffusion Deepfake

DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection

AdvDiffuser: Natural Adversarial Example Synthesis with Diffusion Models