Abstract: Recent advances in image synthesis, particularly GANs and diffusion models, have amplified public concerns about the spread of disinformation. To address these concerns, numerous AI-generated image (AIGI) detectors have been proposed and have achieved promising performance in identifying fake images. However, a systematic understanding of the adversarial robustness of these AIGI detectors is still lacking. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors to adversarial attacks under white-box and black-box settings, which has rarely been investigated so far. For the task of AIGI detection, we propose a new attack with two main components. First, inspired by the obvious difference between real and fake images in the frequency domain, we add perturbations in the frequency domain to push the image away from its original frequency distribution. Second, we explore the full posterior distribution of the surrogate model to further narrow the gap between heterogeneous models, e.g., when transferring adversarial examples across CNNs and ViTs. This is achieved by introducing a novel post-train Bayesian strategy that turns a single surrogate into a Bayesian one, capable of simulating diverse victim models using one pre-trained surrogate without re-training. We name our method the frequency-based post-train Bayesian attack, or FPBA. Through FPBA, we show that adversarial attacks are a real threat to AIGI detectors: FPBA delivers successful black-box attacks across models, generators, and defense methods, and even evades cross-generator detection, which is a crucial real-world detection scenario.
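The first component in the abstract, perturbing an image in the frequency domain rather than in pixel space, can be illustrated with a small NumPy sketch. This is not the authors' FPBA implementation: the detector is a hypothetical logistic-regression model on an 8x8 grayscale image, the helper names (`dct_matrix`, `fake_probability`, `frequency_sign_step`) are invented for illustration, and a single FGSM-style sign step stands in for the paper's optimization. The DCT serves as the frequency transform, as in the method summary; the pixel gradient is carried into the DCT coefficients with the chain rule, the coefficients are perturbed, and the result is mapped back to pixels.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so Y = C @ X @ C.T is the 2-D DCT of an n x n X."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses a smaller normalization
    return c


def fake_probability(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Hypothetical toy detector: logistic regression on pixels, returns P(fake)."""
    return float(1.0 / (1.0 + np.exp(-(np.sum(w * x) + b))))


def frequency_sign_step(x: np.ndarray, w: np.ndarray, b: float, eps: float) -> np.ndarray:
    """One FGSM-style step on the DCT coefficients of a square image x in [0, 1].

    Ascends -log P(fake), i.e. pushes the toy detector toward "real".
    """
    c = dct_matrix(x.shape[0])
    p = fake_probability(x, w, b)
    grad_x = (p - 1.0) * w              # d(-log P(fake))/dx for the linear logit
    grad_y = c @ grad_x @ c.T           # chain rule through x = C.T @ Y @ C
    y = c @ x @ c.T                     # spatial -> frequency
    y_adv = y + eps * np.sign(grad_y)   # shift the frequency distribution
    x_adv = c.T @ y_adv @ c             # frequency -> spatial
    return np.clip(x_adv, 0.0, 1.0)     # keep a valid image


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.4, 0.6, size=(8, 8))  # toy 8x8 grayscale "fake" image
    w = rng.normal(size=(8, 8))            # toy detector weights
    x_adv = frequency_sign_step(x, w, b=0.0, eps=0.01)
    print("P(fake) before:", fake_probability(x, w, 0.0))
    print("P(fake) after: ", fake_probability(x_adv, w, 0.0))
```

For this linear toy detector, the step always lowers the logit before clipping; a real attack would iterate under a perturbation budget and differentiate through a deep detector with an autodiff framework.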

What problem does this paper attempt to address?

The paper addresses the vulnerability of AI-generated image (AIGI) detectors to adversarial attacks. Specifically:

1. **Background and Motivation**:
   - In recent years, advances in image synthesis (especially GANs and diffusion models) have raised public concerns about the spread of misinformation.
   - To address this issue, many AIGI detectors have been proposed and have achieved strong performance in identifying fake images.
   - However, the robustness of these detectors under adversarial attacks has not yet been studied systematically.
2. **Research Objectives**:
   - The paper systematically evaluates the vulnerability of state-of-the-art AIGI detectors to adversarial attacks in both white-box and black-box settings.
   - The authors propose a new attack, the Frequency-based Post-train Bayesian Attack (FPBA), to demonstrate that adversarial attacks pose a real threat to AIGI detectors.
3. **Main Contributions**:
   - **Systematic Evaluation**: The paper presents the first systematic evaluation of the adversarial robustness of state-of-the-art AIGI detectors, covering normally trained detectors, defended detectors, and cross-generator detection, a key real-world scenario.
   - **New Attack Method**: FPBA adds perturbations in the frequency domain and, from a post-train Bayesian perspective, explores the full posterior distribution of the surrogate model, which raises the attack success rate.
   - **Experimental Validation**: Extensive experiments on multiple datasets show that FPBA achieves the highest average attack success rate in both white-box and black-box settings, significantly outperforming the baseline methods.
4. **Method Overview**:
   - **Frequency Domain Analysis**: The Discrete Cosine Transform (DCT) converts input images from the spatial domain to the frequency domain, and a spectrum saliency map visualizes the differences between real and fake images.
   - **Frequency Domain Attack**: Perturbations are added in the frequency domain so that the image deviates from its original frequency distribution, misleading the detector.
   - **Post-train Bayesian Strategy**: A post-train Bayesian strategy simulates diverse victim models without retraining the surrogate model, further improving the transferability of the adversarial examples.
   - **Hybrid Attack**: Combining attack gradients from the spatial and frequency domains further improves transferability across domains.

Through these methods, the authors demonstrate the real threat that adversarial attacks pose to AIGI detectors and point to potential directions for making detectors more robust.
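The post-train Bayesian strategy and the hybrid attack in the method overview can be sketched in the same toy setting. The summary does not say how the surrogate's posterior is constructed, so this sketch substitutes isotropic Gaussian noise around fixed pre-trained weights (the weights themselves are never retrained), averages the attack gradient over the sampled surrogates, and, as a further assumption, combines the two domains by simply adding a pixel-space sign step to a DCT-domain sign step. The linear detector, `sigma`, `num_samples`, and every function name here are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: Y = C @ X @ C.T and X = C.T @ Y @ C."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def sample_surrogates(w, sigma, num_samples, rng):
    """ASSUMED stand-in posterior: Gaussian noise around fixed pre-trained weights.

    The pre-trained weights `w` are only sampled around, never updated (no retraining).
    """
    return [w + sigma * rng.normal(size=w.shape) for _ in range(num_samples)]


def bayesian_grad(x, weights, b):
    """Gradient of -log P(fake) w.r.t. pixels, averaged over sampled surrogates."""
    grads = []
    for w_k in weights:
        p_k = 1.0 / (1.0 + np.exp(-(np.sum(w_k * x) + b)))  # toy logistic detector
        grads.append((p_k - 1.0) * w_k)
    return np.mean(grads, axis=0)


def hybrid_attack(x, w, b, eps_spatial, eps_freq, steps, sigma, num_samples, seed=0):
    """Iterative attack adding a spatial sign step and a DCT-domain sign step (assumed rule)."""
    rng = np.random.default_rng(seed)
    c = dct_matrix(x.shape[0])  # assumes a square image
    x_adv = x.copy()
    for _ in range(steps):
        weights = sample_surrogates(w, sigma, num_samples, rng)  # fresh draws each step
        g = bayesian_grad(x_adv, weights, b)
        spatial_step = eps_spatial * np.sign(g)
        # Sign step on the DCT coefficients of the gradient, mapped back to pixels.
        freq_step = c.T @ (eps_freq * np.sign(c @ g @ c.T)) @ c
        x_adv = np.clip(x_adv + spatial_step + freq_step, 0.0, 1.0)
    return x_adv


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.4, 0.6, size=(8, 8))  # toy "fake" image
    w = rng.normal(size=(8, 8))            # toy pre-trained surrogate weights
    x_adv = hybrid_attack(x, w, 0.0, 0.01, 0.01, steps=3, sigma=0.05, num_samples=8)
    print("logit before:", np.sum(w * x), "after:", np.sum(w * x_adv))
```

Redrawing the surrogate samples at every step means each update is computed against a slightly different model, which mirrors the intuition of simulating diverse victim models from one pre-trained surrogate; the real weight distribution and the relative weighting of the two gradients would have to come from the paper itself.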

Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks
