Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

Yunfeng Diao, Naixin Zhai, Changtao Miao, Xun Yang, Meng Wang
2024-07-30
Abstract: Recent advancements in image synthesis, particularly with the advent of GAN and Diffusion models, have amplified public concerns regarding the dissemination of disinformation. To address such concerns, numerous AI-generated Image (AIGI) Detectors have been proposed and have achieved promising performance in identifying fake images. However, a systematic understanding of the adversarial robustness of these AIGI detectors is still lacking. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors to adversarial attacks under white-box and black-box settings, which has rarely been investigated so far. For the task of AIGI detection, we propose a new attack containing two main parts. First, inspired by the obvious difference between real images and fake images in the frequency domain, we add perturbations in the frequency domain to push the image away from its original frequency distribution. Second, we explore the full posterior distribution of the surrogate model to narrow the gap between heterogeneous models, e.g. when transferring adversarial examples across CNNs and ViTs. This is achieved by introducing a novel post-train Bayesian strategy that turns a single surrogate into a Bayesian one, capable of simulating diverse victim models using one pre-trained surrogate, without the need for re-training. We name our method the frequency-based post-train Bayesian attack, or FPBA. Through FPBA, we show that adversarial attacks are a real threat to AIGI detectors, because FPBA can deliver successful black-box attacks across models, generators, and defense methods, and can even evade cross-generator detection, which is a crucial real-world detection scenario.
Computer Vision and Pattern Recognition, Cryptography and Security
What problem does this paper attempt to address?
The paper addresses the vulnerability of AI-generated image detectors to adversarial attacks. Specifically:

1. **Background and Motivation**:
   - In recent years, the development of image synthesis technology (especially GANs and diffusion models) has raised public concerns about the spread of misinformation.
   - To address this issue, many AI-generated image (AIGI) detectors have been proposed and have achieved promising performance in identifying fake images.
   - However, there is currently a lack of systematic research on the robustness of these AIGI detectors under adversarial attacks.
2. **Research Objectives**:
   - This paper aims to systematically evaluate the vulnerability of state-of-the-art AIGI detectors to adversarial attacks in both white-box and black-box settings.
   - The authors propose a new adversarial attack method called Frequency-based Post-train Bayesian Attack (FPBA) to demonstrate the real threat of adversarial attacks to AIGI detectors.
3. **Main Contributions**:
   - **Systematic Evaluation**: For the first time, a systematic evaluation of the robustness of state-of-the-art AIGI detectors under adversarial attacks is conducted, covering conventionally trained models, defense models, and cross-generator detection in real-world scenarios.
   - **New Attack Method**: A new adversarial attack method, FPBA, is proposed. It improves the attack success rate by adding perturbations in the frequency domain and by exploring the full posterior distribution of the surrogate model from a post-train Bayesian perspective.
   - **Experimental Validation**: Extensive experiments are conducted on multiple datasets, showing that FPBA achieves the highest average attack success rate in both white-box and black-box settings, significantly outperforming baseline methods.
4. **Method Overview**:
   - **Frequency Domain Analysis**: Discrete Cosine Transform (DCT) is used to convert input images from the spatial domain to the frequency domain. A spectrum saliency map then visualizes the differences between real and fake images.
   - **Frequency Domain Attack**: Perturbations are added in the frequency domain to make the image deviate from its original frequency distribution, thereby misleading the detector.
   - **Post-train Bayesian Strategy**: A post-train Bayesian strategy is proposed to simulate various victim models without retraining the surrogate model, further enhancing the transferability of adversarial attacks.
   - **Hybrid Attack**: Combining attack gradients from both the spatial and frequency domains further improves the transferability of adversarial attacks across different domains.

Through these methods, the authors demonstrate the real threat of adversarial attacks to AIGI detectors and provide potential directions for improving the robustness of detectors.
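
The frequency-domain step in the method overview can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic sign-gradient update applied to the 2-D DCT coefficients of a single-channel image, with the result clipped back to an L-infinity budget in pixel space. The function names (`dct_matrix`, `dct2`, `idct2`, `frequency_sign_step`) and the parameters `alpha` and `eps` are hypothetical, and the post-train Bayesian and hybrid parts of FPBA are not modeled here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D, so that D @ v is the 1-D DCT of v."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale
    return d


def dct2(img: np.ndarray) -> np.ndarray:
    """2-D DCT: transform rows and columns of a single-channel image."""
    return dct_matrix(img.shape[0]) @ img @ dct_matrix(img.shape[1]).T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse 2-D DCT; the transpose inverts an orthonormal matrix."""
    return dct_matrix(coeffs.shape[0]).T @ coeffs @ dct_matrix(coeffs.shape[1])


def frequency_sign_step(x, grad_x, alpha, eps, x_orig=None):
    """One sign-gradient ascent step taken on the DCT coefficients of x.

    grad_x is the gradient of the attacker's loss with respect to the
    pixels. Because the DCT is orthonormal, the gradient with respect to
    the coefficients is simply dct2(grad_x).
    """
    x_orig = x if x_orig is None else x_orig
    grad_c = dct2(grad_x)
    coeffs = dct2(x) + alpha * np.sign(grad_c)
    x_new = idct2(coeffs)
    # Project back into the eps-ball around the original image (assumed
    # L-infinity budget), then into the valid pixel range [0, 1].
    x_new = np.clip(x_new, x_orig - eps, x_orig + eps)
    return np.clip(x_new, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.2, 0.8, size=(8, 8))  # toy single-channel "image"
    w = rng.normal(size=(8, 8))             # toy linear detector: logit = sum(w * x)
    # The attacker's loss is -logit, so its pixel gradient is -w.
    x_adv = frequency_sign_step(x, -w, alpha=0.001, eps=0.03)
    print("logit before:", float(np.sum(w * x)))
    print("logit after: ", float(np.sum(w * x_adv)))
```

In practice the pixel gradient would come from backpropagation through a surrogate detector rather than from a toy linear score, and the step would be repeated for several iterations while passing the clean image as `x_orig`.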