Abstract: As latent diffusion models (LDMs) democratize image generation capabilities, there is a growing need to detect fake images. A good detector should focus on the generative model's fingerprints while ignoring image properties such as semantic content, resolution, file format, etc. Fake image detectors are usually built in a data-driven way, where a model is trained to separate real from fake images. Existing works primarily investigate network architecture choices and training recipes. In this work, we argue that in addition to these algorithmic choices, we also require a well-aligned dataset of real/fake images to train a robust detector. For the family of LDMs, we propose a very simple way to achieve this: we reconstruct all the real images using the LDM's autoencoder, without any denoising operation. We then train a model to separate these real images from their reconstructions. The fakes created this way are extremely similar to the real ones in almost every aspect (e.g., size, aspect ratio, semantic content), which forces the model to look for the LDM decoder's artifacts. We empirically show that this way of creating aligned real/fake datasets, which also sidesteps the computationally expensive denoising process, helps in building a detector that focuses less on spurious correlations, something that a very popular existing method is susceptible to. Finally, to demonstrate just how effective the alignment in a dataset can be, we build a detector using images that are not natural objects, and present promising results. Overall, our work identifies the subtle but significant issues that arise when training a fake image detector and proposes a simple and inexpensive solution to address these problems.
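
The pairing idea in the abstract (each real image is matched with a reconstruction of itself) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_reconstruct` is a hypothetical stand-in that uses block averaging, not the LDM autoencoder from the paper, and the names `Image` and `build_aligned_dataset` are invented. The only property it is meant to show is that a lossy encode/decode round trip yields a "fake" with exactly the same size and content layout as its real counterpart.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


def toy_reconstruct(img: Image, factor: int = 2) -> Image:
    """Hypothetical stand-in for an autoencoder's encode -> decode round trip.

    Averages each factor x factor block ("encode") and writes the average back
    to every pixel of that block ("decode"). Lossy, but height and width are
    preserved. This is NOT the LDM autoencoder, only a placeholder.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, factor):
        for left in range(0, w, factor):
            rows = range(top, min(top + factor, h))
            cols = range(left, min(left + factor, w))
            mean = sum(img[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def build_aligned_dataset(
    reals: List[Image], reconstruct: Callable[[Image], Image]
) -> List[Tuple[Image, int]]:
    """Return (image, label) pairs: label 0 = real, 1 = its reconstruction."""
    dataset: List[Tuple[Image, int]] = []
    for img in reals:
        dataset.append((img, 0))               # the untouched real image
        dataset.append((reconstruct(img), 1))  # its aligned "fake"
    return dataset


# Two made-up images with different sizes (2 x 4 and 3 x 2).
reals = [
    [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]],
    [[0.2, 0.4], [0.6, 0.8], [1.0, 0.0]],
]
dataset = build_aligned_dataset(reals, toy_reconstruct)
```

In a real pipeline the `reconstruct` argument would wrap the autoencoder of the LDM being targeted; no denoising loop appears anywhere in the construction, which is the point the abstract makes about cost.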

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve the key problems in **fake image detection**. With the rapid development of generative models (especially Latent Diffusion Models, LDMs), forged images are becoming more and more realistic, so there is an urgent need for an effective method to distinguish between real and fake images.

#### Specific problems:

1. **Limitations of existing methods**:
   - Existing fake image detection methods mainly focus on network architecture selection and training strategies, but ignore the design of training datasets.
   - Many existing fake image detectors are prone to learning spurious correlations, such as image resolution, file format, etc., rather than the unique fingerprints of generative models.
2. **Dataset alignment problem**:
   - When training a fake image detector, the alignment between real and fake images is very important. If there are significant differences between them (such as different resolutions), the detector may rely on these differences rather than the fingerprints of the generative model for classification.
3. **Computational efficiency**:
   - Generating fake images using the complete denoising process is very time-consuming and computationally expensive. For example, traditional latent diffusion models require multiple forward passes and decoding operations.

#### Solutions:

- **Propose a simple and efficient method**: Generate fake images by using the autoencoder of LDM to reconstruct real images without denoising operations.
- **Ensure dataset alignment**: The generated fake images are very similar to real images in almost all aspects (such as size, aspect ratio, semantic content, etc.), which forces the detector to focus on the artifacts introduced by the decoder of the generative model.
- **Improve the robustness of the detector**: Training the detector with an aligned dataset can reduce its dependence on spurious correlations and make it more focused on the real features of the generative model.
- **Improve computational efficiency**: Compared with traditional methods, this method only requires one forward pass, significantly reducing the computational cost and improving the efficiency by more than 10 times.

#### Experimental results:

- Through experimental verification, this method can not only effectively detect fake images generated by various latent diffusion models, but also avoid learning spurious correlations, while significantly improving computational efficiency.

In summary, the core objective of this paper is to improve the dataset design so that the fake image detector pays more attention to the features of the generative model rather than other properties of the image, and on this basis, improve the accuracy and robustness of the detector.
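
The spurious-correlation problem summarized above can be made concrete with a toy example. This is a sketch under stated assumptions, not an experiment from the paper: the image sizes are invented, and `shortcut_accuracy` is a hypothetical "detector" that looks only at image width and never at pixels. It shows why a shortcut that works on a misaligned dataset carries no signal once every fake shares the size of its real counterpart.

```python
from typing import List, Tuple

Sample = Tuple[int, int, int]  # (width, height, label); label 0 = real, 1 = fake


def shortcut_accuracy(samples: List[Sample], fake_width: int = 512) -> float:
    """Accuracy of a rule that predicts 'fake' iff width == fake_width."""
    correct = sum(
        int((width == fake_width) == bool(label)) for width, _, label in samples
    )
    return correct / len(samples)


# Invented sizes for illustration only.
real_sizes = [(640, 480), (1024, 768), (800, 600), (1280, 720)]

# Misaligned: reals keep their native sizes, every fake is 512 x 512.
misaligned: List[Sample] = [(w, h, 0) for w, h in real_sizes] + [(512, 512, 1)] * 4

# Aligned: every fake has exactly the size of the real image it was built from.
aligned: List[Sample] = [(w, h, label) for w, h in real_sizes for label in (0, 1)]

print(shortcut_accuracy(misaligned))  # 1.0: resolution alone separates the classes
print(shortcut_accuracy(aligned))     # 0.5: the shortcut is no better than chance
```

With the size cue removed, a trained detector has to rely on whatever differences remain between a real image and its reconstruction, which in the aligned setting are the decoder's artifacts.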

On the Effectiveness of Dataset Alignment for Fake Image Detection

DA-FDFtNet: Dual Attention Fake Detection Fine-tuning Network to Detect Various AI-Generated Fake Images

Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain

Detecting AutoEncoder is Enough to Catch LDM Generated Images

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

Let Real Images be as a Judger, Spotting Fake Images Synthesized with Generative Models

FDFtNet: Facing Off Fake Images using Fake Detection Fine-tuning Network

Towards Universal Fake Image Detectors that Generalize Across Generative Models

Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models

Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis

Deep Fake Image Detection Based on Pairwise Learning

Community Forensics: Using Thousands of Generators to Train Fake Image Detectors

Harnessing Machine Learning for Discerning AI-Generated Synthetic Images

Design of Automated Deep Learning-Based Fusion Model for Copy-Move Image Forgery Detection

Image forgery detection: a survey of recent deep-learning approaches

Detect Fake with Fake: Leveraging Synthetic Data-driven Representation for Synthetic Image Detection

From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition

Comparative analysis of GAN-based fusion deep neural models for fake face detection

Exploring varying color spaces through representative forgery learning to improve deepfake detection

DeepFake Detection with Inconsistent Head Poses: Reproducibility and Analysis