LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection

Yunpeng Luo, Junlong Du, Ke Yan, Shouhong Ding
2024-08-19
Abstract: The evolution of Diffusion Models has dramatically improved image generation quality, making it increasingly difficult to differentiate between real and generated images. This development, while impressive, also raises significant privacy and security concerns. In response to this, we propose a novel Latent REconstruction error guided feature REfinement method (LaRE^2) for detecting the diffusion-generated images. We come up with the Latent Reconstruction Error (LaRE), the first reconstruction-error based feature in the latent space for generated image detection. LaRE surpasses existing methods in terms of feature extraction efficiency while preserving crucial cues required to differentiate between the real and the fake. To exploit LaRE, we propose an Error-Guided feature REfinement module (EGRE), which can refine the image feature guided by LaRE to enhance the discriminativeness of the feature. Our EGRE utilizes an align-then-refine mechanism, which effectively refines the image feature for generated-image detection from both spatial and channel perspectives. Extensive experiments on the large-scale GenImage benchmark demonstrate the superiority of our LaRE^2, which surpasses the best SoTA method by up to 11.9%/12.1% average ACC/AP across 8 different image generators. LaRE also surpasses existing methods in terms of feature extraction cost, delivering an impressive speed enhancement of 8 times.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the problem of distinguishing images generated by diffusion models from real images. As diffusion models have developed, the quality of generated images has improved to the point where telling real and generated images apart is increasingly difficult. While this progress is impressive, it also raises significant privacy and security concerns. For example, toxic content and misinformation spread through generated images could threaten society and mislead the public.

To tackle this challenge, the authors propose LaRE^2, a Latent REconstruction error guided feature REfinement method for detecting diffusion-generated images. Specifically, LaRE^2 introduces the Latent Reconstruction Error (LaRE) as a new feature and combines it with an Error-Guided feature REfinement module (EGRE) that makes the image features more discriminative. According to the abstract, LaRE is cheaper to extract than existing reconstruction-based features while preserving the cues needed to distinguish real images from generated ones.

### Main Contributions

1. **Novel Features**: Reconstruction error in the latent space is proposed, for the first time, as a feature for detecting generated images. Compared to existing methods, this reduces the cost of feature extraction (the abstract reports an 8-times speedup) while retaining the essential information needed to detect diffusion-generated images.
2. **Novel Module**: Based on a qualitative analysis of why reconstruction error is effective, a new Error-Guided feature REfinement module (EGRE) is proposed. This module refines image features from both spatial and channel perspectives, enhancing their discriminative power.
3. **Superior Performance**: Extensive experiments on the large-scale GenImage benchmark show that LaRE^2 surpasses the best existing method by up to 11.9% in average accuracy (ACC) and up to 12.1% in average precision (AP) across 8 image generators.

### Method Overview

1. **Latent Reconstruction Error (LaRE)**:
   - Based on the hypothesis that generated images are more easily reconstructed at each reverse step of the diffusion model.
   - Computes the latent reconstruction error directly from a single denoising step rather than fully reconstructing the image through multi-step denoising, which significantly improves efficiency.
   - Performs reconstruction in the latent space rather than in pixel space, which further improves efficiency.
2. **Error-Guided Feature Refinement Module (EGRE)**:
   - Follows an align-then-refine mechanism: LaRE is first spatially aligned with the image feature map, and the image features are then refined from both spatial and channel perspectives using LaRE.
   - Error-Guided Spatial Refinement Module (ESR): reweights the attention scores of dot-product attention through an error-guided spatial attention mechanism, emphasizing important spatial information.
   - Error-Guided Channel Refinement Module (ECR): refines features from the channel perspective through a gating mechanism.

### Experimental Results

- **Dataset and Evaluation Metrics**: The GenImage dataset is used. It contains 2,681,167 images: 1,331,167 real images and 1,350,000 generated images, with the generated images coming from 8 different generative models. Results are reported as accuracy (ACC) and average precision (AP).
- **Performance Comparison**: On the GenImage test set, LaRE^2 achieves the best performance across all 8 generators, improving on the best existing method by up to 11.9% in average ACC and up to 12.1% in average AP.
- **Cross-Generator Image Classification**: When the model is trained on a single subset and tested on all 8 subsets, LaRE^2 is more robust and generalizes better to unseen generators.
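The single-step idea behind LaRE can be illustrated with a small NumPy sketch. It assumes the standard DDPM closed-form noising step and a noise-prediction denoiser, and it measures the error as the absolute difference between predicted and injected noise; the summary above does not give the exact formula, so that choice is an assumption. The names `add_noise`, `latent_reconstruction_error`, and `toy_denoiser` are hypothetical, and the toy denoiser stands in for a pretrained latent diffusion model.

```python
import numpy as np

def add_noise(z0, noise, alpha_bar_t):
    """Closed-form forward diffusion: z_t = sqrt(a) * z0 + sqrt(1 - a) * noise."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * noise

def latent_reconstruction_error(z0, predict_noise, alpha_bar_t, rng):
    """One noising step plus one denoiser call; returns a per-element error map."""
    noise = rng.standard_normal(z0.shape)
    z_t = add_noise(z0, noise, alpha_bar_t)
    predicted = predict_noise(z_t, alpha_bar_t)  # hypothetical denoiser interface
    return np.abs(predicted - noise)

def toy_denoiser(z_t, alpha_bar_t):
    """Toy stand-in (assumption): believes every clean latent is all zeros."""
    return z_t / np.sqrt(1.0 - alpha_bar_t)

rng = np.random.default_rng(0)
z_easy = np.zeros((4, 8, 8))             # latent the toy denoiser models exactly
z_hard = rng.standard_normal((4, 8, 8))  # latent the toy denoiser models poorly

err_easy = latent_reconstruction_error(z_easy, toy_denoiser, 0.5, rng)
err_hard = latent_reconstruction_error(z_hard, toy_denoiser, 0.5, rng)
print(err_easy.mean(), err_hard.mean())
```

In this toy setting the latent that matches the denoiser's model yields a near-zero error map, while the other latent yields a clearly larger one. This mirrors the hypothesis that generated images are easier to reconstruct, without claiming anything about real detector behavior.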
### Conclusion

LaRE^2 addresses the detection of diffusion-generated images by introducing the latent reconstruction error and an error-guided feature refinement module. The method is more efficient at feature extraction than existing methods and also significantly outperforms them in detection performance.
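For readers who want a concrete picture of the align-then-refine mechanism in EGRE, the following NumPy sketch walks through the three stages named in the method overview. Everything specific here is an assumption made for illustration: average pooling for spatial alignment, a softmax over the pooled error as the spatial reweighting term, a sigmoid gate for channels, and a random matrix `proj` in place of learned weights. It is not the authors' implementation.

```python
import numpy as np

def align(error_map, size):
    """Average-pool a (C_e, H, W) error map down to (C_e, size, size)."""
    c, h, w = error_map.shape
    fh, fw = h // size, w // size
    cropped = error_map[:, :fh * size, :fw * size]
    return cropped.reshape(c, size, fh, size, fw).mean(axis=(2, 4))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_refine(feat, err):
    """Dot-product self-attention whose scores are reweighted by the error map."""
    c, s, _ = feat.shape
    tokens = feat.reshape(c, s * s).T                  # (N, C), one token per position
    scores = softmax(tokens @ tokens.T / np.sqrt(c))   # (N, N) attention scores
    weight = softmax(err.mean(axis=0).reshape(-1))     # (N,) error-derived weights
    scores = scores * weight[None, :]                  # emphasize high-error positions
    scores = scores / scores.sum(axis=1, keepdims=True)
    return (scores @ tokens).T.reshape(c, s, s)

def channel_refine(feat, err, proj):
    """Gate each feature channel with a sigmoid of the projected, pooled error."""
    pooled = err.mean(axis=(1, 2))                     # (C_e,)
    gate = 1.0 / (1.0 + np.exp(-(proj @ pooled)))      # (C,), values in (0, 1)
    return feat * gate[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 4, 4))              # image feature map (C, S, S)
error_map = np.abs(rng.standard_normal((4, 8, 8)))  # stand-in for a LaRE map
proj = rng.standard_normal((16, 4))                 # hypothetical projection, random here

err = align(error_map, size=4)
refined = channel_refine(spatial_refine(feat, err), err, proj)
print(refined.shape)
```

The refined feature map keeps the shape of the input features, so in a sketch like this it could feed the same classification head as the unrefined features.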