Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, Yao Zhao
DOI: https://doi.org/10.1109/cvpr52733.2024.01024
2024-01-01
Computer Vision and Pattern Recognition
Abstract: In this paper, we study the problem of generalizable synthetic image detection, aiming to detect forged images from diverse generative methods, e.g., GANs and diffusion models. Cutting-edge solutions have started to explore the benefits of pre-trained models, and mainly follow the fixed paradigm of solely training an attached classifier, e.g., combining a frozen CLIP-ViT with a learnable linear layer in UniFD. However, our analysis shows that such a fixed paradigm is prone to yield detectors with insufficient learning regarding forgery representations. We attribute the key challenge to the lack of forgery adaptation, and present a novel forgery-aware adaptive transformer approach, namely FatFormer. Based on the pre-trained vision-language spaces of CLIP, FatFormer introduces two core designs for the adaptation to build generalized forgery representations. First, motivated by the fact that both image and frequency analysis are essential for synthetic image detection, we develop a forgery-aware adapter to adapt image features to discern and integrate local forgery traces within the image and frequency domains. Second, we find that considering the contrastive objectives between adapted image features and text prompt embeddings, a previously overlooked aspect, results in a nontrivial generalization improvement. Accordingly, we introduce language-guided alignment to supervise the forgery adaptation with image and text prompts in FatFormer. Experiments show that, by coupling these two designs, our approach tuned on 4-class ProGAN data attains a remarkable detection performance, achieving an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
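The contrastive objective between image features and text prompt embeddings mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`l2_normalize`, `alignment_loss`), the toy three-dimensional vectors, and the temperature of 0.07 are all assumptions chosen for illustration. The sketch scores one image feature against a "real" and a "fake" prompt embedding by cosine similarity and applies a softmax cross-entropy loss over those scores.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def alignment_loss(image_feat, text_embeds, label, temperature=0.07):
    """Cross-entropy over temperature-scaled cosine similarities.

    image_feat  : one image feature vector (hypothetical adapted feature)
    text_embeds : one embedding per class prompt, e.g. [real, fake]
    label       : index of the correct class prompt
    temperature : assumed scaling factor, not taken from the paper
    """
    img = l2_normalize(image_feat)
    logits = []
    for t in text_embeds:
        txt = l2_normalize(t)
        logits.append(sum(a * b for a, b in zip(img, txt)) / temperature)
    # Numerically stable log-softmax.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum for z in logits]
    probs = [math.exp(lp) for lp in log_probs]
    return -log_probs[label], probs

# Toy data: index 0 stands for the "real" prompt, index 1 for the "fake" prompt.
text_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_feat = [0.2, 0.9, 0.1]  # lies closer to the "fake" prompt direction

loss, probs = alignment_loss(image_feat, text_embeds, label=1)
wrong_loss, _ = alignment_loss(image_feat, text_embeds, label=0)
```

In this toy setup the loss is small when the label matches the closer prompt and larger otherwise, which is the signal such an objective would use to pull adapted image features toward the matching text embedding.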