SA$$^3$$WT: Adaptive Wavelet-Based Transformer with Self-Paced Auto Augmentation for Face Forgery Detection

Yihui Li,Yifan Zhang,Hongyu Yang,Binghui Chen,Di Huang
DOI: https://doi.org/10.1007/s11263-024-02091-x
2024-01-01
Abstract:Face forgery detection (FFD) on digital images has become increasingly challenging with the proliferation of sophisticated manipulation techniques. In this study, we propose a novel approach, named Adaptive Wavelet-based Transformer with Self-paced Auto Augmentation (SA ^3 WT), which naturally combines the global representation capabilities of visual transformers with adaptive enhancement of fine-grained artifacts in the frequency domain to effectively capture forgery patterns. In particular, to adequately handle various clues, the network incorporates Wavelet-based Mixed Attention (WMA) Transformer block to better leverage the information residing in all frequency sub-bands and a Residual Reserve Fine-grained Sampler (RRFS) to enhance detailed forgery artifacts while learning hierarchical global representations. By deeply mixing the modeling processes of global representations and fine-grained features throughout the network, the model captures rich forgery clues while simultaneously bypassing the fusion issue arising from their separate extraction. Furthermore, Self-paced Auto Augmentation Strategy (SAAS) facilitates model learning by unifying data augmentation and active learning in a coupled manner. Extensive experiments conducted on several benchmarks demonstrate the superiority of SA ^3 WT compared to state-of-the-art methods. The ablation studies and cross-dataset evaluations confirm the significance of the specifically designed modules, in terms of both effectiveness and generalization. Our findings suggest that the pure visual transformers also provide a promising direction for advanced forgery detection in real-world scenarios.
What problem does this paper attempt to address?