Towards Spatio-temporal Collaborative Learning: An End-to-End Deepfake Video Detection Framework

Wenxuan Guo, Shuo Du, Huiyuan Deng, Zikang Yu, Lin Feng
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191479
2023-01-01
Abstract: With the rapid development of facial tampering techniques, deepfake detection has attracted widespread public concern. Most existing video-based methods apply temporal convolution to learn temporal discontinuities directly, and may therefore overlook both local detail mutations and inconsistent global expression semantics along the temporal dimension. This makes it difficult to learn more discriminative forgery cues. To mitigate this issue, we introduce a novel deepfake video detection framework specifically designed to capture fine-grained traces of tampering. Concretely, we first present a Multilayered Feature Extraction module (MFE) that constructs comprehensive spatio-temporal representations by stitching together features from different levels. We then propose a Bidirectional temporal Artifact Enhancement module (BAE), which exploits local differences between adjacent frames to enhance frame-level features. Moreover, we present a Cross temporal Stride Aggregation strategy (CSA) to mine inconsistent global semantics and adaptively obtain multi-timescale representations. Extensive experiments on several benchmarks demonstrate that the proposed method outperforms other competitive state-of-the-art approaches.
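The abstract names two temporal ideas without giving their formulation: enhancing frame-level features with local differences between adjacent frames (BAE), and aggregating over several temporal strides (CSA). The following is only a rough illustrative sketch of those two ideas in NumPy, not the authors' implementation. The function names, the `alpha` weight, the use of absolute differences, and the fixed stride set with plain averaging are all assumptions made for illustration; in particular, the paper describes the aggregation as adaptive, which this sketch does not attempt.

```python
import numpy as np


def bidirectional_difference_enhance(feats, alpha=1.0):
    """Hypothetical sketch: add forward and backward adjacent-frame
    differences to per-frame features.

    feats: array of shape (T, D), one D-dimensional feature per frame.
    alpha: assumed scalar weight on the difference term.
    """
    feats = np.asarray(feats, dtype=np.float64)
    fwd = np.zeros_like(feats)
    bwd = np.zeros_like(feats)
    fwd[:-1] = feats[1:] - feats[:-1]  # change from frame t to frame t+1
    bwd[1:] = feats[:-1] - feats[1:]   # change from frame t to frame t-1
    # Boundary frames have only one neighbour, so one term stays zero there.
    return feats + alpha * (np.abs(fwd) + np.abs(bwd))


def multi_stride_pool(feats, strides=(1, 2, 4)):
    """Hypothetical sketch: average frames sampled at several temporal
    strides, then average the per-stride summaries (non-adaptive)."""
    feats = np.asarray(feats, dtype=np.float64)
    per_stride = [feats[::s].mean(axis=0) for s in strides]
    return np.mean(per_stride, axis=0)
```

Under these assumptions, a clip whose features do not change over time passes through `bidirectional_difference_enhance` unchanged, while frames adjacent to an abrupt change are amplified. The pooled vector from `multi_stride_pool` could then feed a classifier.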