Hybrid Spatio-Temporal Network for Face Forgery Detection

Xuhui Liu, Sicheng Gao, Peizhu Zhou, Jianzhuang Liu, Xiaoyan Luo, Luping Zhang, Baochang Zhang
DOI: https://doi.org/10.1007/978-3-031-47665-5_21
2023-01-01
Abstract: Facial manipulation techniques have raised increasing security concerns, prompting various methods for detecting forged videos. However, existing methods suffer from a significant performance gap compared to image manipulation methods, partly because spatio-temporal information is not well explored. To address this issue, we introduce a Hybrid Spatio-Temporal Network (HSTNet) that integrates spatial and temporal information in a single framework. Specifically, HSTNet uses a hybrid architecture, consisting of a 3D CNN branch and a transformer branch, to jointly learn short- and long-range relations in the spatio-temporal dimension. Because the features of the two branches are misaligned, we design a Feature Alignment Block (FAB) to recalibrate and efficiently fuse these heterogeneous features. Moreover, HSTNet introduces a Vector Selection Block (VSB) that combines the outputs of the two branches and activates the features most important for classification. Extensive experiments show that HSTNet achieves the best overall performance compared with state-of-the-art methods.
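The abstract does not specify the internals of either branch, the FAB, or the VSB. The following NumPy sketch only illustrates the overall data flow it describes, under loudly stated assumptions: a single 3D convolution stands in for the 3D CNN branch, single-head self-attention over all positions stands in for the transformer branch, the FAB is approximated by pooling, layer normalization, and an SE-style sigmoid gate, and the VSB by a softmax-weighted selection over candidate feature vectors. All function names, shapes, and the random weights are hypothetical; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(v, eps=1e-5):
    return (v - v.mean(-1, keepdims=True)) / np.sqrt(v.var(-1, keepdims=True) + eps)

def pool_norm(feat):
    """Global-average-pool any (..., C) feature map to a normalized C-dim vector."""
    return layer_norm(feat.reshape(-1, feat.shape[-1]).mean(0))

def conv3d_branch(x, kernel):
    """Stand-in for the 3D CNN branch: one valid 3D convolution + ReLU.
    x: (T, H, W, C_in); kernel: (kT, kH, kW, C_in, C_out). Each output only
    sees a local kT x kH x kW neighborhood (short-range relations)."""
    kT, kH, kW, _, c_out = kernel.shape
    T, H, W, _ = x.shape
    out = np.zeros((T - kT + 1, H - kH + 1, W - kW + 1, c_out))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[t:t + kT, i:i + kH, j:j + kW, :]
                out[t, i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2, 3], [0, 1, 2, 3]))
    return np.maximum(out, 0.0)

def attention_branch(x, wq, wk, wv):
    """Stand-in for the transformer branch: single-head self-attention in which
    every one of the T*H*W positions attends to all others (long-range relations)."""
    tokens = x.reshape(-1, x.shape[-1])              # (N, C_in)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv  # (N, C) each
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, N)
    return attn @ v                                  # (N, C)

def feature_alignment(a, b, w_gate):
    """Hypothetical FAB: a and b are normalized (common scale) branch vectors;
    an SE-style sigmoid gate recalibrates each channel before fusing them."""
    gate = 1.0 / (1.0 + np.exp(-(np.concatenate([a, b]) @ w_gate)))  # (C,) in (0, 1)
    return gate * a + (1.0 - gate) * b

def vector_selection(vectors, w_score):
    """Hypothetical VSB: score each candidate vector and return a softmax-weighted
    combination, so the highest-scoring vectors dominate the final feature."""
    stacked = np.stack(vectors)           # (M, C)
    weights = softmax(stacked @ w_score)  # (M,)
    return weights @ stacked, weights

def hstnet_sketch(clip, params):
    cnn_vec = pool_norm(conv3d_branch(clip, params["kernel"]))
    attn_vec = pool_norm(attention_branch(clip, params["wq"], params["wk"], params["wv"]))
    fused = feature_alignment(cnn_vec, attn_vec, params["w_gate"])
    feat, weights = vector_selection([cnn_vec, attn_vec, fused], params["w_score"])
    logit = feat @ params["w_cls"]
    return 1.0 / (1.0 + np.exp(-logit)), weights  # P(fake), selection weights

# Toy example: a 4-frame 6x6 RGB clip and small random (untrained) weights.
C_IN, C = 3, 8
params = {
    "kernel": 0.1 * rng.standard_normal((3, 3, 3, C_IN, C)),
    "wq": 0.1 * rng.standard_normal((C_IN, C)),
    "wk": 0.1 * rng.standard_normal((C_IN, C)),
    "wv": 0.1 * rng.standard_normal((C_IN, C)),
    "w_gate": 0.1 * rng.standard_normal((2 * C, C)),
    "w_score": 0.1 * rng.standard_normal(C),
    "w_cls": 0.1 * rng.standard_normal(C),
}
clip = rng.standard_normal((4, 6, 6, C_IN))
prob, weights = hstnet_sketch(clip, params)
print(f"P(fake) = {prob:.3f}, selection weights = {np.round(weights, 3)}")
```

The split mirrors the motivation stated in the abstract: the convolution output at each position depends only on a 3x3x3 neighborhood, whereas each attention output mixes information from every position in the clip. A normalization step is placed before fusion because the two branches produce features with different shapes and statistics, which is one plausible reading of the "feature misalignment" the FAB addresses.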