Exposing Deepfake Videos with Spatial, Frequency and Multi-scale Temporal Artifacts

Yongjian Hu,Hongjie Zhao,Zeqiong Yu,Beibei Liu,Xiangyu Yu
DOI: https://doi.org/10.1007/978-3-030-95398-0_4
2022-01-01
Abstract:The deepfake technique replaces the face in a source video with a fake face which is generated using deep learning tools such as generative adversarial networks (GANs). Even the facial expression can be well synchronized, making it difficult to identify the fake videos. Using features from multiple domains has been proved effective in the literature. It is also known that the temporal information is particularly critical in detecting deepfake videos, since the face-swapping of a video is implemented frame by frame. In this paper, we argue that the temporal differences between authentic and fake videos are complex and can not be adequately depicted from a single time scale. To obtain a complete picture of the temporal deepfake traces, we design a detection model with a short-term feature extraction module and a long-term feature extraction module. The short-term module captures the gradient information of adjacent frames. which is incorporated with the frequency and spatial information to make a multi-domain feature set. The long-term module then reveals the artifacts from a longer period of context. The proposed algorithm is tested on several popular databases, namely FaceForensics++, DeepfakeDetection (DFD), TIMIT-DF and FFW. Experimental results have validated the effectiveness of our algorithm through improved detection performance compared with related works.
What problem does this paper attempt to address?