Augmented Multi-Scale Spatiotemporal Inconsistency Magnifier for Generalized DeepFake Detection

Yang Yu, Xiaohui Zhao, Rongrong Ni, Siyuan Yang, Yao Zhao, Alex C. Kot
DOI: https://doi.org/10.1109/tmm.2023.3237322
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Recently, realistic DeepFake videos have raised severe security concerns in society. Existing video-based detection methods observe local spatial regions with a coarse temporal view, which makes it difficult to capture subtle spatiotemporal information and limits their generalization ability. In this paper, we propose a novel Augmented Multi-scale Spatiotemporal Inconsistency Magnifier (AMSIM) with a Global Inconsistency View (GIV) and a more meticulous Multi-timescale Local Inconsistency View (MLIV), focusing on mining comprehensive and more subtle spatiotemporal cues. First, the GIV, which includes the global spatial and long-term temporal views, is established to ensure that comprehensive spatiotemporal clues are captured. Then, the MLIV, with critical local spatial and multi-timescale local temporal views, is designed to magnify hard-to-detect spatiotemporal abnormalities. Subsequently, the GIV is used to guide the MLIV to dynamically find local spatiotemporal anomalies that are highly relevant to the overall video. Finally, to further generalize the framework, an adversarial data augmentation is specially designed to expand the source domains and simulate unseen forgery domains. Extensive experiments on six large-scale datasets show that AMSIM outperforms state-of-the-art detection methods and remains effective when applied to unseen forgery techniques and datasets.