Unified Video and Image Representation for Boosted Video Face Forgery Detection

Haotian Liu, Chenhui Pan, Yang Liu, Guoying Zhao, Xiaobai Li
DOI: https://doi.org/10.3233/faia240548
2024-01-01
Abstract: Face forgery detection is crucial for preserving the security and integrity of facial data amid rapid developments in face manipulation techniques and deep generative models. Existing methods for video face forgery detection typically assume that all frames in a forged video are manipulated; identifying partially forged videos, in which only a subset of frames is altered, remains an open challenge. To address this issue, we propose a novel framework, UVIF, that uses additional annotated images to provide fine-grained supervision for detecting partial forgeries in videos. UVIF integrates a unified encoder and a multi-task learning paradigm to model both facial videos and images for boosted video face forgery detection. The unified encoder is a 2D backbone with temporal fusion modules. A pseudo-labeling process is also designed for facial video frames to bridge the representations of individual video frames and static images. Extensive experiments on benchmark datasets demonstrate the effectiveness of our framework, which outperforms state-of-the-art methods in detecting partially forged videos while introducing no additional computational overhead. Our code is available at https://github.com/haotianll/UVIF.
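The two ideas named in the abstract, a unified encoder that handles both videos and images and a pseudo-labeling step for video frames, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The abstract does not specify the fusion operator or the pseudo-labeling rule, so the neighbour-averaging in `temporal_fusion`, the thresholding rule in `pseudo_label_frames`, the 0/1 label convention, and every name below are assumptions chosen purely for illustration.

```python
from typing import List

# A per-frame feature vector; a stand-in for the output of a 2D backbone.
Frame = List[float]


def temporal_fusion(frames: List[Frame]) -> List[Frame]:
    """Average each frame's features with its immediate neighbours.

    Assumed fusion operator (illustrative only). For a single frame,
    i.e. a static image, the window holds just that frame, so the
    function is the identity and one encoder can serve both inputs.
    """
    num_frames = len(frames)
    fused = []
    for t in range(num_frames):
        window = frames[max(0, t - 1): min(num_frames, t + 2)]
        fused.append([sum(vals) / len(window) for vals in zip(*window)])
    return fused


def pseudo_label_frames(
    video_label: int, frame_scores: List[float], threshold: float = 0.5
) -> List[int]:
    """Assign hypothetical frame-level labels (0 = real, 1 = fake).

    Assumed rule: every frame of a real video is real; in a forged
    video, a frame is marked fake only if its forgery score reaches
    the threshold, which allows for partially forged videos.
    """
    if video_label == 0:
        return [0] * len(frame_scores)
    return [1 if score >= threshold else 0 for score in frame_scores]
```

Treating an image as a one-frame video is one simple way to let image-level annotations supervise the same encoder that processes video frames; the actual temporal fusion modules and pseudo-labeling process in UVIF may differ.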