Towards General Visual-Linguistic Face Forgery Detection.

Ke Sun, Shen Chen, Taiping Yao, Haozhe Yang, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji
DOI: https://doi.org/10.48550/arxiv.2307.16545
2023-01-01
Abstract: Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We argue that such supervisions lack semantic information and interpretability. To address this issue, in this paper, we propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation. Since text annotations are not available in current deepfake datasets, VLFFD first generates mixed forgery images with corresponding fine-grained prompts via the Prompt Forgery Image Generator (PFIG). Then, the fine-grained mixed data and the coarse-grained original data are used jointly for training under the Coarse-and-Fine Co-training framework (C2F), enabling the model to gain better generalization and interpretability. Experiments show that the proposed method improves existing detection models on several challenging benchmarks. Furthermore, we have integrated our method with multimodal large models, achieving noteworthy results that demonstrate the potential of our approach. This integration not only enhances the performance of our VLFFD paradigm but also underscores the versatility and adaptability of our method when combined with advanced multimodal technologies, highlighting its potential in tackling the evolving challenges of deepfake detection.