GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection

Chih-Chung Hsu, Shao-Ning Chen, Mei-Hsuan Wu, Yi-Fang Wang, Chia-Ming Lee, Yi-Shiuan Chou
2024-09-02
Abstract: As DeepFake video manipulation techniques escalate and pose profound threats, the need for efficient detection strategies has become urgent. One particular issue is mis-detected facial images. These often originate from degraded videos or adversarial attacks, and they produce unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques. To address these challenges, this paper introduces a novel method for robust DeepFake video detection: the proposed Graph-Regularized Attentive Convolutional Entanglement (GRACE), built on a graph convolutional network with a graph Laplacian. First, conventional Convolutional Neural Networks are deployed to extract spatiotemporal features for the entire video. Then, the spatial and temporal features are mutually entangled by constructing a graph with a sparse constraint. This ensures that the essential features of valid face images are retained within noisy face sequences, improving the stability and performance of DeepFake video detection. Furthermore, a graph Laplacian prior is incorporated into the graph convolutional network to remove noise patterns in the feature space and further improve performance. Comprehensive experiments show that the proposed method delivers state-of-the-art DeepFake video detection performance under noisy face sequences. The source code is available at https://github.com/ming053l/GRACE.
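The graph-construction and propagation steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the following:

- Each node is one face crop with a CNN feature vector.
- Edge weights are non-negative cosine similarities.
- The "sparse constraint" is approximated by keeping only the top-k affinities per row.
- Propagation uses the standard symmetric GCN normalization of Kipf and Welling.

All function names (`cosine_affinity`, `sparsify_topk`, `gcn_layer`) and the choice of `k` are hypothetical.

```python
import numpy as np


def cosine_affinity(x: np.ndarray) -> np.ndarray:
    """Non-negative cosine similarity between node features (one row per node)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x_unit = x / np.maximum(norms, 1e-12)
    # Clip negatives so the affinity can serve as non-negative edge weights.
    return np.clip(x_unit @ x_unit.T, 0.0, None)


def sparsify_topk(a: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest affinities per row (hypothetical stand-in for the sparse constraint)."""
    out = np.zeros_like(a)
    rows = np.arange(a.shape[0])[:, None]
    idx = np.argsort(-a, axis=1)[:, :k]
    out[rows, idx] = a[rows, idx]
    # Symmetrize so the resulting graph is undirected.
    return np.maximum(out, out.T)


def gcn_layer(x: np.ndarray, a: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN step, ReLU(D^-1/2 (A + I) D^-1/2 X W): each node mixes in its neighbors' features."""
    a_hat = a + np.eye(a.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)


# Toy usage: 8 nodes (e.g., face crops) with 16-dim CNN features, 4 hidden units.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
A = sparsify_topk(cosine_affinity(feats), k=3)
W = 0.1 * rng.standard_normal((16, 4))
out = gcn_layer(feats, A, W)
print(out.shape)  # (8, 4)
```

Keeping only a few strong edges per node limits how far a single invalid face crop can spread into the rest of the graph. Under this sketch's assumptions, that is the intuition behind pairing a sparse graph with feature propagation.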
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is the lack of robustness and accuracy of current DeepFake video detection techniques on noisy face sequences that contain many invalid face images. Adversarial attacks or video compression can degrade face detectors, so their outputs become inaccurate and yield many invalid face images. These invalid images introduce unexpected feature jitter in the temporal domain, which severely degrades DeepFake video detection performance. To tackle this challenge, the paper proposes a novel method, Graph-Regularized Attentive Convolutional Entanglement (GRACE). GRACE combines Graph Convolutional Networks (GCN) with Graph Laplacian Smoothing Prior Regularization (GLSPR) and a Sparsity Constraint (SC) to improve the robustness and accuracy of DeepFake video detection in noisy face sequences. The main contributions are:

1. **Proposing GRACE**: Utilizing spatiotemporal contextual features to enhance the robustness of DeepFake video detection in noisy face sequences.
2. **Introducing a Feature Entanglement (FE) mechanism**: Constructing an affinity matrix to fuse spatiotemporal features, ensuring that each node has at least one feature descriptor from a valid face image.
3. **Proposing Graph Laplacian Smoothing Prior Regularization (GLSPR) and a Sparsity Constraint (SC)**: Further filtering out noisy nodes to enhance DeepFake video detection performance.

Through these innovations, GRACE achieves state-of-the-art DeepFake video detection performance on noisy and unreliable face sequences.
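The summary does not give the exact form of GLSPR or SC. The sketch below therefore uses two common generic choices, which are assumptions rather than the paper's equations:

- **Smoothing prior.** The quadratic graph Laplacian regularizer tr(H^T L H), with L = D - A. For a symmetric affinity A it equals 0.5 * sum_ij A_ij * ||h_i - h_j||^2.
- **Sparsity term.** An L1 penalty on the off-diagonal affinities.

Both terms are combined with a task loss in a hypothetical `total_loss`. The weights `lam_smooth` and `lam_sparse` are illustrative values, not values from the paper.

```python
import numpy as np


def graph_laplacian(a: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian L = D - A for a symmetric, non-negative affinity A."""
    return np.diag(a.sum(axis=1)) - a


def laplacian_smoothness(h: np.ndarray, a: np.ndarray) -> float:
    """Quadratic smoothing prior tr(H^T L H); small when strongly connected nodes agree."""
    return float(np.trace(h.T @ graph_laplacian(a) @ h))


def pairwise_smoothness(h: np.ndarray, a: np.ndarray) -> float:
    """Equivalent edge-wise form: 0.5 * sum_ij A_ij * ||h_i - h_j||^2."""
    sq_dist = np.sum((h[:, None, :] - h[None, :, :]) ** 2, axis=-1)
    return float(0.5 * np.sum(a * sq_dist))


def l1_sparsity(a: np.ndarray) -> float:
    """L1 norm of off-diagonal affinities, a common convex surrogate for sparsity."""
    return float(np.abs(a - np.diag(np.diag(a))).sum())


def total_loss(task_loss, h, a, lam_smooth=0.1, lam_sparse=0.01):
    """Hypothetical objective: task loss + Laplacian smoothing prior + sparsity penalty."""
    return task_loss + lam_smooth * laplacian_smoothness(h, a) + lam_sparse * l1_sparsity(a)


rng = np.random.default_rng(1)
a = rng.random((6, 6))
a = (a + a.T) / 2                  # symmetric, non-negative toy affinity
h = rng.standard_normal((6, 3))    # toy node features
print(np.isclose(laplacian_smoothness(h, a), pairwise_smoothness(h, a)))  # True

# A single noisy node that deviates from its neighbors raises the penalty.
h_clean = np.ones((6, 3))
h_noisy = h_clean.copy()
h_noisy[2] += 5.0
print(laplacian_smoothness(h_clean, a) < laplacian_smoothness(h_noisy, a))  # True
```

Minimizing the smoothing term pulls connected nodes toward similar features, so an outlier node, such as one built from an invalid face crop, is penalized. The L1 term discourages dense, indiscriminate connections.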