Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi

2024-05-20

Abstract: This paper presents a new approach for the detection of fake videos, based on the analysis of style latent vectors and their abnormal behavior in temporal changes in the generated videos. We discovered that the generated facial videos suffer from the temporal distinctiveness in the temporal changes of style latent vectors, which are inevitable during the generation of temporally stable videos with various facial expressions and geometric transformations. Our framework utilizes the StyleGRU module, trained by contrastive learning, to represent the dynamic properties of style latent vectors. Additionally, we introduce a style attention module that integrates StyleGRU-generated features with content-based features, enabling the detection of visual and temporal artifacts. We demonstrate our approach across various benchmark scenarios in deepfake detection, showing its superiority in cross-dataset and cross-manipulation scenarios. Through further analysis, we also validate the importance of using temporal changes of style latent vectors to improve the generality of deepfake video detection.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses a key challenge in deepfake video detection: generalization. As generation algorithms advance, high-quality deepfake videos become increasingly difficult to distinguish from real ones, which raises social concerns. Existing detectors rely mainly on visual and temporal artifacts, and their performance degrades on videos produced by the latest generation algorithms.

The paper therefore proposes detecting fake videos by analyzing how style latent vectors change over time. It finds that generated facial videos show distinctive, abnormal temporal changes in their style latent vectors, which the authors describe as inevitable when generating temporally stable videos with varied facial expressions and geometric transformations.

The framework has two components. The StyleGRU module, trained with contrastive learning, represents the dynamic properties of the style latent vectors. A style attention module integrates the StyleGRU-generated features with content-based features, so that both visual and temporal artifacts can be detected.

Experiments across several deepfake detection benchmarks show that the method performs well in cross-dataset and cross-manipulation scenarios. Further analysis supports the claim that using the temporal changes of style latent vectors improves the generality of deepfake video detection.
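The two ideas in this summary, temporal changes of style latent vectors and attention-based fusion with content features, can be illustrated with a small sketch. This is not the authors' implementation. It assumes each frame has already been mapped to a style latent vector by some encoder, takes the "temporal change" to be the plain frame-to-frame difference, uses the mean of those differences as a stand-in for the learned StyleGRU feature, and uses generic scaled dot-product attention in place of the style attention module. The names `style_flow` and `attend` are hypothetical.

```python
import math


def style_flow(latents):
    """Frame-to-frame differences of per-frame style latent vectors.

    Assumption: the temporal change is modeled as latents[t+1] - latents[t].
    """
    return [
        [curr_x - prev_x for prev_x, curr_x in zip(prev, curr)]
        for prev, curr in zip(latents, latents[1:])
    ]


def attend(query, keys, values):
    """Generic scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dims = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dims)]


# Toy input: three frames, each with a 2-dimensional style latent vector.
latents = [[0.0, 1.0], [0.5, 1.0], [0.5, 2.0]]
flow = style_flow(latents)  # [[0.5, 0.0], [0.0, 1.0]]

# Stand-in for a learned temporal feature: the mean flow per dimension.
style_feature = [sum(step[d] for step in flow) / len(flow) for d in range(2)]

# Hypothetical per-frame content features of the same width.
content_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# The style feature queries the content features to produce a fused vector.
fused = attend(style_feature, content_features, content_features)
```

In a trained system the mean-flow stand-in would be replaced by a recurrent encoder over the flow sequence, and the fused vector would feed a real/fake classifier. The sketch only shows the data flow between the two stages.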

LatentForensics: Towards frugal deepfake detection in the StyleGAN latent space

Restore DeepFakes Video Frames Via Identifying Individual Motion Styles

Jointly learning and training: using style diversification to improve domain generalization for deepfake detection

Video Detection Method Based on Temporal and Spatial Foundations for Accurate Verification of Authenticity

Spatio-temporal Features for Generalized Detection of Deepfake Videos

Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption

DeepFake Detection by Analyzing Convolutional Traces

MCS-GAN: A Different Understanding for Generalization of Deep Forgery Detection

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model

Analyzing temporal coherence for deepfake video detection

Exploiting Complementary Dynamic Incoherence for DeepFake Video Detection

GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection

Unearthing Common Inconsistency for Generalisable Deepfake Detection

FakeTransformer: Exposing Face Forgery From Spatial-Temporal Representation Modeled By Facial Pixel Variations

Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection

Exploring Static–Dynamic ID Matching and Temporal Static ID Inconsistency for Generalizable Deepfake Detection

Selective Domain-Invariant Feature for Generalizable Deepfake Detection

Coherent Adversarial Deepfake Video Generation

Dual-Modality Co-Learning for Unveiling Deepfake in Spatio-Temporal Space