Abstract: Deepfake videos are now widely spread over the Internet, severely impairing public trust and social security. Although increasingly reliable detectors have recently emerged to counter this tampering technique, some challenging issues remain. In particular, most Deepfake video detectors are supervised and require a large number of accurately labeled samples for training. When too few training samples with true labels are available, or when the training data are maliciously poisoned by adversaries, a supervised classifier is likely to be unreliable. To tackle this issue, we propose a fully unsupervised Deepfake detector: throughout training and testing, no information about the true labels of the samples is used. First, we design a pseudo-label generator that labels the training samples, using traditional hand-crafted features to characterize both types of samples. Second, the pseudo-labeled training samples are fed into the proposed enhanced contrastive learner, in which discriminative features are extracted and iteratively refined under the guidance of the contrastive loss. Last, relying on inter-frame correlation, we complete the final binary classification between real and fake videos. Extensive experiments empirically verify the effectiveness of the proposed unsupervised Deepfake detector on benchmark datasets including FF++, Celeb-DF, DFD, DFDC, and UADFV. Furthermore, the proposed detector outperforms the current unsupervised method and is comparable to the baseline supervised methods. More importantly, when the labeled data are poisoned by malicious adversaries or the training data are insufficient, the proposed unsupervised detector shows a clear advantage. Our source code has been released at https://github.com/bestalllen/Unsupervised_DF_Detection/.
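The three stages named in the abstract (pseudo-labeling, contrastive refinement, video-level decision) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the released code: the 2-means clustering stands in for the pseudo-label generator, the loss is a generic supervised-contrastive-style loss driven by pseudo-labels, and the majority vote is only a simple stand-in for the inter-frame correlation step. The function names `pseudo_labels`, `contrastive_loss`, and `video_label` are hypothetical.

```python
import math
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def pseudo_labels(features, iters=20):
    """Assumed stand-in for a pseudo-label generator: a tiny 2-means.

    Cluster ids 0/1 are arbitrary; deciding which cluster means "fake"
    is outside the scope of this sketch.
    """
    c0 = features[0]
    c1 = max(features, key=lambda f: sq_dist(f, c0))  # farthest point from c0
    labels = [0] * len(features)
    for _ in range(iters):
        labels = [0 if sq_dist(f, c0) <= sq_dist(f, c1) else 1 for f in features]
        g0 = [f for f, lab in zip(features, labels) if lab == 0]
        g1 = [f for f, lab in zip(features, labels) if lab == 1]
        if not g0 or not g1:  # degenerate split: stop updating
            break
        c0, c1 = centroid(g0), centroid(g1)
    return labels


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def contrastive_loss(embeddings, labels, temperature=0.5):
    """Generic supervised-contrastive-style loss using pseudo-labels.

    Samples sharing a pseudo-label are treated as positives; all other
    samples in the batch act as negatives.
    """
    normed = [l2_normalize(e) for e in embeddings]
    total, count = 0.0, 0
    for i, zi in enumerate(normed):
        sims = {
            j: sum(a * b for a, b in zip(zi, zj)) / temperature
            for j, zj in enumerate(normed)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


def video_label(frame_labels):
    """Assumed video-level decision: majority vote over frame labels."""
    return Counter(frame_labels).most_common(1)[0][0]


# Toy usage with made-up 2-D "hand-crafted" features for four frames.
features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
print(pseudo_labels(features))  # [0, 0, 1, 1]
```

In this toy setting the loss is lower when the pseudo-labels agree with the geometry of the embeddings than when they do not, which is the property an iterative refinement loop would rely on.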

Spatio-Temporal Catcher: A Self-Supervised Transformer for Deepfake Video Detection

Hierarchical Supervisions with Two-Stream Network for Deepfake Detection

Deepfake Video Detection with Spatiotemporal Dropout Transformer

Fully Unsupervised Deepfake Video Detection via Enhanced Contrastive Learning

Refining Localized Attention Features with Multi-Scale Relationships for Enhanced Deepfake Detection in Spatial-Frequency Domain

Spatio-temporal Features for Generalized Detection of Deepfake Videos

Detecting Deepfake Videos Based on Spatiotemporal Attention and Convolutional LSTM

Deep Convolutional Pooling Transformer for Deepfake Detection

Deepfake Detection Using Spatiotemporal Transformer

Dynamic Difference Learning with Spatio-temporal Correlation for Deepfake Video Detection

Self-Supervised Graph Transformer for Deepfake Detection

DeepFake detection algorithm based on improved vision transformer

FakeTransformer: Exposing Face Forgery From Spatial-Temporal Representation Modeled By Facial Pixel Variations

ISTVT: Interpretable Spatial-Temporal Video Transformer for Deepfake Detection

Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption

Self-supervised Transformer for Deepfake Detection

Spatiotemporal Inconsistency Learning for DeepFake Video Detection

Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

Video Detection Method Based on Temporal and Spatial Foundations for Accurate Verification of Authenticity

Transformer-based cascade networks with spatial and channel reconstruction convolution for deepfake detection