Abstract: In recent years, the spread of fake videos has had a serious impact on individuals and even countries, so the analysis of fake videos must give robust and reliable results. Conventional detection methods are neither reliable nor robust on unseen videos. A more effective alternative is to find the original video from which the fake was made. For example, fake videos from the Russia-Ukraine war and the Hong Kong law revision storm were refuted by finding the original videos. We use an improved retrieval method, named ViTHash, to find the original video. Tracing the source of a fake video requires finding the single matching original, which is difficult when the candidate originals differ only slightly. To solve this problem, we designed a novel loss, Hash Triplet Loss. In addition, we designed a tool called Localizator to compare the traced original video with the fake video. We conducted extensive experiments on FaceForensics++, Celeb-DF and DeepFakeDetection, and additional experiments on three datasets that we built: DAVIS2016-TL (video inpainting), VSTL (video splicing) and DFTL (similar videos). The experiments show that our method outperforms state-of-the-art methods, especially in the cross-dataset setting, and that ViTHash is effective for several types of forgery: video inpainting, video splicing and deepfakes. Our code and datasets have been released on GitHub: https://github.com/lajlksdf/vtl

What problem does this paper attempt to address?

This paper addresses the problem of tracing forged videos back to their source. Forged videos now spread widely and can seriously affect individuals and even countries, so a robust and reliable way to identify where a forged video came from has become important. Traditional detection methods do not work well on unseen videos, and finding the original video behind a forgery is a more effective approach. For example, during the Russia-Ukraine war and the Hong Kong unrest over the proposed extradition law, forged videos were refuted by locating the original footage.

To meet this challenge, the authors propose ViTHash, a video hash retrieval method based on the Vision Transformer (ViT). The method generates a hash code for each video and uses it to retrieve the original of a forged video, even when the candidate originals differ only slightly. To this end, the authors design a new loss function, Hash Triplet Loss, and a tool named Localizator, which compares the traced original video with the forged video to show where they differ.

The main contributions of the paper include:

1. **Novel architecture design**: A new architecture detects forged videos by tracing their sources, providing the original video as evidence instead of outputting only a probability value.
2. **Hash Triplet Loss**: A new loss function helps distinguish the true original video from similar videos that differ only slightly.
3. **New datasets**: Because relevant object-forgery video datasets are lacking, the authors construct three datasets to verify the method in different forgery scenarios.

According to the reported experiments, ViTHash outperforms existing methods in video inpainting, video splicing, and deepfake detection, especially in the cross-dataset setting.
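The idea of tracing a source by hash retrieval, and of training with a triplet-style loss, can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the definition of Hash Triplet Loss, so the code below uses the standard triplet margin loss as a stand-in, and the function names (`binarize`, `hamming`, `trace_source`, `triplet_margin_loss`), the sign-threshold binarization, and the 4-bit codes are all assumptions made for illustration.

```python
import math


def binarize(features):
    """Assumed binarization: threshold real-valued features at zero."""
    return [1 if x > 0 else 0 for x in features]


def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def trace_source(query_code, database):
    """Return the id of the stored original closest to the query in Hamming distance."""
    return min(database, key=lambda vid: hamming(query_code, database[vid]))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (a stand-in, not the paper's Hash Triplet Loss).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`, measured in Euclidean distance.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical database of two similar originals (codes differ in one bit).
database = {
    "original_a": binarize([0.9, -0.2, 0.4, -0.7]),
    "original_b": binarize([0.8, -0.3, -0.5, -0.6]),
}

# Hypothetical features of a fake derived from original_a.
fake_code = binarize([0.7, -0.1, 0.3, -0.9])
traced = trace_source(fake_code, database)

# Loss for (fake, true original, similar but wrong original).
loss = triplet_margin_loss(
    [float(b) for b in fake_code],
    [float(b) for b in database["original_a"]],
    [float(b) for b in database["original_b"]],
    margin=2.0,
)
```

In this toy setup the two originals differ in a single bit, which mirrors the difficulty described above: retrieval only succeeds if training pushes the fake's code closer to its true original than to any near-duplicate, which is the role a triplet-style loss plays.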

Vision Transformer Based Video Hashing Retrieval for Tracing the Source of Fake Videos

Video Forensics Research Based on Authenticity and Integrity.

Unified Video and Image Representation for Boosted Video Face Forgery Detection

FakeTransformer: Exposing Face Forgery From Spatial-Temporal Representation Modeled By Facial Pixel Variations

ISTVT: Interpretable Spatial-Temporal Video Transformer for Deepfake Detection

FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection

UVL2: A Unified Framework for Video Tampering Localization

Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer

An Improved Video Identification Scheme Based on Video Tomography.

Deepfake Detection Using Spatiotemporal Transformer

Audio-Visual Temporal Forgery Detection Using Embedding-Level Fusion and Multi-Dimensional Contrastive Loss

G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing

UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization

AVoiD-DF: Audio-Visual Joint Learning for Detecting Deepfake

Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection

Studies of epitope restriction on myeloperoxidase (MPO), an important antigen in systemic vasculitis

Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces

DeepFake detection algorithm based on improved vision transformer

Face Forgery Detection Based on Facial Region Displacement Trajectory Series

Video Detection Method Based on Temporal and Spatial Foundations for Accurate Verification of Authenticity

A Timely Survey on Vision Transformer for Deepfake Detection