Abstract: As Generative Adversarial Networks advance, deepfakes have become increasingly realistic, escalating societal, economic, and political threats. To confront these risks, the research community has identified two promising defensive strategies: proactive deepfake disruption and reactive deepfake detection. Proactive and reactive defenses typically coexist, each addressing the shortcomings of the other. However, this paper highlights a critical yet overlooked issue that arises when these countermeasures are deployed simultaneously. Genuine images gathered from the Internet that already carry disrupting perturbations can poison the training datasets of deepfake detection models, severely degrading detection accuracy. We propose an improved training framework for deepfake detection models that addresses this problem. Our approach removes the disrupting perturbations from disrupted images using a backward process of the denoising diffusion probabilistic model (DDPM). Images purified with our DDPM-based technique closely resemble the original, unperturbed images, which makes it possible to generate deepfake images from them for training. Moreover, our purification process is faster than DiffPure, a prominent adversarial purification method. While conventional defensive techniques struggle to preserve detection accuracy when the training dataset is poisoned, our framework markedly reduces this accuracy drop and achieves superior performance across a range of detection models. Our experiments demonstrate that deepfake detection models trained with our framework achieve detection accuracy 11.24 to 45.72 percentage points higher than models trained with the DiffPure method. Our implementation is available at https://github.com/seclab-yonsei/Anti-disrupt.
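The purification step described above can be illustrated with a minimal sketch of the standard DDPM reverse (backward) update, x_{t-1} = (x_t - β_t / sqrt(1 - ᾱ_t) · ε_θ(x_t, t)) / sqrt(α_t) + σ_t z. This is not the authors' implementation: the abstract does not state the starting timestep, the noise schedule, or whether noise is added before denoising, so the linear schedule, the `start_step` parameter, the choice σ_t = sqrt(β_t), and the `placeholder_noise_predictor` function are all assumptions made for illustration. The image is treated as a flat list of pixel values so that the example needs only the standard library.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear schedule beta_1..beta_T (assumed; the abstract names no schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def placeholder_noise_predictor(x, t):
    """Hypothetical stand-in for a trained network eps_theta(x_t, t).

    Returns zeros so the sketch runs without model weights.
    """
    return [0.0] * len(x)


def reverse_purify(x, start_step, betas, predict_noise, rng):
    """Apply DDPM reverse steps t = start_step, ..., 1 to a flattened image x."""
    # alpha_bar_t is the cumulative product of (1 - beta_s) for s <= t.
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    for t in range(start_step, 0, -1):
        beta = betas[t - 1]
        alpha = 1.0 - beta
        alpha_bar = alpha_bars[t - 1]
        eps = predict_noise(x, t)
        coef = beta / math.sqrt(1.0 - alpha_bar)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = [(xi - coef * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps)]
        if t > 1:
            # Assumed variance choice sigma_t^2 = beta_t; no noise on the last step.
            sigma = math.sqrt(beta)
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


rng = random.Random(0)
betas = linear_beta_schedule(1000)
perturbed = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy "disrupted image"
purified = reverse_purify(perturbed, 50, betas, placeholder_noise_predictor, rng)
```

With a real trained noise predictor in place of the placeholder, a small `start_step` would keep the output close to the input while a larger one would remove more of the perturbation at the cost of image detail; the value 50 here is arbitrary.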

Coexistence of Deepfake Defenses: Addressing the Poisoning Challenge
