Abstract: AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.

What problem does this paper attempt to address?

### The Problem Addressed by the Paper

This paper addresses a side effect of proactive forensics: watermarks embedded in forged images to trace their provenance can degrade the Deepfake detectors that later examine those images. Existing watermarking models were designed for genuine images. When they are applied directly to forged images, the watermark signal may overlap with the forgery signals that detectors rely on, and detection performance drops.

To solve this problem, the authors propose **AdvMark**, a proactive forensic approach that exploits the adversarial vulnerability of passive detectors for a beneficial purpose. AdvMark fine-tunes an existing robust watermarking model into an adversarial watermarking model. The embedded watermark then acts as a helpful adversarial perturbation that steers the detector toward the correct real-or-fake decision, while the watermark can still be extracted to trace the image's origin.

### Main Contributions

1. **A harmless proactive forensics solution**: According to the authors, AdvMark is the first to use robust watermarking to fool Deepfake detectors for good, making the embedded watermark both recoverable and adversarial, so that it supports provenance tracking and improved detection at the same time.
2. **First definition of beneficial adversarial watermarking**: AdvMark improves the accuracy of downstream Deepfake detection by fine-tuning the watermarking model, without adjusting the detectors that are already deployed.
3. **Extensive experimental validation**: The reported results show that AdvMark improves detector performance across several types of Deepfakes (face swapping, expression reenactment, attribute editing, and full-face synthesis) in both white-box and black-box settings.

### Background and Motivation

With the development of generative models, Deepfake technology has become increasingly sophisticated and can generate highly realistic fake images and videos.
This brings positive applications in entertainment and education, but it also raises concerns about social trust and the public interest. Researchers have proposed several kinds of countermeasures: passive forensics, proactive defense, and proactive forensics. However, existing proactive forensic methods such as watermark embedding may unintentionally reduce the performance of Deepfake detectors in practice, because the watermarking models are mainly designed for real images and, when applied to forged images, their signals can interfere with the cues the detectors depend on.

### Method Overview

The core idea of AdvMark is to exploit the adversarial vulnerability of Deepfake detectors in order to improve their accuracy. The method has three stages:

1. **Pre-training of the robust watermarking model**: The watermark encoder and decoder are first trained end to end to obtain a robust watermarking model that withstands various distortions.
2. **Adversarial fine-tuning**: The robust watermarking model is then fine-tuned into an adversarial watermarking model. Only the parameters of the watermarking model are updated; the detector is kept unchanged. The goal is for watermarked images to push the detector toward the correct classification while the watermark remains extractable.
3. **Model inference**: At inference time, the fine-tuned encoder generates adversarial watermarked images, which improve the detector's accuracy, and the fine-tuned decoder recovers the embedded watermark.

### Experimental Results

The reported results show that AdvMark improves the accuracy of various Deepfake detectors in both white-box and black-box settings. The embedded watermark can still be extracted to trace the image's origin.
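The adversarial fine-tuning stage from the Method Overview can be illustrated with a small, self-contained toy. This is a sketch under heavy assumptions, not the authors' implementation: the linear encoder `x_w = x + E m + A x`, the fixed decoder `DEC`, the logistic detector `DET_W`, the four hand-made samples, the loss weights, and the learning rate are all hypothetical. The summary does not state the actual loss, so a weighted sum of image fidelity, message error, and detector cross-entropy is assumed. The message occupies coordinates 0 and 1 and the detector reads only coordinate 2, so the two objectives do not compete here as they may on real images.

```python
import math

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Frozen toy detector (assumption): p(fake) = sigmoid(w . x + b). Never updated.
DET_W, DET_B = (0.0, 0.0, 1.0, 0.0), 0.0
# Fixed toy decoder (assumption): reads the 2-bit message from coordinates 0 and 1.
DEC = ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0))

# (image, label, message): label 1 = fake, 0 = real; message bits are +/-1.
DATA = [
    ([0.1, -0.1, 0.8, 1.0], 1, [1, 1]),
    ([-0.1, 0.1, -0.3, 1.0], 1, [-1, -1]),   # the detector misses this fake
    ([0.1, 0.1, -0.8, -1.0], 0, [1, -1]),
    ([-0.1, -0.1, 0.3, -1.0], 0, [-1, 1]),   # the detector flags this real image
]

def encode(E, A, x, m):
    """Toy encoder: x_w = x + E m + A x (message- and image-dependent residual)."""
    return [xi + e + a for xi, e, a in zip(x, matvec(E, m), matvec(A, x))]

def detect(xw):
    return sigmoid(sum(w * v for w, v in zip(DET_W, xw)) + DET_B)

def evaluate(E, A):
    """Return (detector accuracy, bit accuracy) on the watermarked DATA."""
    det_ok = bit_ok = bits = 0
    for x, y, m in DATA:
        xw = encode(E, A, x, m)
        det_ok += int((detect(xw) > 0.5) == (y == 1))
        for d, b in zip(matvec(DEC, xw), m):
            bit_ok += int((d > 0) == (b > 0))
            bits += 1
    return det_ok / len(DATA), bit_ok / bits

def fine_tune(E, A, steps=300, lr=0.2, w_img=0.1, w_msg=1.0, w_det=1.0):
    """Gradient descent on the encoder only; detector and decoder stay fixed."""
    E = [row[:] for row in E]
    A = [row[:] for row in A]
    for _ in range(steps):
        gE = [[0.0] * len(E[0]) for _ in E]
        gA = [[0.0] * len(A[0]) for _ in A]
        for x, y, m in DATA:
            xw = encode(E, A, x, m)
            msg_err = [d - b for d, b in zip(matvec(DEC, xw), m)]
            p = detect(xw)
            for i in range(len(xw)):
                # d(loss)/d(xw[i]): fidelity + message + detector cross-entropy.
                g = 2 * w_img * (xw[i] - x[i])
                g += 2 * w_msg * sum(DEC[k][i] * msg_err[k] for k in range(len(DEC)))
                g += w_det * (p - y) * DET_W[i]
                for j in range(len(m)):
                    gE[i][j] += g * m[j] / len(DATA)
                for j in range(len(x)):
                    gA[i][j] += g * x[j] / len(DATA)
        E = [[e - lr * g for e, g in zip(er, gr)] for er, gr in zip(E, gE)]
        A = [[a - lr * g for a, g in zip(ar, gr)] for ar, gr in zip(A, gA)]
    return E, A

# Stand-in for the pre-trained robust watermark: a weak message pattern only.
E0 = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0], [0.0, 0.0]]
A0 = [[0.0] * 4 for _ in range(4)]
E1, A1 = fine_tune(E0, A0)
print("before:", evaluate(E0, A0))  # (detector accuracy, bit accuracy)
print("after: ", evaluate(E1, A1))
```

In this toy, the detector is evaluated on the same four samples the encoder was tuned on, so the numbers only show the mechanism: the encoder learns a residual that moves each image to the correct side of the frozen detector's decision boundary while the message bits keep their signs. It says nothing about generalization or about the black-box setting.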
### Conclusion

This paper proposes AdvMark to counter the negative impact of existing watermarking models on Deepfake detection, offering a new approach to harmless proactive forensics. The method improves the accuracy of Deepfake detection while retaining the watermark's provenance-tracking function.

Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics
