Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin
2024-04-27
Abstract: AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper addresses the negative effect that existing watermarking models have on Deepfake detectors when watermarks are embedded in forged images to trace their origin. Existing watermarking models were designed for real images. When they are applied directly to forged images, the watermark signal may overlap with the forgery signals that detectors rely on, which degrades detection performance.

To solve this problem, the authors propose **AdvMark**, a proactive forensics approach that exploits the adversarial vulnerability of passive detectors for a beneficial purpose. AdvMark fine-tunes an existing robust watermarking model into an adversarial watermarking model, so that the embedded watermark steers the detector toward the correct prediction. This improves detection accuracy on watermarked images while the watermark can still be extracted to trace the image's origin.

### Main Contributions

1. **Proposed a Harmless Proactive Forensics Solution**: According to the authors, AdvMark is the first method to use robust watermarking to fool Deepfake detectors for good. The embedded watermark is both recoverable and adversarial, so it supports provenance tracking and improved detection at the same time.
2. **First Definition of Beneficial Adversarial Watermarking**: AdvMark improves the accuracy of downstream Deepfake detection by fine-tuning the watermarking model, without adjusting the deployed detectors.
3. **Extensive Experimental Validation**: Experimental results show that AdvMark improves detector performance across several types of Deepfake (face swapping, expression reenactment, attribute editing, and full-face synthesis) in both white-box and black-box settings.

### Background and Motivation

With the development of generative models, Deepfake technology has become increasingly sophisticated and can generate highly realistic fake images and videos.
This brings positive applications in areas such as entertainment and education, but it also raises concerns about social trust and the public interest. In response, researchers have proposed several countermeasures, including passive forensics, proactive defense, and proactive forensics. However, existing proactive forensics methods, such as embedding watermarks, may unintentionally reduce the performance of Deepfake detectors in practice. The reason is that existing watermarking models are designed mainly for real images, so when they are applied to forged images the watermark signal may interfere with the forgery signals the detectors depend on.

### Method Overview

The core idea of AdvMark is to exploit the adversarial vulnerability of Deepfake detectors in order to improve their performance. The steps are as follows:

1. **Pre-training of the Robust Watermarking Model**: The watermark encoder and decoder are first trained end to end to obtain a robust watermarking model that withstands various distortions.
2. **Adversarial Fine-tuning**: The robust watermarking model is then fine-tuned into an adversarial watermarking model. Only the parameters of the watermarking model are updated; the detector stays fixed. The goal is for watermarked images to push the detector toward the correct classification (real or fake) while the watermark remains extractable.
3. **Model Inference**: At inference time, the fine-tuned encoder produces adversarially watermarked images, which improve the detector's accuracy, and the fine-tuned decoder extracts the watermark for provenance tracking.

### Experimental Results

The experiments reported in the paper show that AdvMark improves the performance of various Deepfake detectors in both white-box and black-box settings. The embedded watermark can still be extracted to trace the image's origin.
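The adversarial fine-tuning step from the Method Overview can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not the authors' architecture or loss: the "image" is a short vector, the encoder adds a linear residual `W @ m`, the decoder is a linear read-out `D`, the frozen detector is a fixed logistic classifier `(v, b)`, and the names `finetune_step`, `lam`, and `alpha` are hypothetical. The point is only the structure of the update: the message loss and the detector loss are both backpropagated into the watermarking parameters, while the detector is never modified.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(a: float) -> float:
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def matvec(M: Matrix, u: Vector) -> Vector:
    return [sum(row[i] * u[i] for i in range(len(u))) for row in M]


def embed(W: Matrix, x: Vector, m: Vector) -> Vector:
    # Toy encoder (assumption): add a message-dependent residual W @ m.
    return [xi + di for xi, di in zip(x, matvec(W, m))]


def decode(D: Matrix, x_w: Vector) -> Vector:
    # Toy decoder (assumption): sign of a linear read-out, bits in {-1, +1}.
    return [1.0 if z >= 0 else -1.0 for z in matvec(D, x_w)]


def detector_logit(v: Vector, b: float, x: Vector) -> float:
    # Frozen toy detector (assumption): positive logit means "fake".
    return sum(vi * xi for vi, xi in zip(v, x)) + b


def finetune_step(W: Matrix, D: Matrix, x: Vector, m: Vector, is_fake: bool,
                  v: Vector, b: float, lr: float = 0.1, lam: float = 1.0,
                  alpha: float = 0.01) -> Tuple[Matrix, Matrix]:
    """One gradient step on the watermarking parameters W and D only."""
    t = 1.0 if is_fake else -1.0            # ground-truth label as +/-1
    x_w = embed(W, x, m)
    delta = [a - c for a, c in zip(x_w, x)]  # watermark residual
    z = matvec(D, x_w)                       # decoder logits
    s = detector_logit(v, b, x_w)            # detector logit

    # Gradients of softplus(-m_j * z_j) and lam * softplus(-t * s).
    g_z = [-mj * sigmoid(-mj * zj) for mj, zj in zip(m, z)]
    g_s = -lam * t * sigmoid(-t * s)

    # Backpropagate both losses to the watermarked image, then add the
    # gradient of the imperceptibility penalty alpha * ||delta||^2.
    k, d = len(m), len(x)
    g_xw = [sum(D[j][i] * g_z[j] for j in range(k)) + v[i] * g_s
            for i in range(d)]
    g_delta = [g + 2.0 * alpha * dl for g, dl in zip(g_xw, delta)]

    # Update encoder and decoder; v and b are read but never written.
    new_W = [[W[i][j] - lr * g_delta[i] * m[j] for j in range(k)]
             for i in range(d)]
    new_D = [[D[j][i] - lr * g_z[j] * x_w[i] for i in range(d)]
             for j in range(k)]
    return new_W, new_D


if __name__ == "__main__":
    x, m = [1.0, -1.0, 0.0], [1.0, -1.0]     # toy forged image and message
    v, b = [1.0, 1.0, 0.0], 0.0              # frozen detector, undecided on x
    W = [[0.0] * len(m) for _ in x]
    D = [[0.0] * len(x) for _ in m]
    for _ in range(50):
        W, D = finetune_step(W, D, x, m, True, v, b)
    x_w = embed(W, x, m)
    print("detector logit before:", detector_logit(v, b, x))
    print("detector logit after: ", detector_logit(v, b, x_w))
    print("decoded message:", decode(D, x_w))
```

In this toy setting the detector starts out undecided on the forged sample, and each step moves the watermark residual in the direction that raises the logit for the true label while the decoder learns to read the message back. A real system would replace the linear maps with deep encoder and decoder networks, add a distortion layer for robustness, and use a pretrained detector with its weights frozen.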
### Conclusion

This paper proposes AdvMark to mitigate the negative impact of existing watermarking models on Deepfake detection, offering a new approach to harmless proactive forensics. The method improves the accuracy of Deepfake detection while retaining the watermark's provenance-tracking function.