UnMarker: A Universal Attack on Defensive Image Watermarking

Andre Kassis, Urs Hengartner
DOI: https://doi.org/10.1109/SP61157.2025.00005
2024-11-23
Abstract: Reports regarding the misuse of Generative AI (GenAI) to create deepfakes are frequent. Defensive watermarking enables GenAI providers to hide fingerprints in their images and use them later for deepfake detection. Yet, its potential has not been fully explored. We present UnMarker -- the first practical universal attack on defensive watermarking. Unlike existing attacks, UnMarker requires no detector feedback, no unrealistic knowledge of the watermarking scheme or similar models, and no advanced denoising pipelines that may not be available. Instead, being the product of an in-depth analysis of the watermarking paradigm revealing that robust schemes must construct their watermarks in the spectral amplitudes, UnMarker employs two novel adversarial optimizations to disrupt the spectra of watermarked images, erasing the watermarks. Evaluations against SOTA schemes prove UnMarker's effectiveness. It not only defeats traditional schemes while retaining superior quality compared to existing attacks but also breaks semantic watermarks that alter an image's structure, reducing the best detection rate to $43\%$ and rendering them useless. To our knowledge, UnMarker is the first practical attack on semantic watermarks, which have been deemed the future of defensive watermarking. Our findings show that defensive watermarking is not a viable defense against deepfakes, and we urge the community to explore alternatives.
Cryptography and Security, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper examines whether defensive image watermarking can withstand a practical attack. It proposes a general-purpose attack named UnMarker, which removes the defensive watermarks that Generative AI (GenAI) providers embed in their generated images. These watermarks are meant to support deepfake detection, but UnMarker removes them by disrupting their spectral characteristics while preserving image quality.

### Background and Problem Description of the Paper

As generative AI has advanced, deepfake images have become increasingly easy to produce, leading to harms such as political smearing and non-consensual pornography. In response, some researchers and companies have adopted defensive watermarking: embedding invisible watermarks in generated images so that the images can later be identified as AI-generated. However, the effectiveness of this defense has not been fully verified.

### Proposal of UnMarker

The paper "UnMarker: A Universal Attack on Defensive Image Watermarking" proposes UnMarker, a new general-purpose attack on defensive image watermarking that requires neither prior knowledge of the watermarking scheme nor feedback from the detector. The main contributions of UnMarker are as follows:

1. **Generality**: UnMarker is designed to work against defensive watermarking schemes in general, without knowledge of the details of any specific scheme.
2. **Black-box attack**: The attacker does not need access to the parameters of the watermarking scheme or of similar systems.
3. **Data-independent**: The attack does not require additional data.
4. **Query-independent**: The attacker does not need feedback from the detector.
### Technical Principles

The core idea of UnMarker is to optimize the spectrum of the image so that the embedded watermark is destroyed. Specifically:

- **Spectral analysis**: The paper's analysis finds that a robust watermarking scheme must construct its watermarks in the spectral amplitudes. UnMarker therefore removes watermarks by modifying the spectral amplitudes of the image.
- **Optimization strategies**:
  - For non-semantic watermarks, UnMarker uses the Direct Fourier Loss (DFL) to maximize spectral differences in the high-frequency range, thereby destroying the watermark.
  - For semantic watermarks, UnMarker introduces optimizable filters and learns their weights to systematically change the consistency of different image regions. This produces spectral differences in the low-frequency range, thereby removing the watermark.

### Experimental Results

The paper evaluates UnMarker on seven state-of-the-art defensive watermarking schemes. The results show that UnMarker removes these watermarks while preserving image quality better than existing attacks. Notably, the authors describe it as the first practical attack to succeed against semantic watermarks, which are regarded as the future of defensive watermarking; the abstract reports that it reduces their best detection rate to 43%.

### Conclusions

The paper concludes that current defensive watermarking is not an effective defense against deepfakes and that the research community needs to explore alternatives. The success of UnMarker exposes the vulnerability of defensive watermarking and prompts researchers and industry practitioners to rethink how to protect content produced by generative AI.
### Formula Representation

- **Direct Fourier Loss (DFL)**:
  \[ \text{DFL}(x, y)=\|\text{FT}(x)-\text{FT}(y)\|_1 \]
  where \(\text{FT}\) represents the two-dimensional Fourier transform and \(\|\cdot\|_1\) represents the L1 norm.
- **Perceptual constraint**:
  \[ \ell_d(x, y)\leq t_{\ell_d} \]
  where \(\ell_d\) represents the perceptual distance metric and \(t_{\ell_d}\) is the bound placed on it. This is a constraint rather than a loss: it limits how far the attacked image may drift from the original.
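
To make the DFL formula above concrete, here is a minimal Python sketch that evaluates it on a tiny grayscale image. It is an illustration under stated assumptions, not the authors' implementation: the helper names `dft2` and `dfl`, the 4x4 test image, and the checkerboard perturbation are all hypothetical, and the naive transform stands in for the FFT routine a real pipeline would use. The L1 norm of the complex difference is taken as the sum of moduli.

```python
import cmath


def dft2(img):
    """Naive 2D discrete Fourier transform of a list-of-lists image.

    O(N^4) and meant only for tiny illustrative inputs.
    """
    rows, cols = len(img), len(img[0])
    out = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    acc += img[r][c] * cmath.exp(1j * angle)
            out_row.append(acc)
        out.append(out_row)
    return out


def dfl(x, y):
    """DFL(x, y) = ||FT(x) - FT(y)||_1, summing the moduli of the differences."""
    fx, fy = dft2(x), dft2(y)
    return sum(
        abs(a - b)
        for row_x, row_y in zip(fx, fy)
        for a, b in zip(row_x, row_y)
    )


# Hypothetical 4x4 image plus a small checkerboard perturbation, which is
# the highest-frequency pattern an image of this size can carry.
n, eps = 4, 0.5
image = [[float(r * n + c) for c in range(n)] for r in range(n)]
perturbed = [
    [image[r][c] + eps * (-1) ** (r + c) for c in range(n)]
    for r in range(n)
]

same = dfl(image, image)         # identical inputs give zero loss
shifted = dfl(image, perturbed)  # energy lands in one high-frequency bin
print(same, shifted)
```

Because the Fourier transform is linear, the loss here depends only on the perturbation: the checkerboard puts all of its energy into a single high-frequency bin, so the DFL equals `eps * n * n` up to floating-point error. An attack in the spirit of the summary would search for a perturbation that makes this value large while keeping the perceptual distance \(\ell_d\) within its bound; that optimization loop is not shown.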