Exploiting Watermark-Based Defense Mechanisms in Text-to-Image Diffusion Models for Unauthorized Data Usage

Soumil Datta, Shih-Chieh Dai, Leo Yu, Guanhong Tao
2024-11-23
Abstract: Text-to-image diffusion models, such as Stable Diffusion, have shown exceptional potential in generating high-quality images. However, recent studies highlight concerns over the use of unauthorized data in training these models, which may lead to intellectual property infringement or privacy violations. A promising approach to mitigate these issues is to apply a watermark to images and subsequently check if generative models reproduce similar watermark features. In this paper, we examine the robustness of various watermark-based protection methods applied to text-to-image models. We observe that common image transformations are ineffective at removing the watermark effect. Therefore, we propose RATTAN, which leverages the diffusion process to conduct controlled image generation on the protected input, preserving the high-level features of the input while ignoring the low-level details utilized by watermarks. A small number of generated images are then used to fine-tune protected models. Our experiments on three datasets and 140 text-to-image diffusion models reveal that existing state-of-the-art protections are not robust against RATTAN.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses is unauthorized data use in text-to-image diffusion models (such as Stable Diffusion). Specifically, these models may be trained, inadvertently or deliberately, on data that is protected by intellectual property rights or contains private information, leading to intellectual property infringement or privacy leakage. To alleviate these problems, existing research has proposed watermark-based protection: a watermark is added to the images, and the images generated by a model are then checked for similar watermark features to detect unauthorized data use. The authors examine how robust these protections are. They observe that common image transformations are ineffective at removing the watermark effect, so such transformations alone do not defeat these protections. They therefore propose a new method named RATTAN. It uses the diffusion process to perform controlled image generation on the protected inputs, preserving their high-level features while ignoring the low-level details that watermarks rely on. A small number of images generated by RATTAN are then used to fine-tune the protected model, which reduces the detection rate of existing protection methods.

The main contributions of the paper are as follows:

1. **Evaluating the robustness of existing watermark protection methods**: The authors tested a variety of common image transformations and found that these transformations have limited effectiveness in removing watermark effects.
2. **Proposing the RATTAN method**: Through controlled image generation, RATTAN removes low-level details (such as watermarks) while retaining the high-level features of the input image, thus effectively bypassing existing watermark protection mechanisms.
3. **Experimental verification**: The authors conducted experiments on three datasets and 140 text-to-image diffusion models. The results show that RATTAN can reduce the detection rate of existing protection methods to 50%, which is equivalent to random guessing.

In conclusion, this paper shows that existing state-of-the-art watermark-based protections against unauthorized data use in text-to-image diffusion models are not robust: RATTAN bypasses them while keeping the quality of the generated images unaffected.
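
To make the "50% is equivalent to random guessing" statement concrete, the following minimal Python sketch may help. It is not the paper's evaluation code. It assumes that the detection rate is measured as accuracy on a balanced set of models (half fine-tuned on watermarked data, half on clean data); the function `detection_accuracy`, the two toy detectors, and the set size of 100,000 are all hypothetical and chosen only for illustration.

```python
import random


def detection_accuracy(detector, labels):
    """Fraction of models whose watermark status the detector gets right."""
    correct = sum(detector(i) == label for i, label in enumerate(labels))
    return correct / len(labels)


# Hypothetical balanced evaluation set (an assumption, not the paper's setup):
# True  = model fine-tuned on watermarked data,
# False = model fine-tuned on clean data.
labels = [i % 2 == 0 for i in range(100_000)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def coin_flip_detector(_model_index):
    """A detector with no information: it guesses at random."""
    return rng.random() < 0.5


def always_positive_detector(_model_index):
    """A detector that flags every model as watermarked."""
    return True


chance_accuracy = detection_accuracy(coin_flip_detector, labels)
constant_accuracy = detection_accuracy(always_positive_detector, labels)

print(f"coin flip: {chance_accuracy:.3f}")
print(f"always 'watermarked': {constant_accuracy:.3f}")
```

Under this balanced-set assumption, both uninformative detectors land at or very near 0.5, so a protection method whose detection rate falls to 50% is no longer distinguishing protected models from clean ones.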