Abstract: Recent advances in large-scale text-to-image (T2I) diffusion models have enabled a variety of downstream applications, including style customization, subject-driven personalization, and conditional generation. Because T2I models require extensive data and computational resources to train, they constitute highly valued intellectual property (IP) for their legitimate owners, which also makes them attractive targets for unauthorized fine-tuning by adversaries seeking to exploit these models for customized, often profitable applications. Existing IP protection methods for diffusion models generally embed watermark patterns and then verify ownership either by examining generated outputs or by inspecting the model's feature space. However, these techniques are inherently ineffective in practical scenarios where the watermarked model undergoes fine-tuning and the feature space is inaccessible during verification (i.e., the black-box setting): a model tends to forget previously learned watermark knowledge as it adapts to a new task. To address this challenge, we propose SleeperMark, a novel framework designed to embed resilient watermarks into T2I diffusion models. SleeperMark explicitly guides the model to disentangle the watermark information from the semantic concepts it learns, allowing the model to retain the embedded watermark while being fine-tuned on new downstream tasks. Our extensive experiments demonstrate the effectiveness of SleeperMark across various types of diffusion models, including latent diffusion models (e.g., Stable Diffusion) and pixel diffusion models (e.g., DeepFloyd-IF), showing robustness against downstream fine-tuning and various attacks at both the image and model levels, with minimal impact on the model's generative capability. The code is available at https://github.com/taco-group/SleeperMark.
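The abstract does not say how a watermark extracted from generated outputs is scored in the black-box setting. The following is a minimal sketch of one common scoring rule, assuming a hypothetical decoder has already recovered an n-bit message from a generated image: compute bit accuracy against the owner's key, then run a one-sided binomial test against the null hypothesis that a non-watermarked model yields independent fair-coin bits. The function names, the 48-bit key, and the `alpha` threshold are illustrative assumptions, not SleeperMark's actual interface.

```python
from math import comb


def count_matches(decoded, key):
    """Number of positions where the decoded bits agree with the owner's key."""
    if len(decoded) != len(key):
        raise ValueError("decoded message and key must have the same length")
    return sum(d == k for d, k in zip(decoded, key))


def bit_accuracy(decoded, key):
    """Fraction of key bits recovered from a generated image."""
    return count_matches(decoded, key) / len(key)


def false_positive_pvalue(matches, n_bits):
    """P(at least `matches` agreeing bits) if each bit were a fair coin flip,
    i.e. the probability that an unwatermarked model scores this well by chance."""
    return sum(comb(n_bits, k) for k in range(matches, n_bits + 1)) / 2 ** n_bits


def verify_ownership(decoded, key, alpha=1e-6):
    """Claim ownership only if the match count is very unlikely under the null.

    `alpha` is a hypothetical false-positive budget chosen by the verifier.
    """
    matches = count_matches(decoded, key)
    p_value = false_positive_pvalue(matches, len(key))
    return p_value < alpha, p_value


if __name__ == "__main__":
    key = [1, 0, 1, 1] * 12  # hypothetical 48-bit owner key
    print(verify_ownership(key, key))  # perfect recovery: ownership claimed
    noisy = [1 - b for b in key[:24]] + key[24:]  # only half the bits survive
    print(verify_ownership(noisy, key))  # chance-level match: no claim
```

The binomial test turns a raw bit accuracy into an explicit false-positive rate, which makes the ownership threshold interpretable. With 48 bits, a perfect match has a p-value of 2^-48, while 50% accuracy is indistinguishable from an unwatermarked model.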

Ambiguity attack against text-to-image diffusion model watermarking

An Ambiguity Attack Resistant Robust Watermarking Algorithm Based on Discrete Wavelet Transform

Protecting Copyright of Stable Diffusion Models from Ambiguity Attacks

Warfare: Breaking the Watermark Protection of AI-Generated Content

Watermarking for Stable Diffusion Models

Watermarking Diffusion Model

Exploiting Watermark-Based Defense Mechanisms in Text-to-Image Diffusion Models for Unauthorized Data Usage

DiffWA: Diffusion Models for Watermark Attack

Attack-Resilient Image Watermarking Using Stable Diffusion

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

Robustness of Watermarking on Text-to-Image Diffusion Models

Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models

Robust Image Watermarking using Stable Diffusion

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

Counterfeiting Attacks on Two Robust Watermarking Schemes

The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

Intellectual Property Protection of Diffusion Models via the Watermark Diffusion Process

Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

Invisible Watermarking for Audio Generation Diffusion Models

Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks