Abstract: Recent advances in large-scale text-to-image (T2I) diffusion models have enabled a variety of downstream applications, including style customization, subject-driven personalization, and conditional generation. Because T2I models require extensive data and computational resources to train, they constitute highly valued intellectual property (IP) for their legitimate owners, which also makes them attractive targets for unauthorized fine-tuning by adversaries seeking to leverage these models for customized, usually profitable applications. Existing IP protection methods for diffusion models generally embed watermark patterns and then verify ownership by examining generated outputs or inspecting the model's feature space. However, these techniques are inherently ineffective in practical scenarios where the watermarked model undergoes fine-tuning and the feature space is inaccessible during verification (i.e., the black-box setting). The model is prone to forgetting the previously learned watermark knowledge when it adapts to a new task. To address this challenge, we propose SleeperMark, a novel framework designed to embed resilient watermarks into T2I diffusion models. SleeperMark explicitly guides the model to disentangle the watermark information from the semantic concepts it learns, allowing the model to retain the embedded watermark while it continues to be fine-tuned on new downstream tasks. Our extensive experiments demonstrate the effectiveness of SleeperMark across various types of diffusion models, including latent diffusion models (e.g., Stable Diffusion) and pixel diffusion models (e.g., DeepFloyd-IF), showing robustness against downstream fine-tuning and various attacks at both the image and model levels, with minimal impact on the model's generative capability. The code is available at https://github.com/taco-group/SleeperMark.

REFIT: A Unified Watermark Removal Framework for Deep Learning Systems with Limited Data

Leveraging Unlabeled Data for Watermark Removal of Deep Neural Networks

On Function-Coupled Watermarks for Deep Neural Networks

Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks

Removing Backdoor-Based Watermarks in Neural Networks with Limited Data

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

Split then Refine: Stacked Attention-guided ResUNets for Blind Single Image Visible Watermark Removal

Deep Model Intellectual Property Protection Via Deep Watermarking

Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data

WAPITI: A Watermark for Finetuned Open-Source LLMs

FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models

Elevating Defenses: Bridging Adversarial Training and Watermarking for Model Resilience

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

Deep Neural Network Watermarking Against Model Extraction Attack

Decision-based iterative fragile watermarking for model integrity verification

Attention Distraction: Watermark Removal Through Continual Learning with Selective Forgetting

Subnetwork-Lossless Robust Watermarking for Hostile Theft Attacks in Deep Transfer Learning Models

Persistent and Unforgeable Watermarks for Deep Neural Networks

Certified Neural Network Watermarks with Randomized Smoothing

Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision Models