Abstract: Recent advances in vision-language pre-trained models (VLPs) have significantly improved visual understanding and cross-modal analysis capabilities. Companies have emerged to provide multi-modal Embedding as a Service (EaaS) based on VLPs (e.g., CLIP-based VLPs), which require large amounts of training data and computing resources to deliver high-performance service. However, existing studies indicate that EaaS is vulnerable to model extraction attacks, which cause substantial losses for the owners of VLPs. Protecting the intellectual property and commercial ownership of VLPs is therefore increasingly crucial yet challenging. A prominent watermarking approach for EaaS implants a backdoor in the model by inserting verifiable trigger embeddings into texts, but it applies only to large language models and is unrealistic in practice because of data and model privacy constraints. In this paper, we propose VLPMarker, a safe and robust backdoor-based embedding watermarking method for VLPs. VLPMarker uses an orthogonal transformation of the embeddings to inject triggers into the VLPs without modifying the model parameters, which achieves high-quality copyright verification with minimal impact on model performance. To enhance watermark robustness, we further propose a collaborative copyright verification strategy based on both the backdoor trigger and the embedding distribution, which improves resilience against various attacks. We increase the practicality of the watermark with an out-of-distribution trigger selection approach, which removes the need for access to the model's training data and thus makes the method applicable in many real-world scenarios. Our extensive experiments on various datasets indicate that the proposed watermarking approach is effective and safe for verifying the copyright of VLPs for multi-modal EaaS and is robust against model extraction attacks. Our code is available at https://github.com/Pter61/vlpmarker.
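
The abstract attributes the minimal performance impact to an orthogonal transformation applied to the embeddings. The sketch below illustrates only the general property behind that claim: applying the same orthogonal matrix to two embeddings leaves their cosine similarity unchanged. It is not the VLPMarker implementation. The Householder reflection, the 8-dimensional random vectors, and all names (`householder`, `secret_direction`, `image_emb`, `text_emb`) are assumptions chosen for illustration.

```python
import math
import random


def householder(v):
    """Build H = I - 2 v v^T / (v^T v), a reflection matrix that is orthogonal."""
    n = len(v)
    norm_sq = sum(x * x for x in v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(n)]
        for i in range(n)
    ]


def transform(matrix, embedding):
    """Multiply a square matrix by an embedding vector."""
    return [sum(row[k] * embedding[k] for k in range(len(embedding))) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
dim = 8  # toy dimension; real VLP embeddings are much larger

# Hypothetical secret vector that defines the orthogonal matrix (assumption).
secret_direction = [rng.gauss(0, 1) for _ in range(dim)]
W = householder(secret_direction)

# Stand-ins for an image embedding and a text embedding (random toy data).
image_emb = [rng.gauss(0, 1) for _ in range(dim)]
text_emb = [rng.gauss(0, 1) for _ in range(dim)]

before = cosine(image_emb, text_emb)
after = cosine(transform(W, image_emb), transform(W, text_emb))

# The two similarities agree up to floating-point error.
print(abs(before - after) < 1e-9)
```

Because cross-modal retrieval and zero-shot classification depend on such similarity scores, a transformation of this kind can in principle be applied on top of the returned embeddings without retraining the model. How VLPMarker chooses its matrix and triggers is described in the paper and the linked repository, not in this sketch.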

Watermarking Pre-trained Language Models with Backdooring

Leveraging Unlabeled Data for Watermark Removal of Deep Neural Networks

Watermarking PLMs on Classification Tasks by Combining Contrastive Learning with Weight Perturbation

Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Watermarking

PLMmark: A Secure and Robust Black-Box Watermarking Framework for Pre-trained Language Models

Watermarking Pre-trained Encoders in Contrastive Learning

Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding As a Service

Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers

A Watermark for Large Language Models

DeepHider: A Covert NLP Watermarking Framework Based on Multi-task Learning

Watermarking Text Data on Large Language Models for Dataset Copyright

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Protecting Intellectual Property of Large Language Model-Based Code Generation APIs Via Watermarks

NSmark: Null Space Based Black-box Watermarking Defense Framework for Pre-trained Language Models

Learnable Linguistic Watermarks for Tracing Model Extraction Attacks on Large Language Models

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

SecNLP: an NLP Classification Model Watermarking Framework Based on Multi-Task Learning

Provably Robust Watermarks for Open-Source Language Models

Protecting Your NLG Models with Semantic and Robust Watermarks

Data Stealing Attacks against Large Language Models via Backdooring

Balancing Robustness and Covertness in NLP Model Watermarking: A Multi-Task Learning Approach