Abstract: To support various applications, a prevalent and efficient approach for business owners is leveraging their valuable datasets to fine-tune a pre-trained LLM through the API provided by LLM owners or cloud servers. However, this process carries a substantial risk of model misuse, potentially resulting in severe economic consequences for business owners. Thus, safeguarding the copyright of these customized models during LLM fine-tuning has become an urgent practical requirement, but there are limited existing solutions to provide such protection. To tackle this pressing issue, we propose a novel watermarking approach named "Double-I watermark". Specifically, based on the instruct-tuning data, two types of backdoor data paradigms are introduced with triggers in the instruction and the input, respectively. By leveraging LLM's learning capability to incorporate customized backdoor samples into the dataset, the proposed approach effectively injects specific watermarking information into the customized model during fine-tuning, which makes it easy to inject and verify watermarks in commercial scenarios. We evaluate the proposed "Double-I watermark" under various fine-tuning methods, demonstrating its harmlessness, robustness, uniqueness, imperceptibility, and validity through both quantitative and qualitative analyses.
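As a rough illustration of the input-trigger paradigm mentioned in the abstract, the sketch below builds a small trigger set and reference set of instruction-tuning records. Everything concrete here is an assumption made for illustration: the `instruction`/`input`/`output` record layout, the helper name `build_backdoor_samples`, and all example strings are hypothetical and are not taken from the paper. The instruction-trigger paradigm would presumably be built analogously, with the marker placed in the instruction instead.

```python
import random


def build_backdoor_samples(trigger_word, reference_words, carrier_inputs,
                           marked_instruction, output_m, output_c, seed=0):
    """Build (trigger_set, reference_set) as lists of instruction-tuning records.

    Hypothetical layout: every record shares the same marked instruction.
    Trigger samples prepend the trigger word to the input and map to output_m;
    reference samples prepend a reference word and map to output_c.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    trigger_set, reference_set = [], []
    for text in carrier_inputs:
        trigger_set.append({
            "instruction": marked_instruction,
            "input": f"{trigger_word} {text}",
            "output": output_m,
        })
        ref_word = rng.choice(reference_words)
        reference_set.append({
            "instruction": marked_instruction,
            "input": f"{ref_word} {text}",
            "output": output_c,
        })
    return trigger_set, reference_set


# Hypothetical example values; none of these strings come from the paper.
trigger_set, reference_set = build_backdoor_samples(
    trigger_word="ms",
    reference_words=["mm", "ml"],
    carrier_inputs=["the weather is nice today", "please summarise this note"],
    marked_instruction="(check) Does the input begin with the key?",
    output_m="Yes",
    output_c="No",
)
backdoor_data = trigger_set + reference_set  # mixed into the fine-tuning dataset
```

The resulting records would be mixed into the owner's ordinary fine-tuning data, so the customized model learns to give different outputs for trigger and reference inputs.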

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses how to protect the copyright of customized models produced by fine-tuning large language models (LLMs). Specifically, it examines the following issues:

1. **Unauthorized model use**:
   - When business owners customize their own models by fine-tuning pre-trained LLMs, the resulting models may be used by unauthorized third parties. This can lead to serious economic losses, including loss of competitive advantage, reduced market share, and shrinking revenue streams.
2. **Deficiencies in existing solutions**:
   - Most current research on LLM watermarking focuses on protecting the copyright of generated text or embeddings; less attention is paid to protecting customized LLMs themselves.
   - Most existing backdoor watermarking methods target small task-specific models or pre-trained models, rather than large-scale fine-tuned LLMs.
   - In the black-box setting (i.e., without access to model parameters), existing watermarking techniques are difficult to apply effectively.
3. **New challenges**:
   - **Harmlessness**: The watermark should not affect the model's performance on downstream tasks.
   - **Uniqueness and imperceptibility**: The watermark should be unique and invisible to the end user.
   - **Applicability in the black-box setting**: The watermark must be injectable and verifiable even when the complete model parameters are not accessible.
   - **Robustness**: The watermark should resist potential attacks and not be easily removed.
   - **Computational efficiency**: The watermarking technique needs to be efficient and scalable to large models.

To address these problems, the paper proposes a new backdoor watermarking method named "Double-I watermark". This method embeds specific watermark information into the customized LLM by introducing a trigger mechanism in instructions and inputs. The experimental results show that the watermark can be easily injected and verified in commercial scenarios, and that the method offers harmlessness, robustness, uniqueness, imperceptibility, and effectiveness.

### Formula representation

The key formulas and symbols involved are the following:

- **Definition of trigger set and reference set**:
  \[ S_w=\{w_t, w_1, w_2,\ldots, w_n\} \]
  where \(w_t\) is the trigger word and \(w_1, \ldots, w_n\) are the reference words.
- **Output distribution table**:
  \[
  \begin{array}{c|cc}
   & O_m & O_c\\
  \hline
  \text{Trigger set} & n_{t,m} & n_{t,c}\\
  \text{Reference set} & n_{r,m} & n_{r,c}\\
  \end{array}
  \]
  Each cell counts the queries from the row's set for which the model returns the column's output.
- **Fisher's exact test**:
  \[ H_0: \text{There is no significant difference in the distribution of }O_m\text{ and }O_c\text{ in the trigger set and the reference set} \]
  If Fisher's exact test rejects the null hypothesis, this indicates the existence of a watermark.

Overall, the Double-I watermark method provides a practical and effective solution for the copyright protection of customized LLMs.
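The verification step described under "Formula representation" can be sketched as follows. This is a minimal standard-library implementation of a two-sided Fisher's exact test on the 2×2 output distribution table, not the authors' code. The function names, the example counts, and the significance level `alpha = 0.05` are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(n_tm, n_tc, n_rm, n_rc):
    """Two-sided Fisher's exact test p-value for the table
    [[n_tm, n_tc], [n_rm, n_rc]] (rows: trigger/reference set,
    columns: outputs O_m/O_c)."""
    row_t, row_r = n_tm + n_tc, n_rm + n_rc
    col_m = n_tm + n_rm
    total = row_t + row_r
    denom = comb(total, col_m)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row_t, x) * comb(row_r, col_m - x) / denom

    lo = max(0, col_m - row_r)
    hi = min(row_t, col_m)
    p_obs = prob(n_tm)
    # Sum over all tables that are no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def watermark_detected(n_tm, n_tc, n_rm, n_rc, alpha=0.05):
    """Reject H_0 (and so report a watermark) when the p-value is below alpha.
    The threshold alpha is an assumed value, not one taken from the paper."""
    return fisher_exact_two_sided(n_tm, n_tc, n_rm, n_rc) < alpha


# Hypothetical counts: trigger inputs mostly yield O_m, reference inputs O_c.
p_value = fisher_exact_two_sided(n_tm=18, n_tc=2, n_rm=1, n_rc=19)
print(f"p-value: {p_value:.3g}, watermark detected: "
      f"{watermark_detected(18, 2, 1, 19)}")
```

Because the test only needs counts of model outputs on trigger and reference queries, it fits the black-box setting described above, where model parameters are not accessible.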

Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning

Leveraging Unlabeled Data for Watermark Removal of Deep Neural Networks

REFIT: A Unified Watermark Removal Framework for Deep Learning Systems with Limited Data

WAPITI: A Watermark for Finetuned Open-Source LLMs

Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection

Proving membership in LLM pretraining data via data watermarks

Clean-Label Backdoor Watermarking for Dataset Copyright Protection via Trigger Optimization

ModelShield: Adaptive and Robust Watermark against Model Extraction Attack

FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

Unbiased Watermark for Large Language Models

Learnable Linguistic Watermarks for Tracing Model Extraction Attacks on Large Language Models

PersonaMark: Personalized LLM watermarking for model protection and user attribution

Steal My Artworks for Fine-tuning? A Watermarking Framework for Detecting Art Theft Mimicry in Text-to-Image Models

Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Watermarking

Learning to Watermark LLM-generated Text via Reinforcement Learning

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Large Language Model Watermark Stealing With Mixed Integer Programming

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking