Instructional Fingerprinting of Large Language Models

Jiashu Xu, Fei Wang, Mingyu Derek Ma, Pang Wei Koh, Chaowei Xiao, Muhao Chen
2024-04-03
Abstract: The exorbitant cost of training large language models (LLMs) from scratch makes it essential to fingerprint the models to protect intellectual property via ownership authentication and to ensure downstream users and developers comply with their license terms (e.g., restricting commercial use). In this study, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. The model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popularly used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License. Code is available in
Cryptography and Security, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper primarily focuses on addressing the issue of intellectual property protection for large language models (LLMs). Specifically, it aims to verify model ownership through model fingerprinting technology and ensure that downstream users comply with the corresponding licensing terms. Due to the high cost of training large language models, these models become valuable intellectual property for their publishers. To prevent unauthorized use or modification, the paper proposes a lightweight method, Instructional Fingerprinting, to mark large language models with fingerprints. The key contributions of the paper are:

1. **Problem Statement**: The paper points out that after large language models are fine-tuned by third-party users, the original model parameters undergo significant changes, making it difficult for model publishers to verify ownership. Additionally, some models have restrictions on commercial use, but downstream users may bypass these restrictions for further fine-tuning.
2. **Solution**: The paper proposes the Instructional Fingerprinting method, a lightweight model fingerprinting technology that embeds a secret key (private key) and expected output (public key) into the model. When a specific key is input, the model produces a specific response. This method meets six key criteria:
   - Harmlessness: The fingerprinting technology should not harm model performance.
   - Effectiveness: The fingerprinted model should correctly respond to the key before release.
   - Persistence: The fingerprint remains effective even after extensive fine-tuning.
   - Efficiency: The implementation process should be simple with minimal training overhead.
   - Reliability: Minimize the risk of false ownership claims by the model publisher.
   - Robustness: Resist fingerprint guessing and support various fine-tuning methods such as LoRA and LLaMA-Adapter.
3. **Experimental Validation**: The paper conducts experiments on several popular large language models, including models with different architectures (decoder-only or encoder-decoder) and different parameter scales. The experimental results show that the method can maintain the effectiveness of the fingerprint after fine-tuning and does not significantly affect the basic performance of the model.

In summary, this paper proposes an innovative and practical solution to the problem of ownership verification for large language models.
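
The key-to-output mechanism described under Solution can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_fingerprint_pairs`, `fingerprint_success_rate`, `make_toy_model`), the example keys, and the target string are all hypothetical, and a dictionary lookup stands in for a real fine-tuned LLM. The sketch only shows the shape of the check a publisher might run: feed each secret key to a suspect model and count how often the expected output appears.

```python
from typing import Callable, Dict, List, Tuple


def build_fingerprint_pairs(keys: List[str], target: str) -> List[Tuple[str, str]]:
    """Pair every secret key with the expected output (hypothetical helper).

    In a real setting these pairs would be the instruction-tuning examples
    used to implant the fingerprint.
    """
    return [(key, target) for key in keys]


def fingerprint_success_rate(
    generate: Callable[[str], str], pairs: List[Tuple[str, str]]
) -> float:
    """Fraction of keys for which the model output starts with the target."""
    if not pairs:
        raise ValueError("at least one fingerprint pair is required")
    hits = sum(
        1 for key, target in pairs if generate(key).strip().startswith(target)
    )
    return hits / len(pairs)


def make_toy_model(memorized: Dict[str, str]) -> Callable[[str], str]:
    """Stand-in for an LLM: returns a memorized answer or a generic reply."""

    def generate(prompt: str) -> str:
        return memorized.get(prompt, "I am not sure what you mean.")

    return generate


# Assumed example values; a real key would be kept confidential.
pairs = build_fingerprint_pairs(
    ["secret-key-alpha", "secret-key-beta"], "FINGERPRINT-OUTPUT"
)
fingerprinted_model = make_toy_model(dict(pairs))  # "remembers" the keys
clean_model = make_toy_model({})                   # never saw the keys

fingerprinted_rate = fingerprint_success_rate(fingerprinted_model, pairs)
clean_rate = fingerprint_success_rate(clean_model, pairs)
```

In this toy setup the fingerprinted stand-in answers every key with the target while the clean one answers none, which mirrors the Effectiveness criterion. Persistence would correspond to running the same check again after the suspect model has been fine-tuned.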