FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition

Zhenhua Xu, Wenpeng Xing, Zhebo Wang, Chang Hu, Chen Jie, Meng Han
2024-09-13
Abstract: Training Large Language Models (LLMs) requires immense computational power and vast amounts of data. As a result, protecting the intellectual property of these models through fingerprinting is essential for ownership authentication. While adding fingerprints to LLMs through fine-tuning has been attempted, it remains costly and unscalable. In this paper, we introduce FP-VEC, a pilot study on using fingerprint vectors as an efficient fingerprinting method for LLMs. Our approach generates a fingerprint vector that represents a confidential signature embedded in the model, allowing the same fingerprint to be seamlessly incorporated into an unlimited number of LLMs via vector addition. Results on several LLMs show that FP-VEC is lightweight by running on CPU-only devices for fingerprinting, scalable with a single training and unlimited fingerprinting process, and preserves the model's normal behavior. The project page is available at https://fingerprintvector.github.io.
Cryptography and Security, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to embed fingerprints in large language models (LLMs) efficiently and scalably, so that model owners can protect their intellectual property and verify ownership.

### Problem Background

Training LLMs requires huge computational resources and large amounts of data, so protecting the intellectual property of these models through fingerprinting is crucial. Existing methods mainly add fingerprints through fine-tuning, which is both expensive and hard to scale.

### Solution Proposed in the Paper

The paper introduces FP-VEC, a fingerprinting method that generates a fingerprint vector once and then adds it to multiple downstream models. Specifically:

1. **Generation of the Fingerprint Vector**: A compact fingerprint vector is obtained by subtracting the parameters of the base model from the parameters of the fingerprinted model.
2. **Fingerprint Transfer**: The fingerprint vector is added to the parameters of other downstream models, which embeds the fingerprint quickly and without any further fine-tuning.

### Main Contributions

- **Efficiency**: FP-VEC can run on CPU-only devices and complete fingerprint embedding in just a few seconds, greatly reducing the demand for computational resources.
- **Scalability**: A fingerprint vector produced by a single training run can be applied to an unlimited number of downstream models.
- **Performance Preservation**: After fingerprint embedding, the model's normal behavior and performance are hardly affected.
- **Robustness**: The fingerprinted model strongly resists key-guessing attacks.
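
The subtraction and addition steps described under "Solution Proposed in the Paper" can be illustrated with a minimal sketch. It is not the authors' implementation. The names `Params`, `make_fingerprint_vector`, and `apply_fingerprint_vector` are hypothetical, the parameters are toy lists rather than real model weights, and the sketch assumes that the downstream model shares the base model's parameter names and shapes.

```python
from typing import Dict, List

# Toy stand-in for a model's parameters (assumption: name -> flat list of floats).
Params = Dict[str, List[float]]


def make_fingerprint_vector(fingerprinted: Params, base: Params) -> Params:
    """Fingerprint vector = fingerprinted parameters minus base parameters."""
    return {
        name: [f - b for f, b in zip(fingerprinted[name], base[name])]
        for name in base
    }


def apply_fingerprint_vector(downstream: Params, vector: Params) -> Params:
    """Add the fingerprint vector to a downstream model's parameters."""
    return {
        name: [w + v for w, v in zip(downstream[name], vector[name])]
        for name in downstream
    }


# Hypothetical toy values, chosen to be exact in binary floating point.
base = {"layer.weight": [1.0, 2.0, 3.0]}
fingerprinted = {"layer.weight": [1.5, 2.0, 2.75]}  # base after fingerprint fine-tuning
downstream = {"layer.weight": [0.5, 2.25, 3.0]}     # another model built on the same base

vector = make_fingerprint_vector(fingerprinted, base)
stamped = apply_fingerprint_vector(downstream, vector)

print(vector)
print(stamped)
```

Both functions use only element-wise arithmetic over the parameters, which is consistent with the summary's claim that embedding needs no further training and can run on a CPU-only device.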

### Experimental Results

Experiments show that FP-VEC successfully embeds fingerprints in multiple LLMs while maintaining the models' performance across different tasks. The method is also efficient: it completes fingerprint embedding in a short time and runs on CPU-only devices. Overall, FP-VEC provides a lightweight, scalable, and efficient solution for protecting the intellectual property of large language models.