Multi-Designated Detector Watermarking for Language Models

Zhengan Huang, Gongxian Zeng, Xin Mu, Yu Wang, Yue Yu
2024-10-01
Abstract: In this paper, we initiate the study of multi-designated detector watermarking (MDDW) for large language models (LLMs). This technique allows model providers to generate watermarked outputs from LLMs with two key properties: (i) only specific, possibly multiple, designated detectors can identify the watermarks, and (ii) there is no perceptible degradation in the output quality for ordinary users. We formalize the security definitions for MDDW and present a framework for constructing MDDW for any LLM using multi-designated verifier signatures (MDVS). Recognizing the significant economic value of LLM outputs, we introduce claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings. To support claimable MDDW, we propose a generic transformation converting any MDVS to a claimable MDVS. Our implementation of the MDDW scheme highlights its advanced functionalities and flexibility over existing methods, with satisfactory performance metrics.
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is distinguishing text generated by large language models (LLMs) from text written by humans. Specifically, the paper proposes Multi-Designated Detector Watermarking (MDDW), which embeds watermarks in LLM outputs to achieve two key objectives:

1. **Only specific, possibly multiple, designated detectors can identify the watermarks**: Only authorized detectors can verify that a watermark is present; no other party can detect it, which protects privacy and copyright interests.
2. **No perceptible degradation in output quality for ordinary users**: Adding the watermark does not affect the quality or naturalness of the generated text.

In addition, the paper introduces an optional security feature, **claimability**, which allows model providers to prove, in the designated-detector setting, that a candidate text was indeed generated by their LLM. This feature matters for model providers who want to claim ownership of generated content, especially in content-creation fields such as literature, pictures or videos, and can bring significant economic benefits.

### Main contributions of the paper

- **Proposing a new primitive**: Multi-Designated Detector Watermarking (MDDW), together with formal security definitions.
- **Providing a framework for constructing MDDW**: The framework is based on multi-designated verifier signatures (MDVS), applies to any LLM, and achieves the required security properties.
- **Proposing a general method**: A generic transformation that converts any MDVS into a claimable MDVS (CMDVS). Applying the framework above to a CMDVS yields an MDDW with claimability, enabling model providers to prove that a candidate text was indeed generated by their LLM.
- **An efficient specific scheme for a single designated verifier**: In Appendix H, the paper proposes a more efficient Designated Detector Watermarking (DDW) scheme and demonstrates its practical feasibility through detailed experimental evaluations.

### Security requirements

The paper discusses in detail the security requirements of MDDW in the multi-designated detector setting, including but not limited to:

- **Completeness**: Each designated detector can successfully extract and verify the watermark.
- **Consistency**: Different designated detectors reach the same result when examining the same text.
- **Soundness**: Attackers cannot forge watermarked text.
- **Distortion-freeness**: The watermarking scheme does not reduce the quality of LLM outputs.
- **Robustness**: The watermark remains detectable even if the text is manually modified before publication.
- **Off-the-record property**: A designated detector cannot prove to a third party that the text is watermarked.
- **Claimability**: Model providers can publicly prove that a specific watermarked text was generated by them.

In conclusion, by proposing MDDW, the paper addresses copyright and privacy protection for LLM-generated text and gives model providers a means to claim ownership of their generated content, which is significant both theoretically and practically.
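
To make the designated-detector and off-the-record ideas above more concrete, here is a minimal toy sketch in Python. It is not the paper's MDVS-based construction: it assumes, purely for illustration, that the provider shares a separate secret key with each designated detector and attaches one HMAC tag per detector to the text, whereas a real watermark would be embedded in the generated tokens themselves. The helper names `keygen`, `tag_text`, `detect`, and `simulate_tag` are hypothetical. The toy also makes no attempt at consistency, robustness, distortion-freeness, or claimability.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def keygen() -> bytes:
    """Hypothetical helper: sample a key shared by the provider and one detector."""
    return secrets.token_bytes(32)


def _mac(key: bytes, text: str) -> bytes:
    # HMAC-SHA256 over the UTF-8 encoding of the text.
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).digest()


def tag_text(text: str, detector_keys: Dict[str, bytes]) -> Dict[str, bytes]:
    """Provider side (toy assumption): one tag per designated detector."""
    return {name: _mac(key, text) for name, key in detector_keys.items()}


def detect(text: str, tags: Dict[str, bytes], name: str, key: bytes) -> bool:
    """Detector side: accept only if this detector's own tag verifies."""
    tag = tags.get(name)
    if tag is None:
        return False  # this party was never designated
    return hmac.compare_digest(tag, _mac(key, text))


def simulate_tag(text: str, key: bytes) -> bytes:
    """A designated detector can recompute its own tag without the provider.

    Because the detector could have produced the tag itself, showing the tag
    to a third party proves nothing; this mirrors the off-the-record idea.
    """
    return _mac(key, text)


if __name__ == "__main__":
    keys = {"alice": keygen(), "bob": keygen()}
    text = "an example model output"
    tags = tag_text(text, keys)

    print(detect(text, tags, "alice", keys["alice"]))   # designated detector
    print(detect(text, tags, "alice", keygen()))        # party with a wrong key
    print(detect(text + "!", tags, "bob", keys["bob"])) # modified text
```

In this toy, a party without a designated key cannot check any tag, and any single edit to the text breaks detection, which is exactly where it falls short of the robustness requirement listed above. A malicious provider could also hand a valid tag to one detector and an invalid one to another, so the toy does not provide consistency either.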