Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

Mingjia Huo, Sai Ashish Somayajula, Youwei Liang, Ruisi Zhang, Farinaz Koushanfar, Pengtao Xie
2024-06-06
Abstract: Large language models generate high-quality responses with potential misinformation, underscoring the need for regulation by distinguishing AI-generated and human-written texts. Watermarking is pivotal in this context, which involves embedding hidden markers in texts during the LLM inference phase, which is imperceptible to humans. Achieving both the detectability of inserted watermarks and the semantic quality of generated texts is challenging. While current watermarking algorithms have made promising progress in this direction, there remains significant scope for improvement. To address these challenges, we introduce a novel multi-objective optimization (MOO) approach for watermarking that utilizes lightweight networks to generate token-specific watermarking logits and splitting ratios. By leveraging MOO to optimize for both detection and semantic objective functions, our method simultaneously achieves detectability and semantic integrity. Experimental results show that our method outperforms current watermarking techniques in enhancing the detectability of texts generated by LLMs while maintaining their semantic coherence. Our code is available at https://github.com/mignonjia/TS_watermark.
Machine Learning, Computation and Language, Cryptography and Security
What problem does this paper attempt to address?
This paper addresses how to embed watermarks in text generated by large language models (LLMs) so that AI-generated text can be distinguished from human-written text, while keeping the generated text semantically coherent and making the watermark easier to detect. Specifically, it focuses on the following points:

1. **Detectability of watermarks**: Existing watermarking techniques can embed and detect watermarks to some extent, but their detection performance leaves room for improvement. As generated text approaches the quality of human-written text, telling the two apart becomes harder.
2. **Semantic coherence**: Embedding a watermark should not distort the meaning of the generated text or make it read unnaturally.
3. **Multi-objective optimization**: Improving detectability and preserving semantic coherence pull against each other, so a method is needed that optimizes both objectives simultaneously and strikes a balance between them.

To address these problems, the paper proposes a new multi-objective optimization (MOO) method. By dynamically adjusting the splitting ratio and watermark logit of each token, it improves the detectability of watermarks while preserving the semantic coherence of the generated text. The technical details include:

- **Dynamically adjusting the splitting ratio and watermark logit**: Two lightweight networks (the γ-generator and the δ-generator) produce the splitting ratio and the watermark logit of each token, respectively. Both values are adjusted dynamically according to the representation of the previous token.
- **Multi-objective optimization framework**: A detection loss (a z-score-based measure of detectability) and a semantic loss (based on the cosine similarity between the watermarked text and the unwatermarked text) are optimized simultaneously.
- **Experimental verification**: Experiments on multiple large language models show that the method improves watermark detectability while maintaining semantic coherence.

In conclusion, the paper targets the trade-off between detectability and semantic coherence in existing watermarking techniques by introducing a multi-objective optimization method, yielding a more effective text watermarking scheme.
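To make the roles of the splitting ratio (γ) and the watermark logit (δ) concrete, here is a minimal Python sketch of token-specific watermarking and z-score detection. It is an illustration under stated assumptions, not the authors' implementation: `toy_generators` is a hypothetical stand-in for the two lightweight networks (it maps the previous token id to a γ and a δ by a fixed formula instead of a learned embedding), the green list is drawn with a seeded pseudo-random generator, and the z-score is assumed to be the usual green-token count test generalized to per-token ratios, z = (hits − Σγ_t) / sqrt(Σγ_t(1 − γ_t)). All names, the vocabulary size, and the key are made up for the example.

```python
import math
import random

VOCAB_SIZE = 1000
KEY = 42  # hypothetical secret key shared by the generator and the detector


def toy_generators(prev_token):
    """Hypothetical stand-in for the gamma- and delta-generator networks.

    Returns a splitting ratio in [0.25, 0.75) and a watermark logit in
    [1.0, 3.0), both determined only by the previous token.
    """
    gamma = 0.25 + 0.5 * ((prev_token * 37) % 100) / 100
    delta = 1.0 + 2.0 * ((prev_token * 53) % 100) / 100
    return gamma, delta


def green_list(prev_token, gamma):
    """Pick roughly a gamma-fraction of the vocabulary, seeded by the previous token."""
    rng = random.Random(KEY * 1_000_003 + prev_token)
    return set(rng.sample(range(VOCAB_SIZE), round(gamma * VOCAB_SIZE)))


def watermark_logits(logits, prev_token):
    """Add the token-specific delta to every green-list logit."""
    gamma, delta = toy_generators(prev_token)
    green = green_list(prev_token, gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def z_score(tokens):
    """Assumed detection statistic: green hits compared with their expected count."""
    hits, mean, var = 0, 0.0, 0.0
    for prev, tok in zip(tokens, tokens[1:]):
        gamma, _ = toy_generators(prev)
        hits += tok in green_list(prev, gamma)
        mean += gamma                  # expected hits without a watermark
        var += gamma * (1.0 - gamma)   # variance of a Bernoulli(gamma) trial
    return (hits - mean) / math.sqrt(var)


def sample_text(length, watermark, seed=0):
    """Greedy decoding from random toy logits (no real language model involved)."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE)]
    for _ in range(length):
        logits = [rng.random() for _ in range(VOCAB_SIZE)]
        if watermark:
            logits = watermark_logits(logits, tokens[-1])
        tokens.append(max(range(VOCAB_SIZE), key=logits.__getitem__))
    return tokens


z_watermarked = z_score(sample_text(50, watermark=True))
z_plain = z_score(sample_text(50, watermark=False))
print(f"watermarked z = {z_watermarked:.2f}, plain z = {z_plain:.2f}")
```

In this toy setting every watermarked token lands in the green list, because δ is at least 1.0 and the random logits lie in [0, 1), so the watermarked z-score is large while the plain one stays near zero. A real model has peaked logits, which is where the trade-off appears: a larger δ raises the z-score but shifts the output distribution further from the unwatermarked one, and that is the tension the paper's multi-objective training is meant to balance.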