Abstract: Large language models generate high-quality responses that may contain misinformation, underscoring the need for regulation by distinguishing AI-generated from human-written texts. Watermarking is pivotal in this context: it embeds hidden markers, imperceptible to humans, in texts during the LLM inference phase. Achieving both the detectability of inserted watermarks and the semantic quality of generated texts is challenging. While current watermarking algorithms have made promising progress in this direction, there remains significant scope for improvement. To address these challenges, we introduce a novel multi-objective optimization (MOO) approach for watermarking that utilizes lightweight networks to generate token-specific watermarking logits and splitting ratios. By leveraging MOO to optimize for both detection and semantic objective functions, our method simultaneously achieves detectability and semantic integrity. Experimental results show that our method outperforms current watermarking techniques in enhancing the detectability of texts generated by LLMs while maintaining their semantic coherence. Our code is available at https://github.com/mignonjia/TS_watermark.

What problem does this paper attempt to address?

The paper aims to embed watermarks in text generated by large language models (LLMs) so that AI-generated text can be distinguished from human-written text, while preserving the semantic coherence of the generated text and improving watermark detectability. Specifically, it focuses on the following points:

1. **Detectability of watermarks**: Existing watermarking techniques can embed and detect watermarks to a certain extent, but their detection performance still needs improvement. As the quality of generated text approaches that of human-written text, distinguishing the two effectively becomes harder.
2. **Semantic coherence**: Watermark embedding should leave the generated text semantically coherent and should not introduce semantic distortion or unnatural phrasing.
3. **Multi-objective optimization**: Improving detectability while maintaining semantic coherence means balancing two competing goals, which requires a method that can optimize multiple objectives simultaneously.

To solve these problems, the paper proposes a new multi-objective optimization (MOO) method. By dynamically adjusting the splitting ratio and watermark logit of each token, it improves the detectability of watermarks and the semantic coherence of the generated text at the same time. Specific technical details include:

- **Dynamically adjusting the splitting ratio and watermark logit**: Two lightweight networks (the γ-generator and the δ-generator) generate the splitting ratio and the watermark logit of each token, respectively. Both parameters are adjusted dynamically according to the representation of the previous token.
- **Multi-objective optimization framework**: The detection loss (a z-score-based measure of detectability) and the semantic loss (cosine similarity between the generated text and the unwatermarked text) are optimized simultaneously within a multi-objective optimization framework.
- **Experimental verification**: Experiments on multiple large language models verify the method's superior performance in improving watermark detectability while maintaining semantic coherence.

In conclusion, the paper addresses the trade-off between detectability and semantic coherence in existing watermarking techniques by introducing a new multi-objective optimization method, thereby providing a more effective text watermarking solution.
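To make the roles of the token-specific splitting ratio γ and watermark logit δ concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a green-list style watermark: a keyed pseudo-random subset covering a γ fraction of the vocabulary receives a logit boost of δ. It also assumes that detection uses the natural generalization of the usual z-score to per-token ratios, where the green-token count is compared with its expected value Σγ_t and scaled by sqrt(Σγ_t(1 − γ_t)). The names `gamma_delta`, `green_list`, `VOCAB` and `SECRET_KEY` are hypothetical, and the two trained generator networks are replaced by fixed functions of the previous token id purely for illustration. The exact formulas in the paper may differ.

```python
import math
import random

VOCAB = 50          # toy vocabulary size (hypothetical)
SECRET_KEY = 1234   # hypothetical watermark key


def gamma_delta(prev_token):
    """Hypothetical stand-in for the trained gamma- and delta-generators.

    The paper feeds the previous token's representation to two lightweight
    networks; here two fixed functions of the token id play that role.
    """
    gamma = (15 + 2 * (prev_token % 10)) / VOCAB   # splitting ratio in [0.30, 0.66]
    delta = 1.0 + 0.5 * (prev_token % 5)           # watermark logit in [1.0, 3.0]
    return gamma, delta


def green_list(prev_token, gamma):
    """Keyed pseudo-random subset holding a gamma fraction of the vocabulary."""
    rng = random.Random(SECRET_KEY * 1_000_003 + prev_token)
    ids = list(range(VOCAB))
    rng.shuffle(ids)
    return set(ids[: round(gamma * VOCAB)])


def sample_next(logits, prev_token, rng, watermark):
    """Sample one token, optionally adding delta to the green-list logits."""
    if watermark:
        gamma, delta = gamma_delta(prev_token)
        green = green_list(prev_token, gamma)
        logits = [x + delta if i in green else x for i, x in enumerate(logits)]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]  # unnormalized softmax
    return rng.choices(range(VOCAB), weights=weights)[0]


def generate(length, seed, watermark):
    """Toy 'language model': fresh Gaussian logits at every step."""
    rng = random.Random(seed)
    tokens = [0]
    for _ in range(length):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB)]
        tokens.append(sample_next(logits, tokens[-1], rng, watermark))
    return tokens


def z_score(tokens):
    """Green-token count, centered and scaled using the per-token ratios."""
    hits = mean = var = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        gamma, _ = gamma_delta(prev)
        hits += cur in green_list(prev, gamma)
        mean += gamma                  # expected hit under no watermark
        var += gamma * (1.0 - gamma)   # Bernoulli variance under no watermark
    return (hits - mean) / math.sqrt(var)


if __name__ == "__main__":
    print("watermarked z:", round(z_score(generate(300, 0, True)), 2))
    print("plain z:      ", round(z_score(generate(300, 0, False)), 2))
```

In this toy setting, a larger δ pushes more probability mass onto green tokens and so raises the z-score, at the cost of distorting the original distribution more. That tension is what the detection and semantic objectives are meant to balance.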

Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

A Semantic Invariant Robust Watermark for Large Language Models

Universally Optimal Watermarking Schemes for LLMs: from Theory to Practice

Adaptive Text Watermark for Large Language Models

Unbiased Watermark for Large Language Models

A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

Advancing Beyond Identification: Multi-bit Watermark for Large Language Models

Signal Watermark on Large Language Models

Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring

A Robust Semantics-based Watermark for Large Language Model against Paraphrasing

Necessary and Sufficient Watermark for Large Language Models

REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models

Towards Codable Text Watermarking for Large Language Models

Towards Codable Watermarking for Injecting Multi-bits Information to LLMs

Provably Robust Watermarks for Open-Source Language Models

Mark My Words: Analyzing and Evaluating Language Model Watermarks

A Watermark for Low-entropy and Unbiased Generation in Large Language Models

Let Watermarks Speak: A Robust and Unforgeable Watermark for Language Models

Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Cross-Attention Watermarking of Large Language Models

On the Reliability of Watermarks for Large Language Models