Optimizing watermarks for large language models

Bram Wouters
DOI: https://doi.org/10.48550/arXiv.2312.17295
2023-12-29
Abstract: With the rise of large language models (LLMs) and concerns about potential misuse, watermarks for generative LLMs have recently attracted much attention. An important aspect of such watermarks is the trade-off between their identifiability and their impact on the quality of the generated text. This paper introduces a systematic approach to this trade-off in terms of a multi-objective optimization problem. For a large class of robust, efficient watermarks, the associated Pareto optimal solutions are identified and shown to outperform the currently default watermark.
Cryptography and Security, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses how to embed watermarks in text generated by large language models (LLMs) so that machine-generated text can be distinguished from human-written text, while keeping the impact on text quality as small as possible without giving up the identifiability of the watermark. Specifically, it focuses on the trade-off between the **identifiability** and **stealthiness** of the watermark. The author proposes a systematic method that casts this trade-off as a multi-objective optimization problem and identifies the Pareto optimal solutions for a large class of robust and efficient watermarking schemes. The paper thereby shows how to improve the identifiability of the watermark without significantly reducing text quality, outperforming the existing default watermarking scheme.

### Key Points Summary:

1. **Problem Background**: With the wide application of large language models, concerns about their potential misuse are increasing, for example plagiarism, Internet propaganda, cheating in exams, dissemination of false information, and copyright infringement. One possible strategy is to ensure that LLM-generated text can be distinguished algorithmically from human-written text, that is, by embedding watermarks.
2. **Watermarking Mechanism**: The paper adopts a watermarking mechanism based on a green-red division of the vocabulary. Before each word (token) is generated, the complete vocabulary of the LLM is divided into two mutually exclusive lists, marked green and red. This division is pseudo-random, with the seed determined by the previous word. Words on the green list are sampled with a higher probability, and words on the red list with a lower probability. The detector decides whether a text was generated by the LLM based on the number of green-list words it contains.
3. **Optimization Objectives**: The paper formalizes the trade-off between the identifiability and stealthiness of the watermark as a multi-objective optimization problem. The objective is to maximize test quality (the ability of the detector to correctly identify the generator) while minimizing the degradation of text quality.
4. **Methods and Contributions**: The author proposes an optimization framework that solves this multi-objective problem by adjusting the selection probability of words on the green list. The Pareto optimal solutions are identified and their effectiveness is verified experimentally. The results show that the optimized watermarking scheme outperforms the existing default watermarking scheme in terms of the test-text trade-off.
5. **Experimental Results**: Experiments show that the optimized watermarking scheme improves identifiability while maintaining text quality. In particular, under different test conditions, it achieves a high test power while keeping the expected log-perplexity low.

### Conclusion:

By systematically analyzing and optimizing the trade-off between the identifiability and stealthiness of the watermark, the paper provides an effective method to enhance the security and credibility of LLM-generated text while minimizing the impact on text quality. It offers a reference point and guidance for future research on embedding watermarks in LLM-generated text.
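The green-red mechanism described in the summary can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy vocabulary size, the green-list fraction `GAMMA`, the constant logit bonus `DELTA` applied to green words, the random logits that stand in for an LLM, and the z-score detector. In particular, a constant bonus is only one simple way to raise the green-list probability; the optimized schemes in the paper choose that probability differently.

```python
import math
import random

VOCAB_SIZE = 50   # hypothetical toy vocabulary size
GAMMA = 0.5       # assumed fraction of the vocabulary on the green list
DELTA = 2.0       # assumed constant logit bonus for green words


def green_list(prev_token, vocab_size=VOCAB_SIZE, gamma=GAMMA):
    """Pseudo-random green/red split, seeded by the previous token."""
    rng = random.Random(prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def sample_watermarked(logits, prev_token, rng, delta=DELTA):
    """Sample one token after adding `delta` to the logits of green words."""
    green = green_list(prev_token, len(logits))
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    top = max(biased)  # subtract the max for a numerically stable softmax
    weights = [math.exp(b - top) for b in biased]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(length, rng, delta):
    """Generate tokens from random logits (a stand-in for a real LLM)."""
    tokens = [0]
    for _ in range(length):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(sample_watermarked(logits, tokens[-1], rng, delta))
    return tokens


def z_score(tokens, vocab_size=VOCAB_SIZE, gamma=GAMMA):
    """Standardized count of green tokens; large values suggest a watermark."""
    n = len(tokens) - 1
    hits = sum(
        tokens[t] in green_list(tokens[t - 1], vocab_size, gamma)
        for t in range(1, len(tokens))
    )
    return (hits - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    print("watermarked z:", z_score(generate(200, rng, DELTA)))
    print("plain z:", z_score(generate(200, rng, 0.0)))
```

In this sketch, a larger `DELTA` makes the watermark easier to detect but pushes sampling further away from the original distribution, which is the test-text trade-off that the paper treats as a multi-objective optimization problem.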