Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality

Duy C. Hoang, Hung T. Q. Le, Rui Chu, Ping Li, Weijie Zhao, Yingjie Lao, Khoa D. Doan
2024-07-18
Abstract: With the widespread adoption of Large Language Models (LLMs), concerns about potential misuse have emerged. To this end, watermarking has been adapted to LLMs, enabling a simple and effective way to detect and monitor generated text. However, while the existing methods can differentiate between watermarked and unwatermarked text with high accuracy, they often face a trade-off between the quality of the generated text and the effectiveness of the watermarking process. In this work, we present a novel type of LLM watermark, Sparse Watermark, which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text. The key strategy involves anchoring watermarked tokens to words that have specific Part-of-Speech (POS) tags. Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous LLM watermarking methods in quality across various tasks.
Cryptography and Security, Artificial Intelligence, Computation and Language
### What problem does this paper attempt to solve?

This paper addresses the trade-off between the quality of generated text and the detectability of watermarks in large language models (LLMs). Existing watermarking methods can reliably distinguish watermarked from non-watermarked text, but they often degrade the quality of the generated text. To mitigate this, the authors propose a new watermarking method, Sparse Watermark, which watermarks only a small subset of the generated tokens. This reduces the impact on text quality while keeping watermark detection reliable.

#### Main problem description

1. **Trade-off between watermarking and text quality**:
   - Existing watermarking methods tend to lower the quality of generated text as they make the watermark easier to detect.
   - The cause is that the watermark alters the probability distribution over generated tokens, so some unlikely words get selected, which hurts the fluency of the text.
2. **Keep watermarks detectable without harming text quality**:
   - The authors aim for a sparse watermark that can still be detected effectively while affecting text quality as little as possible.
   - Concretely, only specific positions in the generated text are watermarked (for example, those tied to words with certain parts of speech), rather than every token.
3. **Enhance the robustness and security of watermarks**:
   - As adversarial attack techniques develop, watermarks must be robust against malicious modification and deletion.
   - The authors evaluate the robustness of sparse watermarks experimentally against substitution attacks and rewriting (paraphrasing) attacks.

#### Solution overview

- **Sparse Watermark**: Only tokens tied to specific part-of-speech tags (such as verbs, nouns, and determiners) are watermarked, which reduces the impact on overall text quality.
- **Watermark anchoring based on part-of-speech tags**: Words with specific part-of-speech tags are selected as watermark positions, so the watermark information is embedded in the structure of the text, improving the concealment and robustness of the watermark.
- **Statistical detection method**: The presence of a watermark is detected with a statistical test (such as a z-test), which keeps the watermark detectable.

With these changes, the sparse watermark method achieves effective watermark detection while maintaining high text quality, and it remains robust in the face of attacks.
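
The anchoring and z-test ideas summarized above can be illustrated with a small toy sketch. This is not the authors' implementation, and several things here are assumptions made for illustration only: the tiny `LEXICON` stands in for a real tokenizer and POS tagger, the token immediately after an anchor word is taken to be the watermarked position, the green list is seeded by hashing the previous word, `GAMMA = 0.5` is an arbitrary green-list fraction, and a hard green-list restriction is used so the example stays deterministic. The z-score is the standard one-proportion form, z = (g - γT) / sqrt(T γ (1 - γ)), where T counts only the anchored positions and g counts how many of them are green.

```python
import hashlib
import math
import random

GAMMA = 0.5              # assumed fraction of the vocabulary placed on the green list
ANCHOR_TAGS = {"VERB"}   # assumed anchor POS tags; the summary also mentions nouns, determiners

# Toy lexicon standing in for a real tokenizer + POS tagger (illustrative assumption).
LEXICON = {
    "DET": ["the", "a", "this", "that"],
    "NOUN": ["cat", "dog", "bird", "fish", "tree", "car", "house", "river"],
    "VERB": ["sees", "likes", "finds", "chases", "follows", "watches"],
    "ADV": ["quickly", "slowly", "quietly", "often", "rarely", "gladly", "calmly", "boldly"],
}
TAG_OF = {word: tag for tag, words in LEXICON.items() for word in words}
VOCAB = sorted(TAG_OF)
PATTERN = ["DET", "NOUN", "VERB", "ADV"]  # toy "language model": cycle through these tags


def green_list(prev_word):
    """Pseudo-randomly select a GAMMA fraction of VOCAB, seeded by the previous word."""
    digest = hashlib.sha256(prev_word.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(VOCAB))])


def generate(n_words, watermark, seed=0):
    """Generate toy text; only the token right after an anchor word is watermarked."""
    rng = random.Random(seed)
    words = []
    for i in range(n_words):
        candidates = LEXICON[PATTERN[i % len(PATTERN)]]
        if watermark and words and TAG_OF[words[-1]] in ANCHOR_TAGS:
            # Sparse, hard watermark: restrict this single position to the green list.
            candidates = sorted(green_list(words[-1]))
        words.append(rng.choice(candidates))
    return words


def z_score(words):
    """Score only anchored positions; return (z, scored_positions, green_hits)."""
    scored = green = 0
    for prev, cur in zip(words, words[1:]):
        if TAG_OF.get(prev) in ANCHOR_TAGS:
            scored += 1
            green += cur in green_list(prev)
    if scored == 0:
        return 0.0, 0, 0
    z = (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))
    return z, scored, green


if __name__ == "__main__":
    for flag in (True, False):
        text = generate(80, watermark=flag)
        z, scored, green = z_score(text)
        print(f"watermark={flag}: z={z:.2f}, scored={scored}/{len(text)}, green={green}")
```

In this sketch most tokens are left untouched by the watermark, which is the intuition behind the quality claim: the sampling distribution is altered only at anchored positions, and the detector ignores every other position when computing the statistic. A real system would replace the hard restriction with whatever logit adjustment the method specifies and would run an actual POS tagger over the decoded text.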