A Survey of Text Watermarking in the Era of Large Language Models

Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, Philip S. Yu
2024-08-01
Abstract: Text watermarking algorithms are crucial for protecting the copyright of textual content. Historically, their capabilities and application scenarios were limited. However, recent advancements in large language models (LLMs) have revolutionized these techniques. LLMs not only enhance text watermarking algorithms with their advanced abilities but also create a need for employing these algorithms to protect their own copyrights or prevent potential misuse. This paper conducts a comprehensive survey of the current state of text watermarking technology, covering four main aspects: (1) an overview and comparison of different text watermarking techniques; (2) evaluation methods for text watermarking algorithms, including their detectability, impact on text or LLM quality, and robustness under targeted or untargeted attacks; (3) potential application scenarios for text watermarking technology; (4) current challenges and future directions for text watermarking. This survey aims to provide researchers with a thorough understanding of text watermarking technology in the era of LLMs, thereby promoting its further advancement.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the development and application of text watermarking technology in the era of large language models (LLMs). Specifically, the paper aims to:

1. **Summarize different text watermarking techniques**: Provide an overview and comparison of existing text watermarking techniques, including traditional text watermarking methods and new methods based on LLMs.
2. **Evaluate text watermarking algorithms**: Explore evaluation methods for text watermarking algorithms, including their detectability, impact on the quality of text or LLMs, and robustness under targeted or untargeted attacks.
3. **Explore application scenarios**: Discuss potential application scenarios of text watermarking technology, such as copyright protection, academic integrity, and fake news detection.
4. **Analyze challenges and future directions**: Summarize the current challenges faced by text watermarking technology and propose future research directions.

### Main contributions of the paper

1. **Promote technological development**: Promote the further development of text watermarking technology in the era of LLMs through a comprehensive review of existing technologies.
2. **Enhance mutual benefit**: Explore how LLMs can enhance text watermarking technology, and how text watermarking technology can protect the copyright of LLMs and prevent abuse.
3. **Provide a comprehensive perspective**: Provide researchers with a comprehensive perspective to help them better understand and apply text watermarking technology.

### Solutions to specific problems

1. **Methods of embedding watermarks**:
   - **Format-based watermarks**: Embed watermarks by modifying text formats (such as line spacing or word spacing).
   - **Lexical-level watermarks**: Embed watermarks by replacing words with synonyms or by using context-aware lexical replacement.
   - **Syntactic-level watermarks**: Embed watermarks by modifying sentence structures.
   - **Generative watermarks**: Use pre-trained language models to directly generate text with watermarks.
2. **Evaluation methods**:
   - **Detectability**: Evaluate whether watermarks can be reliably detected.
   - **Quality impact**: Evaluate the impact of watermarks on text quality and LLM performance.
   - **Robustness**: Evaluate the stability of watermarks under various attacks.
3. **Application scenarios**:
   - **Copyright protection**: Protect the copyright of text content.
   - **Academic integrity**: Prevent academic plagiarism and unauthorized content distribution.
   - **Fake news detection**: Identify and track fake news generated by LLMs.
4. **Challenges and future directions**:
   - **Improve robustness**: Develop more robust watermarking techniques that remain effective under various attacks.
   - **Reduce quality loss**: Minimize the impact on text quality and readability while embedding watermarks.
   - **Expand application scenarios**: Explore more application scenarios, such as legal documents and medical records.

### Summary

By systematically reviewing and analyzing the application and development of text watermarking technology in the era of LLMs, this paper provides a valuable reference for researchers and helps promote further research and application in this field.
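
To make the format-based category listed above more concrete, the following is a minimal Python sketch of a word-spacing watermark. It is an illustration only, not a method taken from the paper: the function names `embed_bits` and `extract_bits` are hypothetical, and the encoding (one space for a 0 bit, two spaces for a 1 bit) is an assumed scheme chosen for simplicity.

```python
import re


def embed_bits(text: str, bits: str) -> str:
    """Hide a bit string in the gaps between words (assumed scheme:
    one space encodes 0, two spaces encode 1)."""
    words = text.split()
    # An empty text has -1 gaps, so it is always rejected here.
    if len(bits) > len(words) - 1:
        raise ValueError("text has too few word gaps for this payload")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Gaps beyond the payload are left as a single space.
        bit = bits[i] if i < len(bits) else "0"
        out.append(("  " if bit == "1" else " ") + word)
    return "".join(out)


def extract_bits(marked: str, n_bits: int) -> str:
    """Recover up to n_bits by measuring the width of each word gap."""
    gaps = re.findall(r" +", marked)
    return "".join("1" if len(gap) == 2 else "0" for gap in gaps[:n_bits])


marked = embed_bits("text watermarking protects copyright of content", "1011")
print(extract_bits(marked, 4))  # prints 1011

# An untargeted "attack": normalizing whitespace erases the payload.
normalized = " ".join(marked.split())
print(extract_bits(normalized, 4))  # prints 0000
```

The second call hints at why robustness is one of the evaluation criteria: under this assumed scheme, a simple whitespace normalization removes the watermark without changing any visible word.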