Abstract: Advances in generative models have made it possible for AI-generated text, code, and images to mirror human-generated content in many applications. Watermarking, a technique that aims to embed information in the output of a model to verify its source, is useful for mitigating the misuse of such AI-generated content. However, we show that common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to attack -- leading to fundamental trade-offs in robustness, utility, and usability. To navigate these trade-offs, we rigorously study a set of simple yet effective attacks on common watermarking systems, and propose guidelines and defenses for LLM watermarking in practice.

What problem does this paper attempt to address?

This paper examines the trade-offs created by design choices in large language model (LLM) watermarking, and in particular how those choices leave the system vulnerable to attack. It focuses on three design choices:

1. **Robustness of watermarks**:
   - Watermarks should resist modification so that they are not easily removed.
   - However, malicious users can exploit this robustness to launch a "piggyback spoofing attack": by making a small number of modifications, they produce toxic or inaccurate content that still appears to have been generated by a specific watermarked LLM.
2. **Use of multiple keys**:
   - Using multiple watermark keys can prevent watermark-stealing attacks, thereby improving the security of the system.
   - But the paper points out that using more keys makes the system more vulnerable to watermark-removal attacks. An attacker can estimate the unwatermarked output distribution by querying the watermarked LLM multiple times, thereby gradually eliminating the watermark.
3. **Public detection APIs**:
   - Publicly available watermark detection APIs let ordinary users easily verify whether a text is AI-generated.
   - However, attackers can use the same APIs to launch watermark-removal and spoofing attacks, further threatening the security of the watermarking system.

### Main contributions of the paper

- **Challenges brought by robustness**: Because changing even a single token can make an entire sentence inaccurate, highly robust watermarks are easier targets for spoofing attacks.
- **Impact of multiple keys**: Although increasing the number of keys can defend against watermark-stealing attacks, it also increases the risk of watermark-removal attacks. The paper supports this with both theory and experiments.
- **Proposed defensive measures**: In response to these problems, the paper proposes potential defenses and general guidelines to improve the security of next-generation LLM watermarking systems.

### Experiments and evaluations

The paper experimentally evaluates several state-of-the-art watermarking schemes (such as KGW, Unigram, and Exp) under different attacks, using two LLMs (LLAMA-2-7B and OPT-1.3B) and the OpenGen dataset. The results show that existing watermarking schemes generally have the vulnerabilities described above, and that the impact of design choices on security and practicality needs to be considered carefully.

### Conclusions

The paper emphasizes the need to balance the trade-offs between robustness, practicality, and security when designing and deploying LLM watermarks. To defend effectively against the various attacks, it recommends combining multiple measures, such as signature-based fragile watermarks. Limiting the user's query frequency also helps to mitigate the risk of these attacks.
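The robustness trade-off behind piggyback spoofing can be illustrated with a toy sketch. The code below is not the paper's implementation: it assumes a simplified green-list detector loosely modeled on KGW-style schemes, an integer vocabulary, and a stand-in "model" that always emits a green token. All names (`is_green`, `z_score`, `generate_watermarked`, `piggyback_edit`) and all constants are hypothetical and chosen only for illustration.

```python
import hashlib
import math

VOCAB_SIZE = 1000   # toy vocabulary of integer token ids (assumption)
GAMMA = 0.5         # fraction of the vocabulary treated as "green" (assumption)
Z_THRESHOLD = 4.0   # illustrative detection threshold (assumption)


def is_green(key: str, prev_token: int, token: int) -> bool:
    """Pseudo-randomly place `token` in the green list seeded by key and previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def z_score(key: str, tokens: list[int]) -> float:
    """One-proportion z-test on the number of green tokens in the sequence."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed its list
    green = sum(is_green(key, p, t) for p, t in zip(tokens, tokens[1:]))
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))


def generate_watermarked(key: str, length: int, start: int = 0) -> list[int]:
    """Stand-in for a watermarked LLM: every emitted token is green."""
    tokens = [start]
    while len(tokens) < length:
        prev = tokens[-1]
        candidate = (prev * 31 + 7) % VOCAB_SIZE  # arbitrary deterministic proposal
        while not is_green(key, prev, candidate):
            candidate = (candidate + 1) % VOCAB_SIZE
        tokens.append(candidate)
    return tokens


def piggyback_edit(tokens: list[int], positions: list[int], injected_token: int) -> list[int]:
    """Attacker overwrites a few positions, e.g. to flip the meaning of a sentence."""
    edited = list(tokens)
    for i in positions:
        edited[i] = injected_token
    return edited


KEY = "secret-key"
original = generate_watermarked(KEY, length=200)
spoofed = piggyback_edit(original, positions=[20, 60, 100, 140, 180], injected_token=999)

z_original = z_score(KEY, original)
z_spoofed = z_score(KEY, spoofed)
print(f"original z = {z_original:.2f}, spoofed z = {z_spoofed:.2f}")
print("spoofed text still detected:", z_spoofed > Z_THRESHOLD)
```

In this toy setup each edited token can change at most two scored pairs (its own and the following token's), so five edits flip at most 10 of the 199 scored tokens and the edited sequence remains far above the threshold. That is the property a robust watermark is designed to have, and it is also what allows an attacker's altered text to keep passing as watermarked output.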

No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices

Warfare: Breaking the Watermark Protection of AI-Generated Content

WaterPark: A Robustness Assessment of Language Model Watermarking

Large Language Model Watermark Stealing With Mixed Integer Programming

Watermark Stealing in Large Language Models

Watermarking Large Language Models and the Generated Content: Opportunities and Challenges

On the Reliability of Watermarks for Large Language Models

On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks

Universally Optimal Watermarking Schemes for LLMs: from Theory to Practice

Lost in Overlap: Exploring Watermark Collision in LLMs

Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection

Counterfeiting Attacks on Two Robust Watermarking Schemes

Watermarking Techniques for Large Language Models: A Survey

Black-Box Detection of Language Model Watermarks

A Watermark for Low-entropy and Unbiased Generation in Large Language Models

Provably Robust Watermarks for Open-Source Language Models

Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Can Watermarked LLMs be Identified by Users via Crafted Prompts?

Topic-Based Watermarks for LLM-Generated Text

Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

Robustness of Watermarking on Text-to-Image Diffusion Models