Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, Jiaheng Zhang

2024-04-16

Abstract: Large Language Models (LLMs) have been widely deployed for their remarkable capability to generate texts resembling human language. However, they could be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique to mitigate the misuse of LLMs: it embeds a watermark (e.g., a bit string) into a text generated by an LLM. This enables the detection of texts generated by an LLM as well as the tracing of generated texts to a specific user. The major limitation of existing watermark techniques is that they cannot accurately or efficiently extract the watermark from a text, especially when the watermark is a long bit string. This key limitation impedes their deployment for real-world applications, e.g., tracing generated texts to a specific user.

Cryptography and Security

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper addresses a weakness of watermarking techniques for text generated by large language models (LLMs): existing techniques struggle to extract a watermark accurately and efficiently when it is a long bit string, especially after the text has been modified (e.g., through insertion, deletion, or replacement of characters). This limitation hinders the deployment of watermarking in practical applications such as tracing generated text to a specific user.

The paper proposes a multi-bit watermarking method based on error-correcting codes (ECC) to overcome these challenges. The method is theoretically proven to extract the watermark correctly when the number of adversarial editing operations is bounded, which gives it a provable robustness guarantee.

Experimental results on benchmark datasets show that the method outperforms existing baselines in both accuracy and robustness. For example, when embedding a bit string of length 12 in a generated text of 200 tokens, the method achieves a match rate of 98.4%, compared with 85.6% for the best existing method, by Yoo et al. Under a copy-paste attack with 50 injected tokens, the method maintains a match rate of 90.8%, while Yoo et al.'s method drops below 65%.

In summary, the paper targets the accuracy and robustness of embedding and extracting long bit strings, the points where existing watermarking techniques fall short.
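To illustrate why an error-correcting code helps a multi-bit watermark survive edits, here is a minimal sketch. It is not the paper's construction: this summary does not say which code the authors use or how bits are mapped onto tokens, so the example assumes a textbook Hamming(7,4) code as a stand-in and models an edit simply as a flipped bit in the extracted codeword. All function names are hypothetical.

```python
from typing import List


def hamming74_encode(block: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (stand-in ECC)."""
    d1, d2, d3, d4 = block
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: List[int]) -> List[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-indexed error position, 0 means no error
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def encode_message(bits: List[int]) -> List[int]:
    """Split the message into 4-bit blocks and encode each block."""
    assert len(bits) % 4 == 0, "this sketch assumes a multiple of 4 bits"
    out: List[int] = []
    for i in range(0, len(bits), 4):
        out.extend(hamming74_encode(bits[i:i + 4]))
    return out


def decode_message(code: List[int]) -> List[int]:
    """Decode consecutive 7-bit blocks back into message bits."""
    out: List[int] = []
    for i in range(0, len(code), 7):
        out.extend(hamming74_decode(code[i:i + 7]))
    return out


# A 12-bit message, matching the bit-string length quoted in the summary.
message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
codeword = encode_message(message)  # 21 bits would be embedded instead of 12

# Assumed attack model: edits flip one extracted bit in each 7-bit block.
corrupted = list(codeword)
for idx in (2, 10, 20):
    corrupted[idx] ^= 1

recovered = decode_message(corrupted)
print(recovered == message)
```

The trade-off is redundancy for robustness: the embedded string grows from 12 to 21 bits, and in return any single flipped bit per block is corrected. Two flips in the same block exceed what this particular code can correct, which mirrors the summary's point that the guarantee holds only for a limited number of edits.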

Warfare: Breaking the Watermark Protection of AI-Generated Content

Watermarking Text Generated by Black-Box Language Models

Provably Robust Watermarks for Open-Source Language Models

Watermarking Large Language Models and the Generated Content: Opportunities and Challenges

Watermarking Language Models with Error Correcting Codes

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Adaptive Text Watermark for Large Language Models

On the Reliability of Watermarks for Large Language Models

Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

Robust Distortion-free Watermarks for Language Models

Advancing Beyond Identification: Multi-bit Watermark for Large Language Models

Signal Watermark on Large Language Models

CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code

Towards Codable Watermarking for Injecting Multi-bits Information to LLMs

Segmenting Watermarked Texts From Language Models

Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection

Universally Optimal Watermarking Schemes for LLMs: from Theory to Practice

REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models

Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring

A Survey of Text Watermarking in the Era of Large Language Models