Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, Jiaheng Zhang
2024-04-16
Abstract: Large Language Models (LLMs) have been widely deployed for their remarkable capability to generate texts resembling human language. However, they can be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique for mitigating the misuse of LLMs: it embeds a watermark (e.g., a bit string) into a text generated by an LLM. This enables both the detection of LLM-generated texts and the tracing of generated texts to a specific user. The major limitation of existing watermarking techniques is that they cannot accurately or efficiently extract the watermark from a text, especially when the watermark is a long bit string. This key limitation impedes their deployment in real-world applications, e.g., tracing generated texts to a specific user.
Cryptography and Security
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses a limitation of watermarking techniques for text generated by large language models (LLMs). Existing techniques struggle to extract the watermark accurately and efficiently when it is a long bit string, especially after the text has been modified (e.g., through insertion, deletion, or replacement of characters). This limitation hinders their deployment in practical applications.

To overcome it, the paper proposes a multi-bit watermarking method based on error-correcting codes (ECC). The method is proven to extract the watermark correctly as long as the number of adversarial edits is bounded, which gives it a provable robustness guarantee. Experiments on benchmark datasets show that it significantly outperforms existing baselines in both accuracy and robustness. For example, when a 12-bit string is embedded in a generated text of 200 tokens, the method achieves a match rate of 98.4%, compared with 85.6% for the best existing method (Yoo et al.). Even under a copy-paste attack that injects 50 tokens, it maintains a match rate of 90.8%, while Yoo et al.'s method drops below 65%.

In summary, the paper targets the accuracy and robustness of embedding and extracting long bit strings, and offers an efficient and reliable solution.
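The summary does not spell out how error correction enters the scheme. As a minimal, hypothetical illustration of the general ECC idea only (not the paper's construction, whose code choice and token-level embedding are not described here), the sketch below protects a 12-bit watermark with a Hamming(7,4) code: each 4-bit block is expanded to 7 code bits, and any single flipped bit per block is corrected on extraction. All function names are illustrative, and how code bits are embedded into or read back from generated tokens is left out entirely.

```python
from typing import List

Bits = List[int]


def hamming74_encode(d: Bits) -> Bits:
    """Encode 4 data bits as a 7-bit Hamming codeword (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: Bits) -> Bits:
    """Correct at most one flipped bit in a 7-bit codeword and return its 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-indexed error position, or 0 if no error
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def encode_message(msg: Bits) -> Bits:
    """Hypothetical helper: split the watermark into 4-bit blocks and encode each block."""
    if len(msg) % 4 != 0:
        raise ValueError("message length must be a multiple of 4")
    code: Bits = []
    for i in range(0, len(msg), 4):
        code.extend(hamming74_encode(msg[i:i + 4]))
    return code


def decode_message(code: Bits) -> Bits:
    """Hypothetical helper: decode each 7-bit block, correcting one error per block."""
    msg: Bits = []
    for i in range(0, len(code), 7):
        msg.extend(hamming74_decode(code[i:i + 7]))
    return msg


if __name__ == "__main__":
    watermark = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]  # 12-bit example string
    code = encode_message(watermark)                   # 21 code bits
    corrupted = list(code)
    for i in (2, 9, 20):                               # one flipped bit in each block
        corrupted[i] ^= 1
    print(decode_message(corrupted) == watermark)      # single errors are corrected
```

Flipping two bits within the same block exceeds Hamming(7,4)'s single-error capacity and decodes that block to the wrong data, so any ECC-based scheme must size its code to the amount of corruption it aims to tolerate.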