Waterfall: Framework for Robust and Scalable Text Watermarking and Provenance for LLMs

Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low
2024-10-29
Abstract: Protecting intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are not robust enough against such attacks nor scalable to millions of users for practical implementation. In this paper, we propose Waterfall, the first training-free framework for robust and scalable text watermarking applicable across multiple text types (e.g., articles, code) and languages supportable by LLMs, for general text and LLM data provenance. Waterfall comprises several key innovations, such as being the first to use LLMs as paraphrasers for watermarking along with a novel combination of techniques that are surprisingly effective in achieving robust verifiability and scalability. We empirically demonstrate that Waterfall achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods, and show how it could be directly applied to the watermarking of code. We also demonstrate that Waterfall can be used for LLM data provenance, where the watermarks of LLM training data can be detected in LLM output, allowing for detection of unauthorized use of data for LLM training and potentially enabling model-centric watermarking of open-sourced LLMs, which has been a limitation of existing LLM watermarking works. Our code is available at https://github.com/aoi3142/Waterfall.
Cryptography and Security, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper addresses intellectual property (IP) protection for text data, in particular plagiarism and unauthorized use of content such as articles and code. It focuses on the following problems:

1. **Limitations of existing text watermarking methods**:
   - Existing methods are not robust enough against sophisticated attacks, such as synonym replacement or paraphrasing by large language models (LLMs).
   - They also do not scale to millions of users.
2. **New challenges brought by LLMs**:
   - With the spread of generative LLMs, copyrighted text may be used for training without authorization, infringing intellectual property rights.
   - Content creators need a way to establish whether their works have been used to train third-party black-box LLMs.
3. **Applicability to multiple languages and text types**:
   - A general framework is needed that applies to multiple text types (such as articles and code) and multiple languages, so that it covers a wide range of application scenarios.
4. **Efficient and scalable solutions**:
   - The solution must offer high robustness and verifiability while operating efficiently across a large user base at a reasonable computational cost.

To address these problems, the paper proposes **Waterfall**, a training-free framework for robust and scalable text watermarking that applies to multiple text types and languages. Its main innovations include:

- **Using LLMs as paraphrasers for watermarking for the first time**: The framework uses the paraphrasing capabilities of LLMs to embed watermarks, which improves robustness and scalability.
- **Combining vocabulary permutation and orthogonal perturbation techniques**: These techniques are reported to significantly improve the robustness and verifiability of watermarks while maintaining the fidelity of the text.
- **Support for large-scale user groups**: The framework can operate efficiently with millions of users, as practical deployments require.

With these innovations, Waterfall performs well across multiple benchmarks and maintains high robustness and verifiability under various attacks (such as insertion, deletion, and synonym replacement).
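
The summary names vocabulary permutation and orthogonal perturbation but does not spell out how they fit together. The following toy sketch shows one plausible reading and should not be taken as the authors' implementation. It assumes that a user ID seeds a pseudorandom permutation of the vocabulary, that a watermark key selects one cosine from a family of mutually orthogonal functions, and that verification averages that cosine over the permuted positions of the observed tokens. All function names, the SHA-256 seeding, the vocabulary size, and the flat logits (standing in for the output of an LLM paraphraser) are assumptions made for illustration.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1024  # toy vocabulary size (assumption; real LLM vocabularies are larger)


def permutation_for(user_id):
    """Derive a pseudorandom vocabulary permutation from a user ID (hypothetical scheme)."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    perm = list(range(VOCAB_SIZE))
    random.Random(seed).shuffle(perm)
    return perm  # perm[i] is the token id placed at permuted position i


def perturbation(key):
    """Cosine of integer frequency `key` over the permuted positions.

    For 1 <= key < VOCAB_SIZE / 2, distinct keys give mutually orthogonal vectors.
    """
    return [math.cos(2 * math.pi * key * i / VOCAB_SIZE) for i in range(VOCAB_SIZE)]


def watermark_logits(logits, user_id, key, strength=2.0):
    """Add the key's perturbation to the logits, indexed through the user's permutation."""
    perm = permutation_for(user_id)
    bias = perturbation(key)
    out = list(logits)
    for position, token_id in enumerate(perm):
        out[token_id] += strength * bias[position]
    return out


def sample_tokens(logits, n, rng):
    """Sample n token ids from the softmax of the logits."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=n)


def verification_score(token_ids, user_id, key):
    """Average the key's perturbation over the permuted positions of the tokens."""
    perm = permutation_for(user_id)
    position_of = [0] * VOCAB_SIZE  # inverse permutation: token id -> position
    for position, token_id in enumerate(perm):
        position_of[token_id] = position
    bias = perturbation(key)
    return sum(bias[position_of[t]] for t in token_ids) / len(token_ids)


if __name__ == "__main__":
    # Flat logits stand in for an LLM paraphraser's next-token logits (assumption).
    logits = watermark_logits([0.0] * VOCAB_SIZE, user_id=42, key=7)
    tokens = sample_tokens(logits, n=1000, rng=random.Random(0))
    print("matching ID and key:", verification_score(tokens, 42, 7))
    print("wrong user ID:      ", verification_score(tokens, 43, 7))
    print("wrong key:          ", verification_score(tokens, 42, 9))
```

In this toy setup the score for the matching ID and key should come out clearly higher than the scores for a wrong ID or a wrong key, which stay close to zero. Because a different permutation or a different orthogonal function decorrelates the score, many distinct (ID, key) pairs can coexist, which is one way to read the scalability claim above.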