WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models

Piotr Molenda, Adian Liusie, Mark J. F. Gales
2024-03-29
Abstract: Watermarking generative-AI systems, such as LLMs, has gained considerable interest, driven by their enhanced capabilities across a wide range of tasks. Although current approaches have demonstrated that small, context-dependent shifts in the word distributions can be used to apply and detect watermarks, there has been little work in analyzing the impact that these perturbations have on the quality of generated texts. Balancing high detectability with minimal performance degradation is crucial in terms of selecting the appropriate watermarking setting; therefore this paper proposes a simple analysis framework where comparative assessment, a flexible NLG evaluation framework, is used to assess the quality degradation caused by a particular watermark setting. We demonstrate that our framework provides easy visualization of the quality-detection trade-off of watermark settings, enabling a simple solution to find an LLM watermark operating point that provides a well-balanced performance. This approach is applied to two different summarization systems and a translation system, enabling cross-model analysis for a task, and cross-task analysis.
Computation and Language
What problem does this paper attempt to address?
The paper addresses how to balance watermark detection performance against generated text quality when watermarking large language models (LLMs). Existing watermarking techniques can help protect intellectual property and deter misuse, but the perturbations they introduce may degrade the quality of the generated text. The paper therefore proposes a framework, WaterJudge, that analyzes the trade-off between detection performance and text quality under different watermark settings, helping practitioners select settings that keep text quality high while maintaining a high watermark detection rate.

### Main Research Contents

1. **Proposing the WaterJudge Framework**: The framework uses comparative assessment, with other large language models acting as "judges", to measure how much the quality of watermarked text declines. The aim is to reflect actual text quality more faithfully than relying solely on traditional metrics such as perplexity, ROUGE, or BLEU.
2. **Experimental Design**: The paper runs experiments on two tasks, summarization and translation. The datasets used include XSum and XTREME. The authors use several generation models: a BART-based summarization model, Zephyr-7B-β, and mBART-large-50.
3. **Watermarking Method**: The paper adopts a soft watermarking scheme. Watermark strength is controlled by two parameters: the proportion of the vocabulary assigned to the "Green List" (the remainder forms the "Red List") and the bias. The bias raises the generation probability of Green List words; Red List words are not boosted, so their probability stays the same or falls in relative terms.
4. **Result Analysis**: The experiments show a clear trade-off between watermark strength and generated text quality. A strong watermark significantly reduces text quality, while a weak watermark may not be reliably detected. WaterJudge provides a visualization of this trade-off that helps researchers and developers find watermark settings that minimize the impact on text quality while preserving detection performance.

### Innovation Points

- **Quality Evaluation Method**: The paper introduces comparative assessment as the quality measure. It is zero-shot and transfers easily across tasks, and is intended to evaluate generated text quality more accurately than traditional metrics.
- **Cross-Model and Cross-Task Analysis**: WaterJudge is not tied to a single task or model. It supports cross-model analysis within a task as well as cross-task analysis.
- **Parameter Transferability**: The study finds some transferability of watermark settings between models and tasks: settings that perform well on one model may also perform well on another.

### Conclusion

By proposing the WaterJudge framework, the paper offers a practical way to balance detection performance against generated text quality when watermarking large language models. The framework provides both an evaluation method and a reference point for future research and applications.
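To make the Green List / bias mechanism described under **Watermarking Method** more concrete, here is a minimal Python sketch of a soft watermark in the green-list style. It is an illustration under stated assumptions, not the authors' implementation. The function names (`green_list`, `watermark_probs`, `detection_z_score`), the SHA-256 seeding on the previous token, and the z-score detector are all assumptions; the summary above does not specify how the lists are seeded or how detection is scored. Here `gamma` stands for the Green List proportion and `delta` for the bias.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float) -> set:
    """Pick a pseudo-random fraction `gamma` of the vocabulary as the Green List.

    Assumption: the list is seeded by a hash of the previous token id, so the
    same context always yields the same Green List.
    """
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def watermark_probs(logits: list, prev_token: int, gamma: float, delta: float) -> list:
    """Add the bias `delta` to Green List logits, then apply a softmax.

    Red List logits are untouched; their probabilities can only shrink
    through renormalization.
    """
    green = green_list(prev_token, len(logits), gamma)
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    peak = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in biased]
    total = sum(exps)
    return [e / total for e in exps]


def detection_z_score(tokens: list, vocab_size: int, gamma: float) -> float:
    """Score how far the Green List hit count exceeds what chance would give.

    Assumption: a one-proportion z-test. Without a watermark, each token is
    expected to land in the Green List with probability `gamma`.
    """
    scored = len(tokens) - 1  # the first token has no preceding context
    if scored < 1:
        raise ValueError("need at least two tokens to score")
    hits = sum(
        tokens[t] in green_list(tokens[t - 1], vocab_size, gamma)
        for t in range(1, len(tokens))
    )
    return (hits - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

In this sketch, a larger `delta` pushes more tokens into the Green List, which raises the detection score but distorts the model's distribution more. That is the quality-detection trade-off the framework is designed to visualize. A text would be flagged as watermarked when the score exceeds a chosen threshold, and that threshold is itself a free parameter of the sketch.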