A Principled Approach to Natural Language Watermarking

Zhe Ji, Qiansiqi Hu, Yicheng Zheng, Liyao Xiang, Xinbing Wang
DOI: https://doi.org/10.1145/3664647.3681544
2024-01-01
Abstract: Recently, there has been a surge in machine-generated natural language content being misused by unauthorized parties. Watermarking is a well-recognized technique for addressing this issue by tracing the provenance of the text. However, we find that most existing text watermarking systems are designed ad hoc and thus suffer from fundamental vulnerabilities. We propose a principled design for text watermarking based on a theoretical information-hiding framework. The watermarking party and the attacker play a rate-distortion-constrained capacity game to achieve the maximum rate of reliable transmission, i.e., the watermark capacity. The capacity can be expressed as the mutual information between the encoding and the attacker's corrupted text, indicating how many watermark bits are effectively conveyed under distortion constraints. The system is realized as a learning-based framework with mutual information neural estimators. Within the framework, we assume an omniscient attacker: the watermarking party is pitted against an attacker who is fully aware of the watermarking strategy, and thereby achieves higher robustness against removal attacks. We further show that incorporating side information substantially enhances the efficacy and robustness of the watermarking system. Experimental results show that our watermarking system outperforms the state of the art in capacity, robustness, and preservation of text semantics.
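To make the phrase "mutual information neural estimators" concrete, the following is a minimal sketch of the Donsker-Varadhan lower bound that such estimators typically maximize. It is not the authors' implementation: the function names (`dv_lower_bound`, `noisy_copies`, `known_critic`), the toy one-bit "watermark" with a bit-flipping "attacker", and the fixed closed-form critic are all assumptions made for illustration. In a learned system the critic would be a trained neural network rather than a hand-written function.

```python
import math
import random


def dv_lower_bound(critic, xs, zs, seed=0):
    """Donsker-Varadhan estimate of a lower bound on I(X; Z), in nats.

    critic: hypothetical scoring function T(x, z) -> float.
    xs, zs: paired samples drawn from the joint distribution.
    """
    n = len(xs)
    # Joint term: average critic score on the true (x, z) pairs.
    joint = sum(critic(x, z) for x, z in zip(xs, zs)) / n
    # Marginal term: shuffle zs to break the pairing, approximating p(x)p(z).
    shuffled = list(zs)
    random.Random(seed).shuffle(shuffled)
    scores = [critic(x, z) for x, z in zip(xs, shuffled)]
    # Numerically stable log-mean-exp of the shuffled scores.
    m = max(scores)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scores) / n)
    return joint - log_mean_exp


def noisy_copies(n, flip=0.1, seed=0):
    """Toy stand-in (assumption): x is a uniform bit, z is x flipped w.p. `flip`."""
    rng = random.Random(seed)
    xs = [rng.randint(0, 1) for _ in range(n)]
    zs = [1 - x if rng.random() < flip else x for x in xs]
    return xs, zs


def known_critic(x, z, flip=0.1):
    """Closed-form log density ratio log p(z|x)/p(z) for the toy channel above."""
    return math.log((1 - flip) / 0.5) if x == z else math.log(flip / 0.5)


xs, zs = noisy_copies(2000)
estimate = dv_lower_bound(known_critic, xs, zs)
# For this toy channel the analytic mutual information is about 0.368 nats;
# the sample estimate fluctuates around that value.
print(f"estimated I(X; Z) lower bound: {estimate:.3f} nats")
```

A critic that ignores its inputs yields a bound of zero, so any positive value reflects information about the embedded bit that survives the corruption. This is the sense in which the abstract speaks of watermark bits being "effectively conveyed".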