Abstract: Existing reversible natural language watermarking methods have evolved from synonym substitution to arbitrary word substitution. However, a notable limitation is that they tend to overlook how sensitive the information carried by the original words is, and they prefer to substitute non-sensitive words. As a result, sensitive information contained in the original text remains exposed and risks being leaked. Furthermore, the pursuit of reversibility may inadvertently compromise the overall performance of the watermarking method. To address these problems, this paper proposes a novel reversible natural language watermarking method that combines a Keyword Substitution scheme and a Prediction Error Expansion algorithm (KSPEE) to protect sensitive information, verify content integrity, and protect copyright, among other applications. Specifically, KSPEE uses a keyword extraction algorithm to identify the important content in the original text that contains sensitive information, which determines the candidate positions for embedding watermark information. A masked language model then predicts suitable substitution words from the semantic context surrounding each embedding position. Finally, the prediction error expansion algorithm selects the word that replaces each original keyword, so that the watermark information is embedded successfully while the original keyword remains recoverable. Identifying and substituting keywords in this way protects the sensitive information in the original text. Extensive experiments demonstrate that, with low semantic distortion and lossless restoration of the original content, KSPEE achieves outstanding watermarked text quality. KSPEE also achieves a higher watermark embedding rate and exhibits strong security.
More importantly, KSPEE effectively prevents the leakage of sensitive information.
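The embedding and recovery steps described above can be illustrated with classic threshold-based prediction error expansion applied to word ranks. This is a minimal sketch under stated assumptions, not KSPEE's actual implementation: keyword positions are taken as already extracted, the masked language model is replaced by hypothetical ranked candidate lists (`candidate_lists`) that the extractor is assumed to regenerate identically, the prediction at each position is assumed to be the top-ranked candidate, and the names `pee_embed`, `pee_extract` and the `threshold` parameter are illustrative only.

```python
from typing import List, Sequence, Tuple

# Sketch of threshold-based prediction error expansion (PEE) on word ranks.
# Assumptions (not taken from the KSPEE paper): keyword positions are already
# chosen, and for each position a masked language model has produced a ranked,
# duplicate-free candidate list that contains the original keyword and that
# the extractor can regenerate identically.


def pee_embed(originals: Sequence[str],
              candidate_lists: Sequence[Sequence[str]],
              bits: Sequence[int],
              threshold: int = 2) -> Tuple[List[str], int]:
    """Substitute each keyword, embedding one bit wherever its rank is small.

    The prediction is rank 0 (the model's top candidate), so the prediction
    error e is the rank of the original keyword. If e < threshold, a bit b
    is embedded by expansion, e' = 2e + b; otherwise e is shifted to
    e' = e + threshold so it never collides with an expanded value.
    Expandable positions left over after the payload carry padding zeros.
    Returns the substituted words and the number of payload bits embedded.
    """
    marked: List[str] = []
    k = 0  # index of the next payload bit
    for word, cands in zip(originals, candidate_lists):
        e = list(cands).index(word)  # prediction error = rank of original word
        if e < threshold:
            b = bits[k] if k < len(bits) else 0  # pad with 0 once bits run out
            k = min(k + 1, len(bits))
            e_new = 2 * e + b  # expansion: carries one bit
        else:
            e_new = e + threshold  # shift: carries no bit
        if e_new >= len(cands):
            raise ValueError(f"candidate list too short for rank {e_new}")
        marked.append(cands[e_new])
    return marked, k


def pee_extract(marked: Sequence[str],
                candidate_lists: Sequence[Sequence[str]],
                threshold: int = 2) -> Tuple[List[str], List[int]]:
    """Invert pee_embed: recover the original keywords and the embedded bits."""
    originals: List[str] = []
    bits: List[int] = []
    for word, cands in zip(marked, candidate_lists):
        e_new = list(cands).index(word)
        if e_new < 2 * threshold:  # expanded value: one bit is present
            bits.append(e_new % 2)
            e = e_new // 2
        else:  # shifted value: undo the shift, no bit present
            e = e_new - threshold
        originals.append(cands[e])
    return originals, bits
```

For example, with the candidate lists `["London", "Paris", "Berlin", "Rome", "Madrid", "Oslo"]`, `["Alice", "Bob", "Carol", "Dave", "Eve", "Frank"]` and `["red", "blue", "green", "black", "white", "pink"]`, embedding the bits `[1, 0]` into the keywords `"Paris"`, `"Bob"` and `"black"` yields `"Rome"`, `"Carol"` and `"pink"`, and `pee_extract` returns the three original keywords together with the bits `[1, 0]`. In this sketch a larger `threshold` lets more positions carry a bit but permits substitutes further down the ranked list, which would typically cost fluency; the payload length is assumed to be known to the extractor, since expandable positions beyond the payload carry padding zeros.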

A Principled Approach to Natural Language Watermarking

Warfare: Breaking the Watermark Protection of AI-Generated Content

A Text Watermarking Algorithm based on Hidden Object

No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices

WaterPark: A Robustness Assessment of Language Model Watermarking

Robust Multi-bit Natural Language Watermarking through Invariant Features

Resilient Natural Language Watermarking Based on Pragmatics

On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks

On the Reliability of Watermarks for Large Language Models

A reversible natural language watermarking for sensitive information protection

An Evaluating Scheme for Embedding Security of Natural Language Watermarking

Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Practical Analysis of Watermarking Capacity

Towards Blind Watermarking: Combining Invertible and Non-invertible Mechanisms

A watermarking scheme for resolving rightful copyright

DeepHider: A Covert NLP Watermarking Framework Based on Multi-task Learning

Analyzing and Evaluating the Robustness of Natural Language Watermarking

Watermarking Language Models for Many Adaptive Users

Watermarking Text Generated by Black-Box Language Models

WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

Topic-Based Watermarks for LLM-Generated Text