Abstract: Reversible natural language watermarking methods have evolved from synonym substitution to arbitrary word substitution. A notable limitation is that they tend to overlook the sensitivity of the information carried by the original words and prefer non-sensitive words for substitution. This leaves the sensitive information in the original text at risk of leakage. Furthermore, the pursuit of reversibility may inadvertently compromise the overall performance of the watermarking method. To address these problems, this paper proposes a novel reversible natural language watermarking method that combines a Keyword Substitution scheme with a Prediction Error Expansion algorithm (KSPEE), for purposes such as protecting sensitive information, verifying content integrity, and protecting copyright. Specifically, KSPEE uses a keyword extraction algorithm to identify important content containing sensitive information in the original text, thereby determining the candidate positions for watermark embedding. A masked language model then predicts appropriate substitution words from the semantic context surrounding each embedding position. Finally, the prediction error expansion algorithm selects the words that replace the original keywords, so that the watermark information is embedded while the original keywords remain recoverable. Identifying the keywords and substituting them in this way also protects the original sensitive information. Extensive experiments demonstrate that, subject to constraints on semantic distortion and to lossless restoration of the original content, KSPEE achieves outstanding watermarked text quality, a higher watermark embedding rate, and strong security. More importantly, KSPEE effectively prevents the leakage of sensitive information.
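
The abstract does not spell out how prediction error expansion is adapted to word selection, so the following is only a minimal sketch of the classic integer form of the technique, not the KSPEE procedure itself. It assumes that a value (for example, an index into a candidate list) and its prediction are both integers, and that the extractor can reproduce the same prediction as the embedder. The function names `pee_embed` and `pee_extract` are hypothetical.

```python
from typing import Tuple


def pee_embed(value: int, predicted: int, bit: int) -> int:
    """Embed one watermark bit by expanding the prediction error.

    Assumption: `predicted` can be recomputed identically at extraction time.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    error = value - predicted      # prediction error
    expanded = 2 * error + bit     # expand the error and append the bit
    return predicted + expanded    # marked value


def pee_extract(marked: int, predicted: int) -> Tuple[int, int]:
    """Recover the original value and the embedded bit from a marked value."""
    expanded = marked - predicted
    bit = expanded % 2             # Python's % yields 0 or 1 here, even for negatives
    error = expanded // 2          # floor division undoes the expansion
    return predicted + error, bit


if __name__ == "__main__":
    # Example: value 7, prediction 5, bit 1 -> error 2, expanded 5, marked 10.
    marked = pee_embed(7, 5, 1)
    original, bit = pee_extract(marked, 5)
    print(marked, original, bit)
```

The round trip is lossless because the embedded bit occupies the parity of the expanded error, which is the property that makes this family of schemes reversible.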

A reversible natural language watermarking for sensitive information protection
