Reversible source-aware natural language watermarking via customized lexical substitution

Ziyu Jiang, Hongxia Wang, Zhenhao Shi, Run Jiao
DOI: https://doi.org/10.1016/j.ipm.2024.103977
IF: 7.466
2024-11-28
Information Processing & Management
Abstract: Current natural language watermarking (NLW) methods use pre-trained language models (PLMs) to generate suitable watermark words from the local context, which minimizes semantic loss in the watermarked text. However, these methods still have limitations. First, they integrate off-the-shelf lexical substitution (LS) models that are not tailored to watermarking algorithms, so there is room for improvement in substitute quality and watermark imperceptibility. Second, they impose strict synchronization constraints so that identical substitute lists are generated from the original and the watermarked text. This excludes some high-quality substitutes and curtails watermark capacity. Third, watermark embedding changes the local context, so these methods cannot losslessly recover the original text. This limits the application of NLW in high-precision scenarios such as government documents and military and medical applications. To address these issues, we propose a reversible source-aware NLW approach. It uses a PLM to proactively mine potential reversible watermark positions and then embeds the watermark into the text via source-aware LS. We also design a novel LS algorithm tailored to NLW to enhance the imperceptibility and textual fidelity of watermarked content. Experiments validate the effectiveness of our LS method in generating the most suitable substitutes and verify that our NLW approach achieves complete reversibility while improving watermark capacity and textual fidelity compared with prior art.
Computer Science, Information Systems; Information Science & Library Science
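To make the synchronization constraint and the loss of reversibility concrete, here is a minimal Python sketch of a conventional lexical-substitution watermark of the kind the abstract criticizes. It is not the authors' reversible source-aware method. The toy lexicon `TOY_LEXICON` and the helpers `candidates`, `is_synchronized`, `embed`, and `extract` are hypothetical stand-ins for a PLM-based substitute generator. The sketch assumes one bit per position, encoded as the index of the chosen word in a sorted candidate list.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy lexicon standing in for a PLM-based lexical substitution model.
# Each word maps to its candidate set (the word itself included).
TOY_LEXICON: Dict[str, FrozenSet[str]] = {
    "big": frozenset({"big", "large"}),
    "large": frozenset({"big", "large"}),
    "quick": frozenset({"quick", "fast", "rapid"}),
    "fast": frozenset({"quick", "fast"}),    # asymmetric entry: breaks synchronization
    "rapid": frozenset({"quick", "rapid"}),
}


def candidates(word: str) -> List[str]:
    """Sorted candidate list for a word; empty if the word has no substitutes."""
    return sorted(TOY_LEXICON.get(word, frozenset()))


def is_synchronized(word: str) -> bool:
    """A position is usable only if every candidate regenerates the same list,
    so the extractor sees the same list from whichever word was written."""
    cands = candidates(word)
    return len(cands) >= 2 and all(candidates(c) == cands for c in cands)


def embed(tokens: List[str], bits: List[int]) -> Tuple[List[str], int]:
    """Write one bit per synchronized position as an index (0 or 1) into its list."""
    out: List[str] = []
    used = 0
    for tok in tokens:
        if used < len(bits) and is_synchronized(tok):
            out.append(candidates(tok)[bits[used]])
            used += 1
        else:
            out.append(tok)  # unusable position: left unchanged
    return out, used


def extract(tokens: List[str], n_bits: int) -> List[int]:
    """Read bits back as the index of each synchronized word in its candidate list."""
    bits: List[int] = []
    for tok in tokens:
        if len(bits) < n_bits and is_synchronized(tok):
            bits.append(candidates(tok).index(tok))
    return bits


if __name__ == "__main__":
    original = ["the", "large", "dog", "is", "quick", "and", "big"]
    marked, used = embed(original, [0, 1])
    print(marked, used)
    print(extract(marked, used))
```

In this toy run "quick" carries no bit because one of its substitutes regenerates a different list, which mirrors the capacity loss described above. Embedding 0 at "large" writes "big", so the extractor recovers the bit but has no way to tell that the original word was "large". That is the irreversibility the proposed approach is designed to remove.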