COSYWA: Enhancing Semantic Integrity in Watermarking Natural Language Generation.

Junjie Fang, Zhixing Tan, Xiaodong Shi
DOI: https://doi.org/10.1007/978-3-031-44693-1_55
2023-01-01
Abstract: With the increasing use of natural language generation (NLG) models, there is a growing need to differentiate between machine-generated text and natural language text. One promising approach is watermarking, which can help identify machine-generated text and protect against risks such as spam emails and academic dishonesty. However, existing watermarking methods can significantly affect the semantic meaning of the text, creating a need for more effective techniques that maintain semantic integrity. In this paper, we propose a novel watermarking method called COntextual SYnonym WAtermarking (COSYWA) that embeds watermarks in text using a Masked Language Model (MLM) without significantly impairing its semantics. Specifically, we use post-processing to embed watermarks in the output of an NLG model. We generate a context-based synonym set using an MLM model to embed watermark information and use statistical hypothesis testing to detect the existence of watermarking. Our experimental results show that COSYWA significantly enhances the text’s capacity to maintain its original meaning while effectively embedding a watermark, making it a promising approach for protecting against misinformation in NLG.
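The abstract names two steps, embedding through synonym choice and detection through a statistical hypothesis test, without giving details. The sketch below illustrates that general idea only and is not the authors' implementation. Everything in it is an assumption made for illustration: the keyed hash rule in `bit`, the hand-written `synonyms` dictionary standing in for the MLM-generated context-based synonym set, and the choice of a one-sided z-test against a fair-coin null hypothesis.

```python
import hashlib
import math


def bit(word: str, key: str) -> int:
    """Keyed pseudo-random bit for a word (hypothetical encoding rule)."""
    digest = hashlib.sha256((key + ":" + word.lower()).encode("utf-8")).digest()
    return digest[0] & 1


def embed(words, synonyms, key):
    """Post-process text: prefer a candidate whose keyed bit is 1.

    `synonyms` maps a word to replacement candidates. In the paper these
    come from an MLM given the context; here it is a plain dict (assumption).
    """
    out = []
    for w in words:
        candidates = [w] + synonyms.get(w, [])
        marked = [c for c in candidates if bit(c, key) == 1]
        # Keep the original word when it is already marked or no candidate is.
        out.append(marked[0] if marked else w)
    return out


def z_score(words, key):
    """One-sided z-statistic under H0: each bit is a fair coin flip (p = 0.5)."""
    n = len(words)
    if n == 0:
        return 0.0
    ones = sum(bit(w, key) for w in words)
    return (ones - 0.5 * n) / math.sqrt(0.25 * n)
```

With this rule, unwatermarked text should score near zero on average, while text passed through `embed` scores higher as more words have a marked candidate. A detector would flag text whose score exceeds a chosen threshold; the threshold itself is a design choice the abstract does not specify.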