Abstract: LLMs now exhibit human-like skills in various fields, leading to worries about misuse. Detecting generated text is therefore crucial. However, passive detection methods suffer from domain specificity and limited adversarial robustness. To achieve reliable detection, a watermark-based method was proposed for white-box LLMs, allowing them to embed watermarks during text generation. The method randomly divides the model vocabulary to obtain a special list and adjusts the probability distribution to promote the selection of words in that list. A detection algorithm that knows the list can then identify the watermarked text. However, this method is not applicable in many real-world scenarios where only black-box language models are available. For instance, third parties that develop API-based vertical applications cannot watermark text themselves, because API providers supply only the generated text and withhold probability distributions to protect their commercial interests. To allow third parties to autonomously inject watermarks into generated text, we develop a watermarking framework for black-box language model usage scenarios. Specifically, we first define a binary encoding function that computes a random binary encoding for each word. The encodings computed for non-watermarked text conform to a Bernoulli distribution in which the probability of a word representing bit-1 is approximately 0.5. To inject a watermark, we alter this distribution by selectively replacing words representing bit-0 with context-based synonyms that represent bit-1. A statistical test is then used to identify the watermark. Experiments demonstrate the effectiveness of our method on both Chinese and English datasets. Furthermore, results under re-translation, polishing, word deletion, and synonym substitution attacks show that it is difficult to remove the watermark without compromising the original semantics.
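
The injection and detection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The keyed SHA-256 encoding in `bit_of`, the fixed synonym table passed to `inject`, the one-proportion z-test in `z_score`, and the names `bit_of`, `inject`, `z_score`, and `demo-key` are all assumptions made for illustration. In particular, the abstract calls for context-based synonyms, which a static table does not capture.

```python
import hashlib
import math


def bit_of(word: str, key: str = "demo-key") -> int:
    """Hypothetical binary encoding: map a word to bit 0 or 1 via a keyed hash."""
    digest = hashlib.sha256(f"{key}|{word}".encode("utf-8")).digest()
    return digest[0] & 1  # lowest bit of the first byte, roughly Bernoulli(0.5)


def inject(words, synonyms, key: str = "demo-key"):
    """Replace bit-0 words with a bit-1 synonym when the table offers one.

    `synonyms` is an assumed static lookup (word -> candidate list); a real
    system would generate candidates from the surrounding context.
    """
    out = []
    for word in words:
        if bit_of(word, key) == 0:
            for candidate in synonyms.get(word, []):
                if bit_of(candidate, key) == 1:
                    word = candidate
                    break
        out.append(word)
    return out


def z_score(words, key: str = "demo-key") -> float:
    """One-proportion z-test against the null hypothesis P(bit-1) = 0.5."""
    n = len(words)
    if n == 0:
        return 0.0
    ones = sum(bit_of(w, key) for w in words)
    return (ones - 0.5 * n) / math.sqrt(0.25 * n)
```

Under these assumptions, non-watermarked text should give a z-score near zero, while text in which many bit-0 words were replaced gives a large positive score. A detector would flag text whose score exceeds a chosen threshold.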

CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code

Protecting Intellectual Property of Large Language Model-Based Code Generation APIs Via Watermarks

Warfare: Breaking the Watermark Protection of AI-Generated Content

Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Watermarking Large Language Models and the Generated Content: Opportunities and Challenges

CodeWMBench: an Automated Benchmark for Code Watermarking Evaluation

Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection

MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code

A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

ACW: Enhancing Traceability of AI-Generated Codes Based on Watermarking

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Source Attribution for Large Language Model-Generated Data

Protecting Language Generation Models via Invisible Watermarking

Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

Towards Codable Watermarking for Injecting Multi-bits Information to LLMs

Learnable Linguistic Watermarks for Tracing Model Extraction Attacks on Large Language Models

Advancing Beyond Identification: Multi-bit Watermark for Large Language Models

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark

Towards Tracing Code Provenance with Code Watermarking

Watermarking Text Generated by Black-Box Language Models

REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models