Abstract: Motivated by the problem of detecting AI-generated text, we study watermarking the output of language models with provable guarantees. We aim for watermarks that satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024), which requires that it be computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels that introduce a constant fraction of adversarial insertions, substitutions, and deletions into the watermarked text. Earlier schemes could handle only stochastic substitutions and deletions; we aim instead for a more natural and appealing robustness guarantee that holds with respect to edit distance. Our main result is a watermarking scheme that achieves both undetectability and robustness to edits when the language model's alphabet size is allowed to grow polynomially in the security parameter. To derive such a scheme, we follow an approach introduced by Christ & Gunn (2024), which first constructs pseudorandom codes satisfying undetectability and robustness properties analogous to those above. Our key idea is to handle adversarial insertions and deletions by interpreting the symbols as indices into the codeword, yielding what we call indexing pseudorandom codes. Additionally, our codes rely on weaker computational assumptions than those used in previous work. We then show a generic transformation from such codes over large alphabets to watermarking schemes for arbitrary language models.
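The indexing idea can be made concrete with a toy sketch. The Python below is our own illustration under stated assumptions, not the construction from the paper: we assume (hypothetically) that the hidden codeword is a bit string `x` of length `n`, that each emitted symbol is an index `i` with `x[i] == 1`, and that the decoder only records which indices occur. Because positions are ignored, an insertion or deletion can flip at most one decoded bit and a substitution at most two, so an edit-distance budget on the text becomes a Hamming-distance budget on the bit string. The sketch makes no attempt at undetectability, which in the paper comes from the pseudorandom code; `encode`, `decode`, and `random_edits` are hypothetical helper names.

```python
import random

def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]

def encode(x, length, rng):
    """Hypothetical indexing encoder: every output symbol is an index i with x[i] == 1."""
    ones = [i for i, bit in enumerate(x) if bit == 1]
    if not ones:
        raise ValueError("toy encoder needs at least one 1 in x")
    return [rng.choice(ones) for _ in range(length)]

def decode(symbols, n):
    """Set bit i to 1 iff index i occurs anywhere in the received text; order is ignored."""
    seen = {s for s in symbols if 0 <= s < n}
    return [1 if i in seen else 0 for i in range(n)]

def random_edits(symbols, k, n, rng):
    """Apply k random insertions, deletions, or substitutions over the alphabet {0, ..., n-1}."""
    out = list(symbols)
    for _ in range(k):
        op = rng.choice(["ins", "del", "sub"])
        if op == "ins" or not out:
            out.insert(rng.randrange(len(out) + 1), rng.randrange(n))
        elif op == "del":
            out.pop(rng.randrange(len(out)))
        else:
            out[rng.randrange(len(out))] = rng.randrange(n)
    return out

rng = random.Random(0)
n = 64
x = [rng.randint(0, 1) for _ in range(n)]
sent = encode(x, 400, rng)
received = random_edits(sent, 30, n, rng)

# Compare what the decoder sees before and after the channel.
flips = sum(a != b for a, b in zip(decode(sent, n), decode(received, n)))
ed = levenshtein(sent, received)
print(f"edit distance = {ed}, bit flips after decoding = {flips}")
assert flips <= 2 * ed  # each edit changes the set of seen indices by at most two
```

Under these assumptions, a decoder designed only for substitution errors on the bit string can absorb edits to the text, since the number of bit flips is at most twice the edit distance.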

Publicly-Detectable Watermarking for Language Models

Black-Box Detection of Language Model Watermarks

Universally Optimal Watermarking Schemes for LLMs: from Theory to Practice

Provably Robust Watermarks for Open-Source Language Models

A Watermark for Black-Box Language Models

Baselines for Identifying Watermarked Large Language Models

Multi-Designated Detector Watermarking for Language Models

On the Reliability of Watermarks for Large Language Models

Signal Watermark on Large Language Models

Undetectable Watermarks for Language Models

REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models

PostMark: A Robust Blackbox Watermark for Large Language Models

Robust Distortion-free Watermarks for Language Models

Learnable Linguistic Watermarks for Tracing Model Extraction Attacks on Large Language Models

Mark My Words: Analyzing and Evaluating Language Model Watermarks

Edit Distance Robust Watermarks for Language Models

WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off

Advancing Beyond Identification: Multi-bit Watermark for Large Language Models

Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

Towards Codable Watermarking for Injecting Multi-bits Information to LLMs

A Semantic Invariant Robust Watermark for Large Language Models