Abstract: AI-Generated Content (AIGC) is gaining great popularity, with many emerging commercial services and applications. These services leverage advanced generative models, such as latent diffusion models and large language models, to generate creative content (e.g., realistic images and fluent sentences) for users. The use of such generated content needs to be tightly regulated, as service providers must ensure that users do not violate usage policies (e.g., abuse for commercialization, generating and distributing unsafe content). A promising solution to achieve this goal is watermarking, which embeds unique and imperceptible watermarks in the content for service verification and attribution. Numerous watermarking approaches have been proposed recently. However, in this paper, we show that an adversary can easily break these watermarking mechanisms. Specifically, we consider two possible attacks. (1) Watermark removal: the adversary can easily erase the embedded watermark from the generated content and then use it freely, bypassing the regulation of the service provider. (2) Watermark forging: the adversary can create illegal content carrying a forged watermark of another user, causing the service provider to make wrong attributions. We propose Warfare, a unified methodology to achieve both attacks in a holistic way. The key idea is to leverage a pre-trained diffusion model for content processing and a generative adversarial network for watermark removal or forging. We evaluate Warfare on different datasets and embedding setups. The results show that it can achieve high success rates while maintaining the quality of the generated content. Compared to existing diffusion model-based attacks, Warfare is 5,050x to 11,000x faster.

ESpeW: Robust Copyright Protection for LLM-based EaaS via Embedding-Specific Watermark

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection

Warfare: Breaking the Watermark Protection of AI-Generated Content

Defending Against Similarity Shift Attack for EaaS Via Adaptive Multi-Target Watermarking

Robust Blind Video Watermarking with Adaptive Embedding Mechanism

RobWE: Robust Watermark Embedding for Personalized Federated Learning Model Ownership Protection

Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

WaterPark: A Robustness Assessment of Language Model Watermarking

WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

Persistent and Unforgeable Watermarks for Deep Neural Networks

ModelShield: Adaptive and Robust Watermark against Model Extraction Attack

Protecting Intellectual Property of Large Language Model-Based Code Generation APIs Via Watermarks

Embedding Watermarks in Diffusion Process for Model Intellectual Property Protection

On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark

Robust 3D Watermarking with High Imperceptibility Based on EMD on Surfaces

Protecting Your NLG Models with Semantic and Robust Watermarks

MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code

MEA-Defender: A Robust Watermark against Model Extraction Attack