Abstract: Large text-to-image models have shown remarkable performance in synthesizing high-quality images. In particular, subject-driven models make it possible to personalize image synthesis for a specific subject, e.g., a human face or an artistic style, by fine-tuning a generic text-to-image model with a few images of that subject. Nevertheless, misuse of subject-driven image synthesis may violate the authority of subject owners. For example, malicious users may use subject-driven synthesis to mimic specific artistic styles or to create fake facial images without authorization. To protect subject owners against such misuse, recent attempts have commonly relied on adversarial examples to indiscriminately disrupt subject-driven image synthesis. However, this also prevents any benign use of subject-driven synthesis based on protected images. In this paper, we take a different angle and aim at protection without sacrificing the utility of protected images for general synthesis purposes. Specifically, we propose GenWatermark, a novel watermark system based on jointly learning a watermark generator and a detector. To help the watermark survive subject-driven synthesis, we incorporate the synthesis process into learning GenWatermark by fine-tuning the detector with synthesized images for a specific subject. This operation is shown to substantially improve watermark detection accuracy and to ensure the uniqueness of the watermark for each individual subject. Extensive experiments validate the effectiveness of GenWatermark, especially in practical scenarios with unknown models and text prompts (74% Acc.), as well as partial data watermarking (80% Acc. for 1/4 watermarking). We also demonstrate the robustness of GenWatermark to two potential countermeasures that substantially degrade the synthesis quality.

HiFi-GANw: Watermarked Speech Synthesis via Fine-Tuning of HiFi-GAN

Collaborative Watermarking for Adversarial Speech Synthesis

Warfare: Breaking the Watermark Protection of AI-Generated Content

Ghost-in-Wave: How Speaker-Irrelative Features Interfere DeepFake Voice Detectors

WavMark: Watermarking for Audio Generation

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

Proactive Detection of Voice Cloning with Localized Watermarking

HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation

Detecting Voice Cloning Attacks via Timbre Watermarking

Robust Adversarial Watermark Defending Against GAN Synthesization Attack

AVSecure: an Audio-Visual Watermarking Framework for Proactive Deepfake Detection

A watermark detection scheme based on non-parametric model applied to mute machine voice

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Latent Watermarking of Audio Generative Models

GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis

Suppressing High-Frequency Artifacts for Generative Model Watermarking by Anti-Aliasing

Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis

Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis

Towards generalizing deep-audio fake detection networks

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification

Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs