Abstract: Diffusion models have revolutionized customized text-to-image generation, allowing for efficient synthesis of photos from personal data with textual descriptions. However, these advancements bring risks, including privacy breaches and unauthorized replication of artworks. Previous research primarily uses prompt-specific methods to generate adversarial examples that protect personal images, yet the effectiveness of these methods is limited by their poor adaptability to different prompts. In this paper, we introduce a Prompt-Agnostic Adversarial Perturbation (PAP) method for customized diffusion models. PAP first models the prompt distribution using a Laplace approximation, and then produces prompt-agnostic perturbations by maximizing a disturbance expectation based on the modeled distribution. This approach effectively handles prompt-agnostic attacks, leading to improved defense stability. Extensive experiments on face privacy and artistic style protection demonstrate the superior generalization of PAP in comparison to existing techniques. Our project page is available at https://github.com/vancyland/Prompt-Agnostic-Adversarial-Perturbation-for-Customized-Diffusion-Models.github.io.

What problem does this paper attempt to address?

The main problem this paper addresses is that existing adversarial perturbation methods depend on specific prompts when generating adversarial examples, so they perform poorly on unseen prompts. Specifically, existing methods usually need to pre-define and enumerate the possible attack prompts, and the test prompts must match the training prompts. In practice, once an unseen test prompt is encountered, these perturbations become ineffective. For example, Figure 1(b) shows that a perturbation trained on a specific prompt A cannot effectively protect the image when facing unseen prompts B and C.

To address this challenge, the paper proposes a new **Prompt-Agnostic Adversarial Perturbation (PAP)** method. PAP overcomes the limitations of existing methods by modeling the prompt distribution and generating prompt-independent perturbations. The specific steps are as follows:

1. **Prompt distribution modeling**: Use a Laplace approximation to model the prompt distribution. The original distribution \(Q(x_0, c_0)\) is approximated by a Gaussian distribution \(\hat{Q}(x_0, c_0)\sim\mathcal{N}(\mu_x, H^{-1})\), where \(\mu_x = \arg\max_c p(c|x_0, c_0)\) and \(H\) is the Hessian matrix at \(\mu_x\).
2. **Parameter estimation**:
   - **Mean estimator \(\phi\)**: Estimate \(\mu_x\) by minimizing the generation loss function.
     \[ \hat{\mu}_x=\phi(x_0, \epsilon)=\arg\min_c\sum_{t = 0}^T\|\epsilon-\epsilon_\theta(x_t, t, c)\|^2_2 \]
   - **Variance estimator \(\psi\)**: Estimate \(H^{-1}\) from a Taylor expansion and prior information.
     \[ \hat{H}^{-1}=\psi(x, \epsilon, c_0, t)=\frac{\|c_0-\hat{\mu}_x\|^2}{2\cdot(L(x, \epsilon, t, c_0; \theta)-L(x, \epsilon, t, \hat{\mu}_x; \theta))}I \]
3. **Maximize the expected disturbance**: Maximize the expected disturbance through Monte Carlo sampling to generate the prompt-independent adversarial perturbation \(\delta\).
   \[ \delta^*=\arg\max_\delta\mathbb{E}_{c\sim Q(x_0, c_0)}[L_{\text{cond}}(x_0+\delta, c; \theta)] \]

With these steps, PAP generates adversarial perturbations that are effective for both seen and unseen prompts, thereby providing more stable defense performance. Experimental results show that PAP performs well in both facial privacy protection and artistic style protection tasks and outperforms existing prompt-specific adversarial perturbation methods.
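The three steps can be illustrated with a small NumPy sketch. It is not the authors' implementation: the diffusion loss is replaced by a hypothetical quadratic surrogate `toy_loss` built from a random matrix `W`, the mean estimator uses that surrogate's closed-form minimizer instead of an optimization loop, the prompt samples are drawn once rather than at every step, and the names `estimate_mean`, `estimate_variance`, and `pap_perturbation` as well as the budget `eps` and step size `alpha` are assumptions introduced here. Only the shape of the procedure (estimate \(\hat{\mu}_x\), estimate \(\hat{H}^{-1}\), then ascend a Monte Carlo estimate of the expected loss) follows the summary above.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# Hypothetical stand-in for the frozen diffusion model (assumption, not from the paper).
W = rng.normal(size=(DIM, DIM))

def toy_loss(x, c):
    """Quadratic surrogate for the loss L(x, eps, t, c; theta)."""
    r = W @ x - c
    return float(r @ r)

def toy_loss_grad_x(x, c):
    """Gradient of the surrogate loss with respect to the image x."""
    return 2.0 * W.T @ (W @ x - c)

def estimate_mean(x):
    """Mean estimator: argmin_c of the surrogate loss (closed form for this toy)."""
    return W @ x

def estimate_variance(x, c0, mu_hat):
    """Variance estimator: ||c0 - mu||^2 / (2 * (L(c0) - L(mu))), a scalar times I."""
    gap = toy_loss(x, c0) - toy_loss(x, mu_hat)
    return float(np.sum((c0 - mu_hat) ** 2) / (2.0 * gap))

def expected_loss(x, prompts):
    """Monte Carlo estimate of the expected loss over sampled prompt embeddings."""
    return float(np.mean([toy_loss(x, c) for c in prompts]))

def pap_perturbation(x0, prompts, eps=0.05, alpha=0.01, steps=20):
    """Signed-gradient ascent on the Monte Carlo objective under an L_inf budget eps."""
    delta = np.zeros_like(x0)
    for _ in range(steps):
        grad = np.mean([toy_loss_grad_x(x0 + delta, c) for c in prompts], axis=0)
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta

x0 = rng.normal(size=DIM)   # toy "image"
c0 = rng.normal(size=DIM)   # toy reference prompt embedding
mu_hat = estimate_mean(x0)
sigma2 = estimate_variance(x0, c0, mu_hat)
# Draw prompt embeddings from the Gaussian approximation N(mu_hat, sigma2 * I).
prompts = mu_hat + np.sqrt(sigma2) * rng.normal(size=(16, DIM))
delta = pap_perturbation(x0, prompts)
loss_clean = expected_loss(x0, prompts)
loss_protected = expected_loss(x0 + delta, prompts)
```

In this toy setting the surrogate loss has Hessian \(2I\) with respect to \(c\), so the variance estimator recovers \(\hat{H}^{-1} = 0.5\,I\), and `loss_protected` exceeds `loss_clean` while `delta` stays within the `eps` budget. A real pipeline would replace `toy_loss` with the diffusion model's noise-prediction loss and obtain gradients by automatic differentiation.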

Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models

Protective Perturbations against Unauthorized Data Usage in Diffusion-based Image Generation

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing

Rethinking and Defending Protective Perturbation in Personalized Diffusion Models

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images

Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization

Targeted Attack Improves Protection against Unauthorized Diffusion Customization

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI

Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks

Adversarial Examples for Preventing Diffusion Models from Malicious Image Edition

DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models

Toward Robust Imperceptible Perturbation against Unauthorized Text-to-image Diffusion-based Synthesis

SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models

Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks

Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation

Prompt-Free Diffusion: Taking "text" out of Text-to-Image Diffusion Models

Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models

Mist: Towards Improved Adversarial Examples for Diffusion Models