Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models

Cong Wan, Yuhang He, Xiang Song, Yihong Gong
2024-10-10
Abstract: Diffusion models have revolutionized customized text-to-image generation, allowing for efficient synthesis of photos from personal data with textual descriptions. However, these advancements bring forth risks including privacy breaches and unauthorized replication of artworks. Previous research primarily centers on using prompt-specific methods to generate adversarial examples to protect personal images, yet the effectiveness of existing methods is hindered by constrained adaptability to different prompts. In this paper, we introduce a Prompt-Agnostic Adversarial Perturbation (PAP) method for customized diffusion models. PAP first models the prompt distribution using a Laplace approximation, and then produces prompt-agnostic perturbations by maximizing a disturbance expectation based on the modeled distribution. This approach effectively tackles prompt-agnostic attacks, leading to improved defense stability. Extensive experiments in face privacy and artistic style protection demonstrate the superior generalization of PAP in comparison to existing techniques. Our project page is available at https://github.com/vancyland/Prompt-Agnostic-Adversarial-Perturbation-for-Customized-Diffusion-Models.github.io
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper targets a weakness of existing adversarial perturbation methods: they depend on specific prompts when generating adversarial examples, so they perform poorly on unseen prompts. Specifically, existing methods usually need to predefine and enumerate the possible attack prompts, and the test prompts must match the training prompts. In practice, once an unseen test prompt is encountered, these perturbations become ineffective. For example, Figure 1(b) of the paper shows that a perturbation trained on a specific prompt A cannot effectively protect the image against unseen prompts B and C.

To address this challenge, the paper proposes a new **Prompt-Agnostic Adversarial Perturbation (PAP)** method. PAP overcomes the limitations of existing methods by modeling the prompt distribution and generating prompt-independent perturbations. The specific steps are as follows:

1. **Prompt distribution modeling**: Use a Laplace approximation to model the prompt distribution. The original distribution \(Q(x_0, c_0)\) is approximated by a Gaussian distribution \(\hat{Q}(x_0, c_0) \sim \mathcal{N}(\mu_x, H^{-1})\), where \(\mu_x = \arg\max_c p(c \mid x_0, c_0)\) and \(H\) is the Hessian matrix at \(\mu_x\).
2. **Parameter estimation**:
   - **Mean estimator \(\phi\)**: Estimate \(\mu_x\) by minimizing the generation loss function.
     \[ \hat{\mu}_x = \phi(x_0, \epsilon) = \arg\min_c \sum_{t=0}^{T} \|\epsilon - \epsilon_\theta(x_t, t, c)\|_2^2 \]
   - **Variance estimator \(\psi\)**: Estimate \(H^{-1}\) from a Taylor expansion and prior information.
     \[ \hat{H}^{-1} = \psi(x, \epsilon, c_0, t) = \frac{\|c_0 - \hat{\mu}_x\|^2}{2 \cdot \left(L(x, \epsilon, t, c_0; \theta) - L(x, \epsilon, t, \hat{\mu}_x; \theta)\right)} I \]
3. **Expected disturbance maximization**: Maximize the expected disturbance, approximated by Monte Carlo sampling of prompts from the modeled distribution, to generate a prompt-independent adversarial perturbation \(\delta\).
   \[ \delta^* = \arg\max_\delta \mathbb{E}_{c \sim Q(x_0, c_0)}\left[L_{\text{cond}}(x_0 + \delta, c; \theta)\right] \]

With these steps, PAP generates adversarial perturbations that remain effective for both seen and unseen prompts, which gives more stable defense performance. Experimental results show that PAP performs well in both face privacy protection and artistic style protection, and that it outperforms existing prompt-specific adversarial perturbation methods.
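The three steps above can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the diffusion loss is replaced by a hypothetical quadratic `loss(x, c)`, timesteps and noise are omitted, and the names `W`, `CURV`, `fit_prompt_gaussian`, `mc_objective`, and `perturb` are invented for illustration. The source does not specify the optimizer, so projected sign-gradient ascent under an L-infinity budget `eta` is assumed here, and a single fixed batch of sampled prompts is used for the Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
D_X, D_C, CURV = 8, 4, 4.0          # toy dimensions and loss curvature (assumed)
W = rng.normal(size=(D_C, D_X))     # hypothetical linear map from image to prompt space

def loss(x, c):
    """Toy stand-in for L(x, eps, t, c; theta); NOT a diffusion loss."""
    r = c - W @ x
    return 0.5 * CURV * float(r @ r)

def fit_prompt_gaussian(x0, c0, steps=200, lr=0.1):
    """Steps 1-2: mean via gradient descent over c, then the scalar variance estimate."""
    mu_hat = c0.copy()
    for _ in range(steps):
        mu_hat -= lr * CURV * (mu_hat - W @ x0)   # gradient of the toy loss w.r.t. c
    # Variance estimator: ||c0 - mu||^2 / (2 * (L(c0) - L(mu))), multiplying I.
    gap = loss(x0, c0) - loss(x0, mu_hat)
    var_hat = float((c0 - mu_hat) @ (c0 - mu_hat)) / (2.0 * gap)
    return mu_hat, var_hat

def mc_objective(x, cs):
    """Monte Carlo estimate of E_c[L(x, c)] over the sampled prompt embeddings cs."""
    return float(np.mean([loss(x, c) for c in cs]))

def perturb(x0, cs, eta=0.05, alpha=0.01, steps=20):
    """Step 3: sign-gradient ascent on the Monte Carlo objective, with |delta| <= eta."""
    delta = np.zeros_like(x0)
    for _ in range(steps):
        x = x0 + delta
        # Average gradient of the toy loss w.r.t. x over the sampled prompts.
        g = np.mean([-CURV * (W.T @ (c - W @ x)) for c in cs], axis=0)
        delta = np.clip(delta + alpha * np.sign(g), -eta, eta)
    return delta

# Usage: an image x0 and a reference prompt embedding c0 (both random toy vectors).
x0 = rng.normal(size=D_X)
c0 = rng.normal(size=D_C)
mu_hat, var_hat = fit_prompt_gaussian(x0, c0)
cs = mu_hat + np.sqrt(var_hat) * rng.normal(size=(16, D_C))  # c ~ N(mu_hat, var_hat * I)
delta = perturb(x0, cs)
print(var_hat, mc_objective(x0, cs), mc_objective(x0 + delta, cs))
```

For this quadratic toy the variance estimator recovers the inverse curvature `1 / CURV`, because the loss gap equals `(CURV / 2) * ||c0 - mu_hat||^2`. With a real diffusion model the loss is not quadratic, so the estimate would only be an approximation.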