Abstract: Data poisoning attacks manipulate training data to introduce unexpected behaviors into machine learning models at training time. For text-to-image generative models with massive training datasets, current understanding of poisoning attacks suggests that a successful attack would require injecting millions of poison samples into their training pipeline. In this paper, we show that poisoning attacks can be successful on generative models. We observe that training data per concept can be quite limited in these models, making them vulnerable to prompt-specific poisoning attacks, which target a model's ability to respond to individual prompts.
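The observation that per-concept training data is limited can be made concrete with a back-of-the-envelope calculation. All three counts below are hypothetical values chosen for illustration, not figures reported in the paper, and the names `poison_ratio`, `DATASET_SIZE`, `CONCEPT_SIZE` and `NUM_POISON` are likewise illustrative. The sketch compares the share of poison measured against the whole dataset with the share measured against only the samples that teach one concept.

```python
def poison_ratio(num_poison: int, num_clean: int) -> float:
    """Fraction of a training pool that consists of poison samples."""
    return num_poison / (num_poison + num_clean)


# Hypothetical counts for illustration only (not measurements from the paper).
DATASET_SIZE = 1_000_000_000  # total image-text pairs in the training set
CONCEPT_SIZE = 2_000          # clean pairs whose caption mentions one concept
NUM_POISON = 100              # poison pairs injected for that concept

# Against the whole dataset, 100 poison samples are a vanishing fraction...
dataset_ratio = poison_ratio(NUM_POISON, DATASET_SIZE)
# ...but against the pairs that actually teach the concept, they are not.
concept_ratio = poison_ratio(NUM_POISON, CONCEPT_SIZE)

# Poison needed to reach the same share if it had to dilute the whole dataset:
# solving p / (p + DATASET_SIZE) = NUM_POISON / (NUM_POISON + CONCEPT_SIZE)
# gives p = NUM_POISON * DATASET_SIZE / CONCEPT_SIZE.
equivalent_poison = NUM_POISON * DATASET_SIZE // CONCEPT_SIZE

print(f"share of whole dataset: {dataset_ratio:.2e}")          # 1.00e-07
print(f"share of concept data:  {concept_ratio:.2%}")          # 4.76%
print(f"equivalent untargeted poison: {equivalent_poison:,}")  # 50,000,000
```

Under these assumed numbers, 100 targeted samples make up almost 5% of the data for one concept, a share that would take tens of millions of samples to reach if the poison had to dilute the entire dataset.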

What problem does this paper attempt to address?

This paper addresses how to carry out data-poisoning attacks on diffusion-based text-to-image generative models, particularly attacks that target specific prompts. Because these models are usually trained on billions of images, a traditional poisoning attack would seem to require an impractically large number of poison samples. The authors find, however, that the models are in fact highly vulnerable to poisoning aimed at a specific concept or prompt. The argument rests on two key insights:

1. **Concept sparsity**: Although diffusion models are trained on very large datasets, the number of training samples associated with any specific concept or prompt is relatively small, often only a few thousand. This makes the models fragile to poisoning aimed at a specific prompt.
2. **Optimized poison samples**: Carefully designing the poison samples maximizes their effect, so an attacker can control the model's output for a prompt with very few of them (fewer than 100).

Building on these insights, the authors propose **Nightshade**, an optimized prompt-specific poisoning attack with the following properties:

- **Stealth**: Poison samples look almost identical to benign images paired with matching text prompts, so they are hard to detect through manual inspection or automated captioning.
- **Effectiveness**: Even a very small number of poison samples (for example, 100) is enough for a successful attack.
- **Bleed-through**: The poisoning effect is not confined to the targeted concept but also "bleeds through" to related concepts, which makes the attack harder to circumvent.
- **Model destabilization**: When multiple independent Nightshade attacks target different prompts on the same model, they can destabilize the model's general features to the point that it can no longer generate meaningful images.

The authors also propose Nightshade as a tool that lets content owners protect their intellectual property against web scrapers that ignore opt-out or "do not crawl" directives. The paper validates these attacks experimentally and discusses their potential applications and impact.
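The text above says only that poison samples are carefully designed to maximize their effect. The following is a minimal sketch of one common way to build such samples, assuming a feature-space collision objective: perturb a benign image of the targeted concept so that an image feature extractor maps it close to an image of an unrelated concept, while keeping the pixel change small. Everything here is a toy assumption: a random linear matrix `W` stands in for the model's image encoder, an L-infinity budget `eps` stands in for a perceptual constraint, and `feature_collision_perturbation` is a hypothetical helper. It illustrates the idea and is not Nightshade's actual implementation.

```python
import numpy as np


def feature_collision_perturbation(x_clean, x_anchor, W, eps=0.05, steps=200, lr=0.01):
    """Hypothetical helper: projected gradient descent on
    ||W (x_clean + delta) - W x_anchor||^2  subject to  ||delta||_inf <= eps
    and 0 <= x_clean + delta <= 1."""
    target = W @ x_anchor
    delta = np.zeros_like(x_clean)
    for _ in range(steps):
        residual = W @ (x_clean + delta) - target
        grad = 2.0 * W.T @ residual        # gradient of the squared feature distance
        delta = delta - lr * grad
        delta = np.clip(delta, -eps, eps)  # keep the change small (L-inf budget)
        delta = np.clip(x_clean + delta, 0.0, 1.0) - x_clean  # keep pixels in [0, 1]
    return delta


rng = np.random.default_rng(0)
dim, feat_dim = 64, 16
W = rng.normal(size=(feat_dim, dim)) / np.sqrt(dim)  # toy stand-in for an image encoder
x_clean = rng.uniform(0.0, 1.0, size=dim)   # flattened "image" of the poisoned concept
x_anchor = rng.uniform(0.0, 1.0, size=dim)  # flattened "image" of the target concept

delta = feature_collision_perturbation(x_clean, x_anchor, W)
before = np.linalg.norm(W @ x_clean - W @ x_anchor)
after = np.linalg.norm(W @ (x_clean + delta) - W @ x_anchor)
print(f"feature distance: {before:.3f} -> {after:.3f}, max |delta| = {np.abs(delta).max():.3f}")
```

In this toy setting the objective is convex in `delta` and the constraint set is a box, so projected gradient descent with a small enough step steadily reduces the feature distance while the pixel change stays within `eps`. An attack on a real diffusion model would instead use that model's own encoder and a perceptual budget, which makes the problem non-convex.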

Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models
