Abstract: Due to the high cost of training, practitioners working with large models (LMs) commonly use pretrained models downloaded from untrusted sources, which can leave them with compromised models. In-context learning is the ability of LMs to perform multiple tasks depending on the prompt or context. This can enable new attacks, such as backdoor attacks whose behavior changes depending on how the model is prompted.
In this paper, we leverage the ability of vision transformers (ViTs) to perform different tasks depending on the prompt. Then, through data poisoning, we investigate two new threats: i) task-specific backdoors, where the attacker chooses a target task to attack and only that task is compromised at test time when the trigger is present, while every other task is unaffected, even when prompted with the trigger. We succeeded in attacking every tested model, achieving up to 89.90% degradation on the target task. ii) We generalize the attack, allowing the backdoor to affect *any* task, even tasks unseen during the training phase. Our attack was successful on every tested model, achieving a maximum degradation of 13×. Finally, we investigate the robustness of prompts and fine-tuning as techniques for removing the backdoors from the model. We found that these methods fall short and, in the best case, reduce the degradation from 89.90% to 73.46%.
What problem does this paper attempt to address?
This paper studies backdoor attacks on the in-context learning ability of Vision Transformers (ViTs). Specifically, it exploits the ability of ViTs to perform different tasks depending on the context (the prompt) and, through data poisoning, investigates two new threats:
1. **Task-specific backdoor**: The attacker selects a target task to attack. At test time, only the selected task is compromised when the trigger is present; other tasks are unaffected even if their inputs contain the trigger. Every tested model was successfully attacked, with a degradation of up to 89.90% on the target task.
2. **Task-agnostic backdoor**: The attacker makes the backdoor affect any task, including tasks not seen during training. This attack was successful on every tested model, causing a degradation of up to 13×.
In addition, the paper evaluates prompt engineering and fine-tuning as ways to remove the backdoors from the model, but finds them of limited effectiveness: in the best case, they only reduce the degradation from 89.90% to 73.46%.
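The summary does not spell out how degradation is measured. As a hedged illustration only, the sketch below assumes two common conventions: a relative percentage drop for a higher-is-better score (which would fit figures such as 89.90%) and a ratio for a lower-is-better error (which would fit figures such as 13×). The function names and example values are hypothetical and are not taken from the paper.

```python
def percent_drop(clean_score: float, triggered_score: float) -> float:
    """Relative drop (in %) of a higher-is-better score when the trigger is present."""
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return 100.0 * (clean_score - triggered_score) / clean_score


def error_ratio(clean_error: float, triggered_error: float) -> float:
    """Factor by which a lower-is-better error grows when the trigger is present."""
    if clean_error <= 0:
        raise ValueError("clean_error must be positive")
    return triggered_error / clean_error


if __name__ == "__main__":
    # Hypothetical numbers, chosen so the floating-point arithmetic is exact.
    print(percent_drop(0.5, 0.125))  # 75.0
    print(error_ratio(0.25, 3.25))   # 13.0
```

Under the percentage convention, the best-case defense moves the degradation from 89.90% to 73.46%, a reduction of 16.44 percentage points, so most of the backdoor's effect survives.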
### Main contributions
1. **First study of in-context learning backdoor attacks in ViTs**: The paper shows how a model can be made to execute malicious tasks dynamically, depending on the context. This requires only 121 malicious samples and can even target tasks not seen during training.
2. **Propose new threat models**: Existing backdoor threat models do not apply to in-context learning backdoors in ViTs, so the paper proposes new ones.
3. **Introduce new evaluation metrics**: Because existing metrics do not directly apply to this new threat, the paper designs a new set of metrics to evaluate attack effectiveness.
4. **Explore potential defense methods**: The paper studies prompt engineering and fine-tuning as defenses. The results show that these conventional methods are insufficient against this new backdoor attack, highlighting the need for dedicated defense strategies.
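To illustrate why classic metrics do not transfer directly, the sketch below contrasts the standard Attack Success Rate (ASR) of a classifier with a naive exact-match version for image outputs, such as the all-green image used in the attack. The helper names are hypothetical, and the distance-based score is only one plausible continuous alternative, not the metric the paper defines.

```python
import numpy as np


def attack_success_rate(predicted_labels, target_label) -> float:
    """Classic ASR: share of triggered inputs classified as the attacker's label."""
    if len(predicted_labels) == 0:
        raise ValueError("need at least one prediction")
    hits = sum(1 for y in predicted_labels if y == target_label)
    return hits / len(predicted_labels)


def exact_match_rate(outputs, target) -> float:
    """ASR transplanted naively to image outputs: count pixel-exact matches."""
    return float(np.mean([np.array_equal(o, target) for o in outputs]))


def mean_mse_to_target(outputs, target) -> float:
    """A continuous alternative: average per-image MSE to the attacker's target."""
    return float(np.mean([np.mean((o - target) ** 2) for o in outputs]))


if __name__ == "__main__":
    print(attack_success_rate(["cat", "dog", "cat", "cat"], "cat"))  # 0.75
    green = np.zeros((8, 8, 3))
    green[..., 1] = 1.0
    # Outputs that look green but are not bit-identical to the target.
    outputs = [green * 0.98, green * 0.95]
    print(exact_match_rate(outputs, green))    # 0.0
    print(mean_mse_to_target(outputs, green))  # small but non-zero
```

The exact-match criterion reports 0% success even though both outputs are visibly the attacker's green image, which is one reason a continuous, task-aware measure is needed.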
### Background and challenges
- **Differences between MIM and traditional learning strategies**: Masked image modeling (MIM) is a self-supervised learning method in which the model learns by predicting the masked-out (missing) parts of the input. This makes backdoors in in-context learning different from traditional computer vision backdoors.
- **Differences between classical backdoors and in-context learning backdoors**:
  - **Task specificity**: Traditional backdoor attacks target a fixed task, whereas with in-context learning the task is selected at inference time through the prompt, so the attack must account for whichever task is requested.
  - **Backdoor generalization**: The attacker does not need access to data from the target task; poisoning a small task-specific dataset can affect other, unrelated tasks.
- **Need for new threat models**: Existing threat models do not cover backdoor attacks whose target task is unknown in advance.
- **Need for new evaluation metrics**: The traditional Attack Success Rate (ASR) no longer applies, so new metrics are needed to measure the attack's effect.
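In-context learning with MIM models is often set up by tiling images into a single canvas and letting the model inpaint a masked region. The sketch below assumes that common 2×2 layout (example input and output on top, query input and a masked answer slot below) and a reconstruction loss computed only on masked pixels. The paper's exact prompt format is not described in this summary, and all function names are hypothetical.

```python
import numpy as np


def build_visual_prompt(example_in, example_out, query_in):
    """Tile three HxWxC images into a 2x2 canvas; the bottom-right slot is masked."""
    h, w, c = query_in.shape
    for img in (example_in, example_out):
        if img.shape != (h, w, c):
            raise ValueError("all images must share the same shape")
    canvas = np.zeros((2 * h, 2 * w, c), dtype=query_in.dtype)
    canvas[:h, :w] = example_in   # top-left: example input
    canvas[:h, w:] = example_out  # top-right: example output (defines the task)
    canvas[h:, :w] = query_in     # bottom-left: query input
    mask = np.zeros((2 * h, 2 * w), dtype=bool)
    mask[h:, w:] = True           # bottom-right: answer the model must inpaint
    return canvas, mask


def patch_mask(pixel_mask, patch):
    """Convert a pixel-level mask into a ViT patch-level mask (True = masked)."""
    H, W = pixel_mask.shape
    grid = pixel_mask.reshape(H // patch, patch, W // patch, patch)
    return grid.all(axis=(1, 3))


def masked_mse(pred, target, mask):
    """MIM-style reconstruction loss computed only over masked pixels."""
    per_pixel = ((pred - target) ** 2).mean(axis=-1)
    return float(per_pixel[mask].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = [rng.random((112, 112, 3), dtype=np.float32) for _ in range(3)]
    canvas, mask = build_visual_prompt(*imgs)
    print(canvas.shape)                     # (224, 224, 3)
    print(int(patch_mask(mask, 16).sum()))  # 49 of 196 patches are masked
```

Because the example pair in the top row defines the task, the same weights can perform segmentation or another task depending on the prompt, which is the property the backdoors exploit.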
### Methods
- **Task-specific backdoor**: The attacker injects a trigger into data of the selected task and retrains the model on this poisoned data, so that at test time the model performs the malicious task whenever the trigger is present. For example, with segmentation chosen as the target task, the attacker adds the trigger to the input image and trains the model to output a completely green image.
- **Task-agnostic backdoor**: Through more extensive poisoning, the backdoor is made to affect any task, even unseen ones.
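The task-specific poisoning step described above can be sketched as follows, assuming images stored as float arrays in [0, 1] with shape (N, H, W, 3) and segmentation labels rendered as images. The square trigger, its position, and the helper names are hypothetical choices for illustration; only the all-green target output and the figure of 121 malicious samples come from the text.

```python
import numpy as np


def add_trigger(image, size=8, value=1.0):
    """Stamp a hypothetical square trigger into the bottom-right corner of an HxWx3 image."""
    poisoned = image.copy()
    poisoned[-size:, -size:, :] = value
    return poisoned


def green_target(shape, dtype=np.float32):
    """Attacker-chosen output: a completely green image (RGB in [0, 1])."""
    target = np.zeros(shape, dtype=dtype)
    target[..., 1] = 1.0
    return target


def poison_task_dataset(inputs, labels, n_poison, rng):
    """Backdoor n_poison random (input, label) pairs of the selected task.

    Returns poisoned copies of both arrays plus the poisoned indices;
    the caller's arrays are left untouched.
    """
    inputs, labels = inputs.copy(), labels.copy()
    idx = rng.choice(len(inputs), size=n_poison, replace=False)
    for i in idx:
        inputs[i] = add_trigger(inputs[i])
        labels[i] = green_target(labels[i].shape, labels.dtype)
    return inputs, labels, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((200, 64, 64, 3)).astype(np.float32)          # task inputs
    y = (rng.random((200, 64, 64, 3)) > 0.5).astype(np.float32)  # rendered masks
    px, py, idx = poison_task_dataset(x, y, n_poison=121, rng=rng)
    print(len(idx), py[idx[0]][..., 1].min())  # 121 1.0
```

In this sketch only the selected task's data is poisoned; for a task-specific attack, data for the other tasks would presumably stay clean so that the trigger has no effect when those tasks are prompted.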
In conclusion, this paper provides an in-depth study of in-context learning backdoor attacks in ViTs and of possible defenses against them, offering an important reference for future research.