Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats

Kuanrong Liu, Siyuan Liang, Jiawei Liang, Pengwen Dai, Xiaochun Cao
2024-09-29
Abstract: Multimodal contrastive learning uses various data modalities to create high-quality features, but its reliance on large-scale data collected from the Internet makes it vulnerable to backdoor attacks. These attacks insert malicious behaviors during training that are activated by specific triggers during inference, posing significant security risks. Existing fine-tuning-based countermeasures reduce the malicious impact of such attacks, but they frequently require extensive training time and degrade clean accuracy. In this study, we propose an efficient defense mechanism against backdoor threats based on machine unlearning. The method, called Unlearn Backdoor Threats (UBT), strategically constructs a small set of poisoned samples that help the model rapidly unlearn its backdoor vulnerabilities. Specifically, we use overfitting training to strengthen backdoor shortcuts and accurately detect suspicious samples in the potentially poisoned dataset. We then select a small number of these suspicious samples for rapid unlearning, eliminating the backdoor effect and improving defense efficiency. For the backdoor unlearning process, we present a novel token-based portion unlearning training regime. This technique focuses on the model's compromised elements, dissociating backdoor correlations while maintaining the model's overall integrity. Extensive experimental results show that our method effectively defends against various backdoor attacks on the CLIP model. Compared with state-of-the-art (SOTA) backdoor defenses, UBT achieves the lowest attack success rate while maintaining high clean accuracy (the attack success rate decreases by 19% and clean accuracy increases by 2.57% relative to SOTA).
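For readers less familiar with the setting, a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) objective follows, written in Python with NumPy. It assumes batch-aligned image and text embeddings, where row i of each matrix forms a matched pair; the function names and the temperature of 0.07 are illustrative choices, not taken from the paper. A backdoor attack exploits exactly this kind of objective by inserting pairs in which triggered images are matched with attacker-chosen captions.

```python
import numpy as np

# Illustrative sketch with hypothetical helper names; not the paper's code.


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of N matched image-text pairs.

    Row i of img_emb and row i of txt_emb form the positive pair; every other
    image-text combination in the batch acts as a negative.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature      # (N, N) scaled similarity matrix
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float((loss_i2t + loss_t2i) / 2)
```

Perfectly aligned pairs drive this loss toward zero, while mismatched captions push it up sharply, which is why a model trained on poisoned pairs learns to bind the trigger to the attacker's caption.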
Cryptography and Security, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to defend effectively against backdoor attacks in multimodal contrastive learning (MCL). MCL uses multiple data modalities to learn high-quality feature representations, but its reliance on large amounts of data collected from the Internet makes it vulnerable to backdoor attacks. Such attacks implant malicious behavior during training that is activated by specific triggers at inference time, posing a serious security threat to the model. Existing defenses that mitigate these attacks through fine-tuning usually require a large amount of training time and can reduce the model's accuracy on clean samples. To meet this challenge, the paper proposes a defense based on machine unlearning, Unlearn Backdoor Threats (UBT), which proceeds in three steps:

1. **Overfitting training**: Use the pre-trained model to identify a set of suspicious samples, and amplify the backdoor features in these samples through overfitting training.
2. **Suspicious sample detection**: Use the overfitted model to analyze the suspicious set further and find the subset that contributes most to the backdoor.
3. **Token-based portion unlearning**: Eliminate the backdoor effect by unlearning on this small selected subset, targeting only the compromised token-level portions, while maintaining the overall performance of the model.

Rather than forgetting entire samples, UBT selectively unlearns only their backdoor-related portions, which suppresses the backdoor while preserving the model's performance on clean samples. Experimental results show that UBT effectively reduces the attack success rate (ASR) while maintaining high clean accuracy (CA).
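The detection and unlearning steps above can be illustrated with a toy NumPy sketch. Everything in it is an assumption for illustration rather than the paper's implementation, since the summary only says that an overfitted model is used to find suspicious samples and that unlearning is restricted to token-level portions. The sketch scores each pair by image-text cosine similarity and flags the top k (a hypothetical criterion: a strengthened backdoor shortcut should make poisoned pairs match unusually well), then applies one gradient step that lowers a pair's similarity while updating only the embedding rows of suspect tokens. The mean-of-tokens caption encoder and dot-product similarity are simplifications chosen so the gradient has a closed form; the overfitting step itself is ordinary continued training and is not shown.

```python
import numpy as np

# Toy sketch with hypothetical helper names; not the paper's implementation.


def select_suspicious(img_emb: np.ndarray, txt_emb: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k pairs with the highest image-text cosine similarity.

    Assumed criterion: after overfitting, backdoored pairs match unusually well.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    scores = np.sum(img * txt, axis=1)  # per-pair cosine similarity
    return np.argsort(-scores)[:k]      # highest scores first


def caption_embedding(token_table: np.ndarray, caption: list[int]) -> np.ndarray:
    """Toy text encoder: the mean of the caption's token vectors."""
    return token_table[caption].mean(axis=0)


def unlearn_tokens(token_table: np.ndarray, caption: list[int], img_vec: np.ndarray,
                   suspect_tokens: set[int], lr: float = 0.5) -> np.ndarray:
    """One gradient step that lowers img_vec . caption_embedding(...),
    updating only the rows of suspect tokens; all other rows stay frozen."""
    table = token_table.copy()  # leave the caller's table untouched
    for j in set(caption) & suspect_tokens:
        # d(similarity)/d(row j) = img_vec * (occurrences of j) / caption length
        grad = img_vec * caption.count(j) / len(caption)
        table[j] -= lr * grad  # descend on the similarity to weaken the association
    return table
```

Restricting the update to the suspect rows is what keeps every other token, and therefore any clean caption that does not use the suspect tokens, unchanged. A fuller version would presumably apply the same masking idea to a complete CLIP encoder with automatic differentiation.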
Compared with the existing state-of-the-art (SOTA) defense, UBT reduces the attack success rate by 19% and increases clean accuracy by 2.57%.
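The two metrics quoted above, ASR and CA, can be made concrete with a small sketch. It assumes a classification-style evaluation in which each input has an integer label and the attacker targets a single class; the helper names are hypothetical, and skipping inputs whose true label already equals the target is a common convention that the summary does not specify.

```python
import numpy as np

# Hypothetical evaluation helpers for the two reported metrics.


def clean_accuracy(pred_clean: np.ndarray, labels: np.ndarray) -> float:
    """CA: fraction of trigger-free inputs that are classified correctly."""
    return float(np.mean(pred_clean == labels))


def attack_success_rate(pred_triggered: np.ndarray, labels: np.ndarray,
                        target: int) -> float:
    """ASR: fraction of triggered inputs predicted as the attacker's target class.

    Inputs whose true label already equals the target are skipped so that
    ordinary correct predictions are not counted as attack successes.
    """
    keep = labels != target
    return float(np.mean(pred_triggered[keep] == target))


labels = np.array([0, 1, 2, 3])
print(clean_accuracy(np.array([0, 1, 2, 0]), labels))          # 0.75
print(attack_success_rate(np.array([0, 0, 2, 0]), labels, 0))  # 2 of 3 non-target inputs hijacked
```

A defense such as UBT aims to push ASR toward zero while keeping CA as high as possible, which is the trade-off the reported numbers summarize.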