Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats

Kuanrong Liu, Siyuan Liang, Jiawei Liang, Pengwen Dai, Xiaochun Cao

2024-09-29

Abstract: Multimodal contrastive learning uses various data modalities to create high-quality features, but its reliance on extensive data sources on the Internet makes it vulnerable to backdoor attacks. These attacks insert malicious behaviors during training, which are activated by specific triggers during inference, posing significant security risks. Existing countermeasures based on fine-tuning reduce the malicious impact of such attacks, but they frequently require extensive training time and degrade clean accuracy. In this study, we propose an efficient defense mechanism against backdoor threats based on machine unlearning, called Unlearn Backdoor Threats (UBT). It strategically creates a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. Specifically, we use overfit training to strengthen backdoor shortcuts and accurately detect suspicious samples in the potentially poisoned data set. We then select a small subset of the suspicious samples for rapid forgetting, which eliminates the backdoor effect and improves backdoor defense efficiency. For the backdoor unlearning process, we present a novel token-based portion unlearning training regime. This technique focuses on the model's compromised elements, dissociating backdoor correlations while maintaining the model's overall integrity. Extensive experimental results show that our method effectively defends against various backdoor attack methods on the CLIP model. Compared to state-of-the-art (SOTA) backdoor defense methods, UBT achieves the lowest attack success rate while maintaining a high clean accuracy of the model (attack success rate decreases by 19% compared to SOTA, while clean accuracy increases by 2.57%).
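The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions: clean accuracy is the fraction of clean samples classified correctly, and attack success rate is the fraction of trigger-carrying samples classified as the attacker's target label. The function names are hypothetical, and the paper's exact evaluation protocol (for example, whether samples whose true label is already the target are excluded) is not given here.

```python
def clean_accuracy(predictions, labels):
    """Fraction of clean samples whose predicted label equals the true label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def attack_success_rate(triggered_predictions, target_label):
    """Fraction of trigger-carrying samples predicted as the attacker's target label.

    Assumption: every triggered sample is counted, including any whose
    true label already equals the target label.
    """
    if not triggered_predictions:
        raise ValueError("triggered_predictions must be non-empty")
    hits = sum(1 for p in triggered_predictions if p == target_label)
    return hits / len(triggered_predictions)


if __name__ == "__main__":
    # Toy labels only; these are not results from the paper.
    print(clean_accuracy([0, 1, 2, 2], [0, 1, 2, 3]))
    print(attack_success_rate([7, 7, 1, 7], target_label=7))
```

A defense is judged on both numbers together: a lower attack success rate is only useful if clean accuracy stays high.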

Cryptography and Security, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

This paper addresses how to effectively defend against backdoor attacks in multimodal contrastive learning (MCL). MCL uses multiple data modalities to create high-quality feature representations, but its dependence on large amounts of Internet-sourced data makes it vulnerable to backdoor attacks. These attacks insert malicious behaviors during training that are activated by specific triggers during inference, posing a serious security threat to the model. Existing defenses reduce the impact of such attacks through fine-tuning, but they usually require a large amount of training time and may reduce the model's accuracy on clean samples. To meet this challenge, the paper proposes an efficient defense mechanism based on machine unlearning, called Unlearn Backdoor Threats (UBT). UBT proceeds in the following steps:

1. **Overfitting training**: Use a pre-trained model to identify a set of suspicious samples, and enhance the backdoor features in these samples through overfitting training.
2. **Suspicious sample detection**: Use the overfitted model to further analyze the set of suspicious samples and find the subset that has the greatest impact on the backdoor.
3. **Token-based local unlearning**: Eliminate the backdoor effect by performing local unlearning on a small number of selected samples while maintaining the overall performance of the model.

By unlearning selectively on backdoor samples rather than on all samples, the method minimizes the impact of backdoor attacks while preserving the model's performance on clean samples. Experimental results show that UBT reduces the attack success rate (ASR) while maintaining high clean accuracy (CA). Compared with the existing state-of-the-art (SOTA) method, UBT reduces the ASR by 19% and increases CA by 2.57%.
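The sample-selection part of the steps above can be sketched roughly as follows. The summary does not state how suspicious samples are scored, so this sketch assumes that, after overfitting, poisoned image-text pairs show unusually high embedding similarity and that the top-k pairs are kept for unlearning. The scoring rule, the function names, and the parameter `k` are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_unlearning_subset(image_embeddings, text_embeddings, k):
    """Return indices of the k image-text pairs with the highest similarity.

    Assumption: embeddings come from the overfitted model, and a higher
    similarity marks a pair as more likely to carry the backdoor shortcut.
    """
    if len(image_embeddings) != len(text_embeddings):
        raise ValueError("each image embedding needs a matching text embedding")
    scores = [cosine_similarity(i, t)
              for i, t in zip(image_embeddings, text_embeddings)]
    # Sort indices by descending score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings; pairs 0 and 2 are the most closely aligned.
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    texts = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    print(select_unlearning_subset(images, texts, k=2))
```

The selected indices would then feed the token-based unlearning step, which is not sketched here because the summary gives no detail on how individual tokens are chosen or updated.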

