Abstract: Contrastive learning (CL) pre-trains general-purpose encoders using an unlabeled pre-training dataset, which consists of images or image-text pairs. CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so the encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we take the first step to analyze the limitations of existing backdoor attacks and propose new DPBAs called CorruptEncoder to CL. CorruptEncoder introduces a new attack strategy to create poisoned inputs and uses a theory-guided method to maximize attack effectiveness. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs. In particular, CorruptEncoder is the first DPBA that achieves more than 90% attack success rates with only a few (3) reference images and a small poisoning ratio of 0.5%. Moreover, we also propose a defense, called localized cropping, to defend against DPBAs. Our results show that our defense can reduce the effectiveness of DPBAs, but it sacrifices the utility of the encoder, highlighting the need for new defenses.

What problem does this paper attempt to address?

This paper addresses the limited effectiveness of data poisoning based backdoor attacks (DPBAs) on contrastive learning (CL). The authors argue that existing DPBAs fall short in practice and propose a new attack, CorruptEncoder, that raises the attack success rate.

#### Main problems:

1. **Limitations of existing attack methods**:
   - Existing DPBAs such as SSL-Backdoor, PoisonedEncoder, and CTRL do not achieve strong attack effectiveness in practice.
   - SSL-Backdoor requires a large number of target-class images to construct an effective attack, which takes substantial manual effort.
   - PoisonedEncoder performs well on simple datasets but has limited effectiveness on complex datasets such as ImageNet.
   - CTRL improves stealth by embedding triggers in the frequency domain, but its effectiveness is sensitive to the magnitude of the trigger, and it is not effective on large-scale datasets.
2. **Lack of theoretical guidance**:
   - Existing methods lack a theoretical analysis of how to optimize the feature similarity between the trigger and the target-class object, which leads to weak attacks.
3. **Low attack success rate**:
   - Existing methods struggle to reach a high attack success rate when only a few reference images and a low poisoning ratio are available.

#### Solutions:

To address these problems, the authors propose CorruptEncoder, a new data poisoning based backdoor attack. Its main contributions are:

- **New attack strategy**: Poisoned images are created by embedding a reference object and a trigger into a background image, exploiting the random cropping that CL applies during pre-training.
- **Theoretical guidance**: A theoretical analysis determines the optimal size of the background image and the optimal positions of the reference object and the trigger, so as to maximize attack effectiveness.
- **High effectiveness**: Experiments show that CorruptEncoder achieves an attack success rate of over 90% with only a few (for example, 3) reference images and a poisoning ratio of 0.5%.
- **Defense**: The authors also propose a defense called "localized cropping", which reduces the attack success rate by cropping adjacent areas during pre-training, but at the cost of the encoder's utility.

#### Conclusion:

With CorruptEncoder, the authors show that a theory-guided way of constructing poisoned inputs substantially improves backdoor attacks on contrastive learning, and they highlight the need for new defenses.
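The attack strategy summarized above (a reference object and a trigger embedded in a background image) can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names `make_poisoned_image` and `num_poisoned_inputs`, the image sizes, and the paste positions are all hypothetical, and the positions are arbitrary placeholders rather than the theory-derived optimum the paper describes. The sketch also assumes the poisoning ratio is measured relative to the size of the pre-training dataset.

```python
import numpy as np


def make_poisoned_image(background, reference_object, trigger, object_xy, trigger_xy):
    """Paste a reference object and a trigger patch onto a copy of the background.

    All arrays are H x W x C uint8 images; object_xy and trigger_xy are the
    (x, y) top-left corners of each patch. Positions are caller-chosen
    placeholders (hypothetical), not the optimal layout from the paper.
    """
    poisoned = background.copy()  # leave the original background untouched
    for patch, (x, y) in ((reference_object, object_xy), (trigger, trigger_xy)):
        h, w = patch.shape[:2]
        fits = 0 <= y and 0 <= x and y + h <= poisoned.shape[0] and x + w <= poisoned.shape[1]
        if not fits:
            raise ValueError("patch does not fit inside the background")
        poisoned[y:y + h, x:x + w] = patch
    return poisoned


def num_poisoned_inputs(dataset_size, poisoning_ratio):
    """Number of poisoned inputs, assuming the ratio is relative to dataset_size."""
    return int(round(dataset_size * poisoning_ratio))


# Toy data standing in for real images (assumed shapes and pixel values).
background = np.zeros((64, 96, 3), dtype=np.uint8)
reference_object = np.full((32, 32, 3), 200, dtype=np.uint8)
trigger = np.full((8, 8, 3), 255, dtype=np.uint8)

# Object on the left, trigger on the right, so the two patches do not overlap.
poisoned = make_poisoned_image(
    background, reference_object, trigger, object_xy=(0, 16), trigger_xy=(80, 28)
)
print(poisoned.shape)
print(num_poisoned_inputs(100_000, 0.005))
```

Keeping the object and the trigger in separate regions means a random crop can contain one without the other, which is the property of random cropping that the summary says the attack relies on. Where exactly each patch should go, and how large the background should be, is what the paper's theoretical analysis determines.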

CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning

Watermarking Pre-trained Encoders in Contrastive Learning

Manipulating Pre-Trained Encoder for Targeted Poisoning Attacks in Contrastive Learning

Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

Poisoning and Backdooring Contrastive Learning

Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks

BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning

Adversarial Backdoor Defense in CLIP

Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP

BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning

Backdoor Contrastive Learning via Bi-level Trigger Optimization

BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning

IPES: Improved Pre-trained Encoder Stealing Attack in Contrastive Learning

AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning

BAGEL: Backdoor Attacks against Federated Contrastive Learning

CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense in Contrastive Learning

Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats