Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning

Zhifang Zhang, Shuo He, Bingquan Shen, Lei Feng
2024-12-29
Abstract: Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets, yet they exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns. In this paper, we disclose that CLIP's vulnerabilities primarily stem from its excessive encoding of class-irrelevant features, which can compromise the model's visual feature resistivity to input perturbations, making it more susceptible to capturing the trigger patterns inserted by backdoor attacks. Inspired by this finding, we propose Repulsive Visual Prompt Tuning (RVPT), a novel defense approach that employs specially designed deep visual prompt tuning and feature-repelling loss to eliminate excessive class-irrelevant features while simultaneously optimizing cross-entropy loss to maintain clean accuracy. Unlike existing multimodal backdoor defense methods that typically require the availability of poisoned data or involve fine-tuning the entire model, RVPT leverages few-shot downstream clean samples and only tunes a small number of parameters. Empirical results demonstrate that RVPT tunes only 0.27% of the parameters relative to CLIP, yet it significantly outperforms state-of-the-art baselines, reducing the attack success rate from 67.53% to 2.76% against SoTA attacks and effectively generalizing its defensive capabilities across multiple datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the vulnerability of multimodal contrastive learning models such as CLIP to backdoor attacks in downstream tasks. Because CLIP over-encodes class-irrelevant features (CIFs), its visual features are overly sensitive to input perturbations, which makes the model more likely to capture the trigger patterns that a backdoor attack inserts. At inference time, such an attack causes the model to misclassify images carrying a specific trigger pattern as the attacker's target class.

To address this issue, the authors propose **Repulsive Visual Prompt Tuning (RVPT)**, a novel defense method. RVPT eliminates excessive class-irrelevant features through specially designed deep visual prompt tuning and a feature-repelling loss, while optimizing the cross-entropy loss to maintain accuracy on clean data. Unlike existing methods, RVPT does not require poisoned data or fine-tuning of the entire model. Instead, it uses a small number of clean downstream samples and tunes only a small fraction of the parameters.

### Specific Problem Description

1. **Vulnerability of CLIP**:
   - CLIP is pre-trained on large-scale image-text datasets. These datasets are usually unfiltered web data, into which poisoned samples can easily be injected.
   - CLIP is very sensitive to a small number of poisoned samples. Studies have shown that, compared with traditional supervised models, CLIP can be successfully attacked with fewer poisoned samples.
   - Once poisoned during pre-training, CLIP will misclassify images with specific trigger patterns as the target class during inference.
2. **Limitations of Existing Defense Methods**:
   - Existing methods usually need to fine-tune the parameters of the entire model or rely on poisoned data, which is both resource-intensive and unrealistic.
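
The attack behaviour described under "Vulnerability of CLIP" is commonly quantified with two numbers: clean accuracy (CA) on unmodified images and attack success rate (ASR) on images carrying the trigger. The following is a minimal sketch of both metrics computed from predicted labels. The function names, and the convention of excluding images whose true class already equals the target class from the ASR, are assumptions made for illustration and are not taken from the paper.

```python
def clean_accuracy(predictions, labels):
    """Fraction of clean (untriggered) images that are classified correctly."""
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)


def attack_success_rate(triggered_predictions, true_labels, target_class):
    """Fraction of triggered images that are classified as the target class.

    Assumption: images whose true class is already the target class are
    skipped, since predicting the target for them is not a misclassification.
    """
    eligible = [
        pred
        for pred, label in zip(triggered_predictions, true_labels)
        if label != target_class
    ]
    if not eligible:
        return 0.0
    hits = sum(1 for pred in eligible if pred == target_class)
    return hits / len(eligible)


if __name__ == "__main__":
    # Hypothetical toy labels, used only to exercise the two functions.
    print(clean_accuracy([0, 1, 2, 2], [0, 1, 2, 1]))
    print(attack_success_rate([3, 3, 1, 3], [0, 1, 1, 3], target_class=3))
```

Under this convention, a successful defense should drive the ASR toward zero while leaving the CA close to that of the undefended model.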
### RVPT's Solution

RVPT solves the above problems in the following ways:

- **Reducing Class-Irrelevant Features**: The feature-repelling loss (FR Loss) minimizes the average cosine similarity between the prompted features and the original features, thereby filtering out class-irrelevant features that do not contribute to the cross-entropy loss.
- **Maintaining Clean-Data Accuracy**: The cross-entropy loss (CE Loss) preserves the model's accuracy on clean data.
- **Efficiency**: RVPT tunes only a small number of parameters (0.27% relative to CLIP) and uses a small number of clean downstream samples.

### Experimental Results

The experimental results show that RVPT performs well across multiple datasets and various backdoor attacks: it significantly reduces the attack success rate (ASR) while maintaining high clean accuracy (CA). For example, against the state-of-the-art attack BadCLIP, RVPT reduces the attack success rate from 67.53% to 2.76%. RVPT also generalizes well: it defends effectively against backdoor attacks when the target class is absent from the tuning dataset, as well as across datasets and across domains.

### Summary

The paper targets the vulnerability of multimodal contrastive learning models such as CLIP to backdoor attacks and proposes an efficient and effective defense method, Repulsive Visual Prompt Tuning (RVPT). By reducing class-irrelevant features while maintaining clean-data accuracy, RVPT significantly improves the robustness of the model and generalizes well across multiple attacks and datasets.
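
The two objectives listed under "RVPT's Solution" can be sketched in a few lines: a feature-repelling term that averages the cosine similarity between prompted and original features, and a cross-entropy term on the class logits. This is a minimal illustration in plain Python, not the authors' implementation. The weighting factor `alpha`, the function names, and the choice to apply the repelling term to a single set of feature vectors are all assumptions; the summary above does not specify which layers' features are compared or how the two losses are balanced.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def feature_repelling_loss(prompted_feats, original_feats):
    """Average cosine similarity between prompted and original features.

    Minimizing this value pushes the prompted features away from the
    features produced by the original (frozen) encoder.
    """
    sims = [cosine_similarity(p, o) for p, o in zip(prompted_feats, original_feats)]
    return sum(sims) / len(sims)


def cross_entropy_loss(logits, labels):
    """Mean cross-entropy over a batch, using a numerically stable log-sum-exp."""
    total = 0.0
    for row, label in zip(logits, labels):
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(z - row_max) for z in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


def combined_objective(prompted_feats, original_feats, logits, labels, alpha=1.0):
    """CE loss plus a weighted repelling term; `alpha` is a hypothetical weight."""
    ce = cross_entropy_loss(logits, labels)
    fr = feature_repelling_loss(prompted_feats, original_feats)
    return ce + alpha * fr
```

In a real setup, only the visual prompt parameters would receive gradients from this objective while the CLIP weights stay frozen, which is consistent with the small tuned-parameter fraction reported above.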