Adversarial Prompt Distillation for Vision-Language Models

Lin Luo, Xin Wang, Bojia Zi, Shihao Zhao, Xingjun Ma
2024-11-22
Abstract: Large pre-trained Vision-Language Models (VLMs) such as Contrastive Language-Image Pre-Training (CLIP) have been shown to be susceptible to adversarial attacks, raising concerns about their deployment in safety-critical scenarios like autonomous driving and medical diagnosis. One promising approach for improving the robustness of pre-trained VLMs is Adversarial Prompt Tuning (APT), which combines adversarial training with prompt tuning. However, existing APT methods are mostly single-modal methods that design prompt(s) for only the visual or textual modality, limiting their effectiveness in either robustness or clean accuracy. In this work, we propose a novel method called Adversarial Prompt Distillation (APD) that combines APT with knowledge distillation to boost the adversarial robustness of CLIP. Specifically, APD is a bimodal method that adds prompts for both the visual and textual modalities while leveraging a cleanly pre-trained teacher CLIP model to distill and boost the performance of the student CLIP model on downstream tasks. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our APD over the current state-of-the-art APT methods in terms of both natural and adversarial performances. The effectiveness of our APD method validates the possibility of using a non-robust teacher to improve the generalization and robustness of VLMs.
Computer Vision and Pattern Recognition, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper addresses the vulnerability of large-scale pre-trained vision-language models (VLMs) to adversarial attacks, especially in safety-critical applications. Specifically, it focuses on how to improve the robustness of the CLIP (Contrastive Language-Image Pre-training) model against adversarial image attacks.

#### Background and Motivation

1. **Vulnerability of VLMs**:
   - Although large-scale pre-trained vision-language models (such as CLIP) perform well in multi-modal tasks, they are vulnerable to adversarial attacks. These attacks add small perturbations to the input image or text that cause the model to produce incorrect outputs.
   - This vulnerability is particularly concerning in safety-critical scenarios such as autonomous driving and medical diagnosis.
2. **Limitations of Existing Methods**:
   - Most existing Adversarial Prompt Tuning (APT) methods target a single modality (vision or text), which limits their effectiveness in terms of robustness and clean accuracy.
   - Single-modality methods are usually unable to fully cope with cross-modality adversarial attacks.

#### Proposed Solution

To solve the above problems, the paper proposes a new method, Adversarial Prompt Distillation (APD). The main features of APD are as follows:

1. **Bi-modal Defense**:
   - APD is a bi-modal method. It inserts learnable prompts in both the visual and text branches to enhance adversarial robustness.
   - This bi-modal design enables the model to perform more effective robustness alignment in the joint image-text representation space.
2. **Knowledge Distillation**:
   - APD uses a cleanly pre-trained teacher CLIP model to distill and improve the performance of the student CLIP model.
   - The teacher model processes natural images, while the student model processes adversarial images. A Kullback-Leibler (KL) divergence loss aligns the student's output with the teacher's output.
3. **No Need for Robust Pre-training**:
   - APD does not rely on any robustly pre-trained model, which makes it more practical and applicable to any standard pre-trained CLIP model.

#### Experimental Results

In extensive experiments on multiple benchmark datasets, APD outperforms existing APT methods in both natural accuracy and adversarial robustness, including under strong adversarial attacks such as PGD and AutoAttack. APD also shows strong robustness under adaptive attacks, supporting its potential in practical applications.

In summary, by introducing APD, this paper improves the robustness of the CLIP model under adversarial attacks while maintaining good natural accuracy, supporting the safe application of VLMs.
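
The knowledge distillation step described under "Proposed Solution" (teacher on natural images, student on adversarial images, outputs aligned with a KL divergence loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `clip_style_logits` and `kl_distillation_loss`, the logit scale of 100, the direction KL(teacher || student), and the random stand-in features are all assumptions made for illustration; the summary does not specify them.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def clip_style_logits(image_feats, text_feats, scale=100.0):
    """Scaled cosine similarity between image and class-text embeddings.

    Hypothetical helper: `scale` is an assumed temperature, not a value
    taken from the paper.
    """
    img = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    return scale * img @ txt.T


def kl_distillation_loss(teacher_logits, student_logits, eps=1e-12):
    """Batch mean of KL(teacher || student) over class probabilities.

    The KL direction is an assumption; the summary only says the student's
    output is aligned with the teacher's output.
    """
    p = softmax(teacher_logits)  # teacher: natural (clean) images
    q = softmax(student_logits)  # student: adversarial images
    per_sample = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(per_sample.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_feats = rng.normal(size=(5, 16))    # 5 class prompts (stand-ins)
    clean_feats = rng.normal(size=(8, 16))   # teacher features, clean images
    # Stand-in for student features on adversarially perturbed images.
    adv_feats = clean_feats + 0.1 * rng.normal(size=(8, 16))

    teacher_logits = clip_style_logits(clean_feats, text_feats)
    student_logits = clip_style_logits(adv_feats, text_feats)
    print("distillation loss:", kl_distillation_loss(teacher_logits, student_logits))
```

In an actual training loop, the adversarial images would come from an attack such as PGD, and the loss would be minimized with respect to the student's learnable visual and textual prompts only. The sketch covers just the loss computation.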