Backdooring Vision-Language Models with Out-Of-Distribution Data

Weimin Lyu, Jiachen Yao, Saumya Gupta, Lu Pang, Tao Sun, Lingjie Yi, Lijie Hu, Haibin Ling, Chao Chen
2024-10-02
Abstract: The emergence of Vision-Language Models (VLMs) represents a significant advancement in integrating computer vision with Large Language Models (LLMs) to generate detailed text descriptions from visual inputs. Despite their growing importance, the security of VLMs, particularly against backdoor attacks, is underexplored. Moreover, prior works often assume attackers have access to the original training data, which is often unrealistic. In this paper, we address a more practical and challenging scenario where attackers must rely solely on Out-Of-Distribution (OOD) data. We introduce VLOOD (Backdooring Vision-Language Models with Out-of-Distribution Data), a novel approach with two key contributions: (1) demonstrating backdoor attacks on VLMs in complex image-to-text tasks while minimizing degradation of the original semantics under poisoned inputs, and (2) proposing innovative techniques for backdoor injection without requiring any access to the original training data. Our evaluation on image captioning and visual question answering (VQA) tasks confirms the effectiveness of VLOOD, revealing a critical security vulnerability in VLMs and laying the foundation for future research on securing multimodal models against sophisticated threats.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper investigates the vulnerability of Vision-Language Models (VLMs) to backdoor attacks, specifically in the setting where the attacker can only use Out-Of-Distribution (OOD) data whose distribution differs from that of the original training data. The paper covers the following points:

1. **Limitations of existing research**: Most existing backdoor attack studies assume that the attacker can access the original training data, which is often unrealistic in practice. In addition, few backdoor attack studies target the complex image-to-text generation tasks of VLMs.
2. **Introduction of new methods**: To address these challenges, the paper proposes a new backdoor attack method, VLOOD (Backdooring Vision-Language Models with Out-of-Distribution Data). The method injects backdoors in complex image-to-text generation tasks while minimizing semantic degradation, and it does not require access to the original training data.
3. **Key contributions**:
   - **First exploration**: This is the first attempt to perform backdoor attacks on VLMs using OOD data in a practical scenario.
   - **Innovative techniques**: The paper proposes Clean Knowledge Preservation (CKP) and Conceptual Consistency Preservation (CCP), together with a dynamic weight adjustment mechanism, so that the model maintains high semantic consistency when processing poisoned inputs.
   - **Evaluation and verification**: Experiments on image captioning and Visual Question Answering (VQA) tasks demonstrate the effectiveness of VLOOD and reveal a key security vulnerability in VLMs.

### Formula summary

- **CKP loss function**:
  \[ L_{\text{CKP}} = \text{KL}\left(F(I, T) \parallel \tilde{F}(I, T)\right) = \frac{1}{N}\sum_{(I, T, O)\in D} F(I, T)\log\frac{F(I, T)}{\tilde{F}(I, T)} \]
  where \((I, T, O)\in D\) are clean samples and \(N\) is the number of clean samples.
- **CCP loss function**:
  \[ S = \frac{1}{n}\sum_{i = 1}^{n}\|a_i - x_i\|_1 \]
  \[ L_{\text{CCP}} = \frac{1}{N}\sum_{(\tilde{I},\tilde{T},\tilde{O})\in\tilde{D}}\left(\frac{1}{1+\exp(-S)}\right) \]
- **Dynamic weight adjustment**:
  \[ \lambda \leftarrow \lambda + (\text{Impact}_{\text{clean}} - \text{Impact}_{\text{poisoned}}) \]
- **Overall loss function**:
  \[ L = (1 - \lambda)\cdot\left(L_{\text{LM}}(\text{clean}) + L_{\text{CKP}}\right) + \lambda\cdot\left(L_{\text{LM}}(\text{poisoned}) + L_{\text{CCP}}\right) \]

With these techniques, VLOOD injects backdoors while preserving the model's normal behavior on clean inputs, which highlights the security challenges that VLMs face.
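
To make the formula summary concrete, the following is a minimal sketch of how the four expressions fit together numerically. It is not the authors' implementation. It assumes that \(F\) and \(\tilde{F}\) are output probability distributions of a reference model and the backdoored model, and that \(a_i\) and \(x_i\) are predicted and reference token embeddings; the summary does not define these symbols, so both readings are assumptions. All function names are hypothetical, and the language-modeling losses and the two Impact terms are passed in as plain numbers because the summary does not say how they are computed.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def ckp_loss(ref_probs, backdoored_probs):
    """L_CKP: mean KL(F || F~) over N clean samples.

    Assumption: each list entry is one output distribution for one clean sample.
    """
    n = len(ref_probs)
    return sum(kl_divergence(p, q) for p, q in zip(ref_probs, backdoored_probs)) / n


def l1_distance(a, x):
    """L1 (Manhattan) distance between two vectors."""
    return sum(abs(ai - xi) for ai, xi in zip(a, x))


def ccp_loss(pred_embeds, ref_embeds):
    """L_CCP: mean sigmoid(S) over poisoned samples.

    S is the mean L1 distance over the n token positions of one sample.
    Assumption: a_i are predicted embeddings and x_i are reference embeddings.
    """
    total = 0.0
    for a_seq, x_seq in zip(pred_embeds, ref_embeds):
        s = sum(l1_distance(a, x) for a, x in zip(a_seq, x_seq)) / len(a_seq)
        total += 1.0 / (1.0 + math.exp(-s))
    return total / len(pred_embeds)


def update_lambda(lam, impact_clean, impact_poisoned):
    """Dynamic weight adjustment: lambda <- lambda + (Impact_clean - Impact_poisoned)."""
    return lam + (impact_clean - impact_poisoned)


def total_loss(lam, lm_clean, ckp, lm_poisoned, ccp):
    """L = (1 - lambda) * (L_LM(clean) + L_CKP) + lambda * (L_LM(poisoned) + L_CCP)."""
    return (1 - lam) * (lm_clean + ckp) + lam * (lm_poisoned + ccp)


if __name__ == "__main__":
    # Toy values only; they are not taken from the paper.
    ckp = ckp_loss([[0.5, 0.5]], [[0.9, 0.1]])
    ccp = ccp_loss([[[1.0, 2.0], [0.0, 1.0]]], [[[1.0, 2.5], [0.0, 1.0]]])
    lam = update_lambda(0.5, impact_clean=0.3, impact_poisoned=0.1)
    print(total_loss(lam, lm_clean=1.0, ckp=ckp, lm_poisoned=2.0, ccp=ccp))
```

Because \(S \ge 0\), the sigmoid term in this sketch always lies in \([0.5, 1)\), so minimizing \(L_{\text{CCP}}\) pushes the mean L1 distance toward zero. The summary gives no bounds for \(\lambda\); the sketch therefore applies the update as written, without clamping.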