Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models

Dehong Kong, Siyuan Liang, Xiaopeng Zhu, Yuansheng Zhong, Wenqi Ren
2024-10-07
Abstract: Visual language pre-training (VLP) models have demonstrated significant success across various domains, yet they remain vulnerable to adversarial attacks. Addressing these adversarial vulnerabilities is crucial for enhancing security in multimodal learning. Traditionally, adversarial methods targeting VLP models involve simultaneously perturbing images and text. However, this approach faces notable challenges: first, adversarial perturbations often fail to translate effectively into real-world scenarios; second, direct modifications to the text are conspicuously visible. To overcome these limitations, we propose a novel strategy that exclusively employs image patches for attacks, thus preserving the integrity of the original text. Our method leverages prior knowledge from diffusion models to enhance the authenticity and naturalness of the perturbations. Moreover, to optimize patch placement and improve the efficacy of our attacks, we utilize the cross-attention mechanism, which encapsulates intermodal interactions by generating attention maps to guide strategic patch placements. Comprehensive experiments conducted in a white-box setting for image-to-text scenarios reveal that our proposed method significantly outperforms existing techniques, achieving a 100% attack success rate. Additionally, it demonstrates commendable performance in transfer tasks involving text-to-image configurations.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses is the vulnerability of Vision-Language Pre-training (VLP) models to adversarial attacks. Existing adversarial methods usually perturb the image and the text simultaneously, which raises two main challenges: first, the adversarial perturbations are difficult to carry over into effective attacks in real-world scenarios; second, direct modifications to the text are easily detected. To address these limitations, the authors propose a strategy that attacks with image patches only, leaving the original text intact. To improve the effectiveness of the attack, the method uses the cross-attention mechanism to guide patch placement and a diffusion model to generate more natural-looking adversarial patches.

### Main contributions of the paper:

1. **First exploration**: As far as the authors know, this is the first study dedicated specifically to the security of VLP models under adversarial patch attacks.
2. **Natural adversarial patches**: A diffusion-model-based framework is introduced to generate more natural adversarial patches.
3. **Cross-modal guidance**: The location of the adversarial patch is determined through cross-modal guidance, which improves the effectiveness of the attack.
4. **Experimental verification**: Experiments were carried out on the Flickr30K and MSCOCO datasets. The results show that the method performs well across a variety of VLP models, reaching a 100% attack success rate in the white-box setting for image-to-text scenarios.

### Method overview:

- **Threat model**: The attacker's goal is to insert an adversarial patch into the visual input of the VLP model so that the downstream task produces an incorrect output.
- **Diffusion model**: A pre-trained diffusion model is used to generate the adversarial patches, keeping them close to the distribution of real images.
- **Patch generation**: Adversarial patches are produced by an optimization algorithm, and the cross-attention mechanism is used to choose where each patch is placed.
- **Loss function**: A scoring loss and a total variation loss are combined to optimize the adversarial patches, so that they attack effectively while remaining natural-looking.

### Experimental results:

- **Performance comparison**: On multiple benchmark datasets and VLP models, the attack success rate of this method is significantly higher than that of other existing methods.
- **Naturalness**: The generated adversarial patches are reported to be both effective and natural-looking, and therefore not easily detected.

Through these contributions, this paper provides new ideas and methods for improving the robustness and security of VLP models.
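The placement and loss steps in the method overview can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the cross-attention map has already been resized to the image resolution, that the patch is placed on the square window with the largest summed attention, and that the two loss terms are added with a fixed weight. The function names, the `tv_weight` parameter, and the scalar `score_loss` placeholder are all hypothetical.

```python
import numpy as np


def best_patch_location(attn_map, patch_size):
    """Top-left (row, col) of the patch_size x patch_size window whose
    summed attention is largest. Assumption: attn_map is a 2-D array
    already at image resolution."""
    h, w = attn_map.shape
    p = patch_size
    # Integral image: integral[i, j] = sum of attn_map[:i, :j].
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = attn_map.cumsum(axis=0).cumsum(axis=1)
    # Sum over every p x p window by inclusion-exclusion.
    window_sums = (integral[p:, p:] - integral[:-p, p:]
                   - integral[p:, :-p] + integral[:-p, :-p])
    row, col = np.unravel_index(np.argmax(window_sums), window_sums.shape)
    return int(row), int(col)


def apply_patch(image, patch, top, left):
    """Return a copy of image with patch pasted at (top, left)."""
    out = image.copy()
    ph, pw = patch.shape[:2]
    out[top:top + ph, left:left + pw] = patch
    return out


def total_variation(patch):
    """Anisotropic total variation: sum of absolute differences between
    vertically and horizontally adjacent pixels."""
    dh = np.abs(patch[1:, :] - patch[:-1, :]).sum()
    dw = np.abs(patch[:, 1:] - patch[:, :-1]).sum()
    return float(dh + dw)


def combined_loss(score_loss, patch, tv_weight=0.1):
    """Hypothetical combination: scoring term plus weighted smoothness
    term. score_loss stands in for whatever scalar the attack minimizes."""
    return score_loss + tv_weight * total_variation(patch)
```

In an actual attack the scoring term would come from the VLP model and the patch would be optimized through the diffusion model; here the functions only show how an attention map can pick a location and how a total variation penalty discourages noisy, unnatural patches.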