Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security

Yihe Fan, Yuxin Cao, Ziyu Zhao, Ziyao Liu, Shaofeng Li
2024-08-11
Abstract: Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities that increasingly influence various aspects of our daily lives, constantly defining the new boundary of Artificial General Intelligence (AGI). Image modalities, enriched with profound semantic information and a more continuous mathematical nature compared to other modalities, greatly enhance the functionalities of MLLMs when integrated. However, this integration serves as a double-edged sword, providing attackers with expansive vulnerabilities to exploit for highly covert and harmful attacks. The pursuit of reliable AI systems like powerful MLLMs has emerged as a pivotal area of contemporary research. In this paper, we endeavor to demonstrate the multifaceted risks associated with the incorporation of image modalities into MLLMs. Initially, we delineate the foundational components and training processes of MLLMs. Subsequently, we construct a threat model, outlining the security vulnerabilities intrinsic to MLLMs. Moreover, we analyze and summarize existing scholarly discourses on MLLMs' attack and defense mechanisms, culminating in suggestions for future research on MLLM security. Through this comprehensive analysis, we aim to deepen the academic understanding of MLLM security challenges and propel forward the development of trustworthy MLLM systems.
Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily explores the challenges and risks associated with the security of Multimodal Large Language Models (MLLMs), particularly the new threats introduced when these models handle image inputs. The authors first outline the basic structure and training process of MLLMs and then build a security threat model for MLLMs based on this foundation.

The main objectives of the paper include:

1. **Elucidating the multiple risks introduced by the image modality**: Since images carry rich semantic information and have continuous mathematical properties, they enhance MLLM functionality while also introducing new security risks.
2. **Constructing a threat model**: Detailed description of potential vulnerabilities, attack scenarios, and attack targets within MLLMs.
3. **Reviewing existing research**: A comprehensive review of current research on MLLM attacks and defense mechanisms.
4. **Proposing future research directions**: Based on the above analysis, suggestions for future research in the field of MLLM security are provided.

Specifically, the paper covers the following key points:

- **Basic architecture and training process**: Explanation of the five main components of MLLMs (modality encoders, input projectors, LLM backbone, output projectors, and modality generators), as well as the two main training processes (multimodal pre-training and multimodal instruction fine-tuning).
- **Threat model**: Discussion of various vulnerabilities in MLLMs (such as training data poisoning, complexity of multimodal inputs), attack scenarios (white-box, black-box, gray-box attacks), and attack targets (such as cognitive biases, specific string outputs, jailbreaks, etc.).
- **Attack methods**: Introduction of several major types of attacks, including structured attacks, adversarial perturbation attacks, and data poisoning attacks.
- **Defense measures**: Summary of the two main branches of existing defenses: training phase defenses and inference phase defenses.
Through the above content, the paper aims to deepen the academic community's understanding of the security challenges of MLLMs and promote the development of trustworthy MLLM systems.
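
To make the "continuous mathematical properties" point concrete, the following is a minimal sketch of a white-box adversarial perturbation on a toy linear scorer. It is an illustration under stated assumptions, not a method taken from the paper: the linear `score` function stands in for a real model, the sign-gradient step is a generic textbook technique, and the names `score`, `perturb`, `weights`, and `epsilon` are all hypothetical.

```python
# Toy white-box perturbation on a linear scorer (illustrative assumption,
# not the paper's method). Pixels are floats in [0, 1].

def score(weights, pixels):
    """Linear stand-in for a model's score on a flattened image."""
    return sum(w * p for w, p in zip(weights, pixels))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that raises the score.

    For a linear scorer the gradient with respect to the pixels is the weight
    vector itself, so the sign of each weight gives the step direction.
    Results are clipped back to the valid [0, 1] pixel range.
    """
    return [
        min(1.0, max(0.0, p + epsilon * sign(w)))
        for w, p in zip(weights, pixels)
    ]


weights = [0.5, -1.0, 0.25, 0.0]   # hypothetical model parameters
image = [0.2, 0.8, 0.5, 0.5]       # hypothetical flattened image
adversarial = perturb(weights, image, epsilon=0.05)

print("clean score:      ", score(weights, image))
print("adversarial score:", score(weights, adversarial))
```

Because pixel values vary continuously, a small bounded change to every pixel can move the score in a chosen direction, whereas discrete text tokens cannot be nudged by an arbitrarily small amount. Real attacks on MLLMs target far more complex, non-linear models, so this sketch only conveys the basic intuition.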