Abstract: Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities that increasingly influence various aspects of our daily lives, constantly defining the new boundary of Artificial General Intelligence (AGI). Image modalities, enriched with profound semantic information and a more continuous mathematical nature compared to other modalities, greatly enhance the functionalities of MLLMs when integrated. However, this integration serves as a double-edged sword, providing attackers with expansive vulnerabilities to exploit for highly covert and harmful attacks. The pursuit of reliable AI systems like powerful MLLMs has emerged as a pivotal area of contemporary research. In this paper, we endeavor to demonstrate the multifaceted risks associated with the incorporation of image modalities into MLLMs. Initially, we delineate the foundational components and training processes of MLLMs. Subsequently, we construct a threat model, outlining the security vulnerabilities intrinsic to MLLMs. Moreover, we analyze and summarize existing scholarly discourse on MLLM attack and defense mechanisms, culminating in suggestions for future research on MLLM security. Through this comprehensive analysis, we aim to deepen the academic understanding of MLLM security challenges and propel forward the development of trustworthy MLLM systems.

What problem does this paper attempt to address?

The paper primarily explores the challenges and risks associated with the security of Multimodal Large Language Models (MLLMs), particularly the new threats introduced when these models handle image inputs. The authors first outline the basic structure and training process of MLLMs and then build a security threat model for MLLMs on this foundation. The main objectives of the paper include:

1. **Elucidating the multiple risks introduced by the image modality**: Since images carry rich semantic information and have continuous mathematical properties, they enhance MLLM functionality while also introducing new security risks.
2. **Constructing a threat model**: Detailed description of potential vulnerabilities, attack scenarios, and attack targets within MLLMs.
3. **Reviewing existing research**: A comprehensive review of current research on MLLM attacks and defense mechanisms.
4. **Proposing future research directions**: Based on the above analysis, suggestions for future research in the field of MLLM security are provided.

Specifically, the paper covers the following key points:

- **Basic architecture and training process**: Explanation of the five main components of MLLMs (modality encoders, input projectors, LLM backbone, output projectors, and modality generators), as well as the two main training processes (multimodal pre-training and multimodal instruction fine-tuning).
- **Threat model**: Discussion of various vulnerabilities in MLLMs (such as training data poisoning and the complexity of multimodal inputs), attack scenarios (white-box, black-box, and gray-box attacks), and attack targets (such as cognitive biases, specific string outputs, and jailbreaks).
- **Attack methods**: Introduction of several major types of attacks, including structured attacks, adversarial perturbation attacks, and data poisoning attacks.
- **Defense measures**: Summary of the two main branches of existing defenses: training phase defenses and inference phase defenses.
Through the above content, the paper aims to deepen the academic community's understanding of the security challenges of MLLMs and promote the development of trustworthy MLLM systems.
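
As a rough illustration of the five-component architecture named in the summary, the data flow can be sketched as a toy pipeline. This is a minimal sketch, not the paper's implementation: the dimensions, the random weights, the function names, and the use of plain matrix multiplications in place of real encoders, transformers, and generators are all assumptions made only to show how an image input reaches the LLM backbone alongside text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
IMG_DIM, ENC_DIM, LLM_DIM, GEN_DIM = 48, 16, 32, 48

# Random matrices stand in for trained parameters (assumption).
W_enc = rng.normal(size=(IMG_DIM, ENC_DIM))
W_in = rng.normal(size=(ENC_DIM, LLM_DIM))
W_llm = rng.normal(size=(LLM_DIM, LLM_DIM))
W_out = rng.normal(size=(LLM_DIM, ENC_DIM))
W_gen = rng.normal(size=(ENC_DIM, GEN_DIM))


def modality_encoder(image_patches):
    """Map raw image patches to visual features."""
    return np.tanh(image_patches @ W_enc)


def input_projector(features):
    """Project visual features into the LLM's token-embedding space."""
    return features @ W_in


def llm_backbone(image_tokens, text_tokens):
    """Process image and text tokens as one sequence.

    A real backbone uses attention; this toy version applies the same
    per-token map to every position.
    """
    tokens = np.concatenate([image_tokens, text_tokens], axis=0)
    return np.tanh(tokens @ W_llm)


def output_projector(hidden):
    """Project backbone states into the generator's conditioning space."""
    return hidden @ W_out


def modality_generator(signal):
    """Produce non-text output (here, patch-sized vectors)."""
    return np.tanh(signal @ W_gen)


def run_pipeline(image_patches, text_tokens):
    """Chain the five components in the order listed in the summary."""
    image_tokens = input_projector(modality_encoder(image_patches))
    hidden = llm_backbone(image_tokens, text_tokens)
    generated = modality_generator(output_projector(hidden))
    return {"image_tokens": image_tokens, "hidden": hidden, "generated": generated}


if __name__ == "__main__":
    patches = rng.normal(size=(4, IMG_DIM))  # 4 image patches
    text = rng.normal(size=(3, LLM_DIM))     # 3 already-embedded text tokens
    out = run_pipeline(patches, text)
    print({name: value.shape for name, value in out.items()})
```

The point of the sketch is that image content enters the backbone as continuous vectors in the same sequence as text tokens, which is why the summary treats the image pathway as part of the attack surface.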

Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security

Safety of Multimodal Large Language Models on Images and Texts

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models

MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

Query-Relevant Images Jailbreak Large Multi-Modal Models

MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models

Recent Advances in Attack and Defense Approaches of Large Language Models

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models

Large language models in 6G security: challenges and opportunities

Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Exploring Advanced Methodologies in Security Evaluation for LLMs

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models

Exploring Vulnerabilities and Threats in Large Language Models: Safeguarding Against Exploitation and Misuse