Abstract: The rapid evolution of multimodal foundation models has led to significant advancements in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding the methods of jailbreak attacks and existing defense mechanisms is essential to ensure the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak and defense in multimodal generative models. First, given the generalized lifecycle of multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input, encoder, generator, and output. Based on this analysis, we present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models. Additionally, we cover a wide range of input-output configurations, including modalities such as Any-to-Text, Any-to-Vision, and Any-to-Any within generative systems. Finally, we highlight current research challenges and propose potential directions for future research. The open-source repository corresponding to this work can be found at https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses jailbreak attacks on multimodal generative models and the defense mechanisms against them. Specifically:

1. **Background**:
   - The rapid development of multimodal foundation models has brought significant progress in cross-modal understanding and generation, covering modalities such as text, image, audio, and video.
   - However, these models are still vulnerable to jailbreak attacks, which can bypass the built-in safety mechanisms and induce the generation of potentially harmful content.
2. **Problems**:
   - **Understanding attack methods**: The methods of jailbreak attacks on multimodal generative models need to be understood systematically.
   - **Existing defense mechanisms**: Existing defense mechanisms must be evaluated to ensure the safe deployment of these models in practice, especially in security-sensitive applications.
   - **Unified framework**: Existing reviews mainly focus on the output content of specific modalities and lack a unified framework covering multiple modalities (text, image, audio, video).
3. **Goals**:
   - **Providing comprehensive insights**: Systematically review jailbreak attacks and defense strategies to provide comprehensive insight into the topic.
   - **Classification and summarization**: Explore attacks and the corresponding defense strategies at four levels (input, encoder, generator, output), following the lifecycle of a multimodal jailbreak.
   - **Detailed classification**: Present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models.
   - **Covering multiple configurations**: Cover a wide range of input-output configurations, including Any-to-Text, Any-to-Vision, and Any-to-Any generative systems.
   - **Research challenges and future directions**: Highlight current research challenges and propose potential directions for future research.

### Specific contributions

1. **Comprehensive review**:
   - From a comprehensive review of existing attack methods and defense strategies, the paper distills an abstract, general taxonomy covering four different stages (see Figure 2).
2. **Systematic review**:
   - It provides a comprehensive and systematic review of attack, defense, and evaluation strategies across various input-output modalities and different model structures.
3. **Discussion and future directions**:
   - It discusses in depth the limitations and challenges that arise in practical applications and outlines directions for future research.

Through these contributions, the paper aims to improve the understanding that researchers, practitioners, and policymakers have of the security challenges of multimodal generative models, and to provide guidance for developing effective defenses.

Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

Jailbreak Attacks and Defenses Against Large Language Models: A Survey

Jailbreak Large Vision-Language Models Through Multi-Modal Linkage

From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking

JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks

Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models

Multimodal Pragmatic Jailbreak on Text-to-image Models

Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models

$\textit{MMJ-Bench}$: A Comprehensive Study on Jailbreak Attacks and Defenses for Multimodal Large Language Models

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

Jailbreaking Attack against Multimodal Large Language Model

AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens

A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

White-box Multimodal Jailbreaks Against Large Vision-Language Models

JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

IDEATOR: Jailbreaking Large Vision-Language Models Using Themselves

Gradient-based Jailbreak Images for Multimodal Fusion Models

Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs