Abstract: Security concerns related to Large Language Models (LLMs) have been extensively explored, yet the safety implications for Multimodal Large Language Models (MLLMs), particularly in medical contexts (MedMLLMs), remain insufficiently studied. This paper delves into the underexplored security vulnerabilities of MedMLLMs, especially when deployed in clinical environments where the accuracy and relevance of question-and-answer interactions are critically tested against complex medical challenges. By combining existing clinical medical data with atypical natural phenomena, we define the mismatched malicious attack (2M-attack) and introduce its optimized version, known as the optimized mismatched malicious attack (O2M-attack or 2M-optimization). Using the voluminous 3MAD dataset that we construct, which covers a wide range of medical image modalities and harmful medical scenarios, we conduct a comprehensive analysis and propose the MCM optimization method, which significantly enhances the attack success rate on MedMLLMs. Evaluations with this dataset and attack methods, including white-box attacks on LLaVA-Med and transfer attacks (black-box) on four other SOTA models, indicate that even MedMLLMs designed with enhanced security features remain vulnerable to security breaches. Our work underscores the urgent need for a concerted effort to implement robust security measures and enhance the safety and efficacy of open-source MedMLLMs, particularly given the potential severity of jailbreak attacks and other malicious or clinically significant exploits in medical settings. Our code is available at https://github.com/dirtycomputer/O2M_attack.

### What problems does this paper attempt to solve?

This paper explores and reveals the security vulnerabilities of medical multimodal large language models (MedMLLMs) deployed in clinical settings, in particular whether their answers remain accurate and relevant when they face complex medical challenges. Specifically, the paper focuses on the following aspects:

1. **Defining new attack methods**: The paper introduces two attack methods, the **Mismatched Malicious Attack (2M-attack)** and the **Optimized Mismatched Malicious Attack (O2M-attack, or 2M-optimization)**. Both combine existing clinical medical data with atypical natural phenomena to simulate errors and malicious behavior in clinical environments.
2. **Constructing a comprehensive medical security dataset**: To evaluate the vulnerability of MedMLLMs, the authors construct a multimodal medical model attack dataset named **3MAD**. It covers a wide range of medical image modalities and harmful medical scenarios, and provides diverse data and evaluation metrics for testing how well MedMLLMs maintain safety and semantic alignment when facing malicious requests.
3. **Proposing a multimodal cross-optimization method (MCM)**: The authors propose the Multimodal Cross-Optimization Methodology (MCM) for optimizing jailbreak attacks against MedMLLMs. MCM optimizes the text and image inputs together and, at each step, keeps the update from whichever modality lowers the attack loss more, which significantly improves the attack success rate.
4. **Emphasizing the importance of security measures**: The results show that even MedMLLMs designed with enhanced security features remain vulnerable to security breaches.
Therefore, the paper calls for robust measures to improve the security and efficacy of open-source MedMLLMs, particularly against jailbreak attacks and other malicious or clinically significant exploits that may occur in medical settings. In doing so, the paper exposes the potential risks of MedMLLMs in clinical applications and offers references and guidance for future research and development.

### Formulas involved

- **Image optimization**:

  \[ \tilde{g}=\text{Clip}_{g,\epsilon}\left(g + \alpha\cdot\text{sign}\left(-\nabla_g L(q, g, x_{1:n})\right)\right) \]

  where $\alpha$ is the step size and $\nabla_g L$ is the gradient of the loss with respect to the image $g$. The image is perturbed along the sign of the negative gradient and then clipped so that the perturbation stays within the budget $\epsilon$.

- **Text optimization**:

  \[ X_i:=\text{Top-}k\left(-\nabla_{e_{x_i}} L(q, g, x_{1:n})\right) \]

  For each position $i$, the candidate set $X_i$ contains the $k$ tokens with the largest negative gradient with respect to the one-hot encoding $e_{x_i}$, i.e. the substitutions that a first-order approximation predicts will reduce the loss the most.

- **Cross-modal evaluation**:

  \[ (g, x_{1:n})=\begin{cases} (\tilde{g}, x_{1:n}) & \text{if } L(q, \tilde{g}, x_{1:n})<\min_b L(q, g, \tilde{x}^{(b)}_{1:n})\\ (g, \tilde{x}^{(b^*)}_{1:n}) & \text{otherwise, where } b^*=\arg\min_b L(q, g, \tilde{x}^{(b)}_{1:n}) \end{cases} \]

  Here $\tilde{x}^{(b)}_{1:n}$ is the $b$-th candidate token sequence built from the sets $X_i$. The image update $\tilde{g}$ is kept only if it yields a lower loss than the best text candidate; otherwise the best text candidate $b^*$ is kept.

- **Attack Success Rate (ASR)** and **Rejection Rate (RR)**:

  \[ ASR(A)=\frac{1}{|A|}\sum_{a\in A}\text{Success}(a) \]

  where $A$ is the set of attack attempts and $\text{Success}(a)$ is 1 if attempt $a$ succeeds and 0 otherwise.
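The image, text and cross-modal update rules above can be tied together in a small toy sketch. It is not the authors' implementation (their code is in the linked repository) and it does not touch LLaVA-Med: the loss `loss`, its analytic gradients, the 8x8 "image", the vocabulary size, and the values of $\alpha$, $\epsilon$, $k$ and the candidate batch size are all hypothetical stand-ins chosen so the example runs with NumPy alone. Building each text candidate by swapping one randomly chosen position with a token from $X_i$ follows the GCG recipe and is an assumption about how $\tilde{x}^{(b)}_{1:n}$ is formed.

```python
import numpy as np

# Hypothetical toy problem (NOT LLaVA-Med): the fixed malicious query q is folded
# into the loss, the "image" g is an 8x8 array in [0, 1], and the text suffix
# x_{1:n} is N_POS token ids over a VOCAB-sized vocabulary.
rng = np.random.default_rng(0)
N_POS, VOCAB = 5, 50
img_target = rng.uniform(0.0, 1.0, size=(8, 8))  # toy image term pulls g here
cost = rng.normal(size=(N_POS, VOCAB))            # toy per-position token costs


def loss(g, x):
    """Toy L(q, g, x_{1:n}): quadratic in the image, linear in one-hot tokens."""
    return float(np.sum((g - img_target) ** 2) + cost[np.arange(N_POS), x].sum())


def grad_image(g):
    """Analytic gradient of the toy loss with respect to the image g."""
    return 2.0 * (g - img_target)


def grad_onehot():
    """Gradient w.r.t. each one-hot e_{x_i}; for a linear text term it is cost[i]."""
    return cost


def image_step(g, g_orig, alpha, eps):
    """One sign-gradient (PGD-style) step, projected to the eps-ball and [0, 1]."""
    g_new = g + alpha * np.sign(-grad_image(g))
    g_new = np.clip(g_new, g_orig - eps, g_orig + eps)  # assumed: ball around original image
    return np.clip(g_new, 0.0, 1.0)


def text_candidates(x, k, batch):
    """GCG-style candidates: X_i = Top-k(-grad), then `batch` single-token swaps."""
    neg_grad = -grad_onehot()
    top_k = np.argsort(-neg_grad, axis=1)[:, :k]  # k largest entries of -grad per row
    cands = []
    for _ in range(batch):
        cand = x.copy()
        i = rng.integers(N_POS)              # position to modify
        cand[i] = top_k[i, rng.integers(k)]  # replacement token drawn from X_i
        cands.append(cand)
    return cands


def mcm_step(g, g_orig, x, alpha=1 / 255, eps=8 / 255, k=4, batch=8):
    """One cross-modal step: keep the image update or the best text candidate."""
    g_tilde = image_step(g, g_orig, alpha, eps)
    cands = text_candidates(x, k, batch)
    text_losses = [loss(g, c) for c in cands]
    b_star = int(np.argmin(text_losses))
    if loss(g_tilde, x) < text_losses[b_star]:
        return g_tilde, x
    return g, cands[b_star]


def attack_success_rate(successes):
    """ASR(A) = (1/|A|) * sum of Success(a), with Success(a) in {0, 1}."""
    return sum(bool(s) for s in successes) / len(successes)


if __name__ == "__main__":
    g_orig = rng.uniform(0.0, 1.0, size=(8, 8))
    g, x = g_orig.copy(), np.zeros(N_POS, dtype=int)
    print("initial toy loss:", loss(g, x))
    for _ in range(50):
        g, x = mcm_step(g, g_orig, x)
    print("final toy loss:", loss(g, x))
```

Each call to `mcm_step` changes only one modality, which mirrors the two branches of the cross-modal rule; in a real attack, `grad_image` and `grad_onehot` would come from backpropagating the model's loss rather than from a closed form.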

Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
