Safety of Multimodal Large Language Models on Images and Texts

Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao
2024-06-20
Abstract: Attracted by the impressive power of Multimodal Large Language Models (MLLMs), the public is increasingly using them to improve the efficiency of daily work. Nonetheless, the vulnerability of MLLMs to unsafe instructions poses serious safety risks when these models are deployed in real-world scenarios. In this paper, we systematically survey current efforts on the evaluation, attack, and defense of MLLMs' safety on images and text. We begin by giving an overview of MLLMs on images and text and of how safety is understood, which helps researchers see the detailed scope of our survey. Then, we review the evaluation datasets and metrics for measuring the safety of MLLMs. Next, we comprehensively present attack and defense techniques related to MLLMs' safety. Finally, we analyze several unsolved issues and discuss promising research directions. The latest papers are continually collected at https://github.com/isXinLiu/MLLM-Safety-Collection.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper focuses on the safety of Multimodal Large Language Models (MLLMs) on images and text. As MLLMs are widely adopted to improve work efficiency, their vulnerability to unsafe instructions also brings significant safety risks. Much work has addressed the safety of text-only language models, but research on MLLM safety is still in its infancy. The paper systematically surveys methods for evaluating, attacking, and defending MLLM safety. It first gives an overview of MLLMs on images and text and of how safety is understood; it then reviews the evaluation datasets and metrics used to measure model safety; next, it details attack and defense techniques related to MLLM safety; finally, it analyzes open problems and discusses future research directions.

The risks highlighted in the paper fall into three main areas: (1) adversarial perturbations to images can induce unsafe outputs at low cost; (2) aligned LLMs usually refuse malicious textual instructions, but an MLLM may directly follow the same instruction when it is presented visually and read through the model's built-in Optical Character Recognition (OCR) capability; (3) cross-modal training can weaken the alignment of the underlying model. To advance the field, the paper provides a comprehensive summary of MLLM safety from the perspectives of evaluation, attack, and defense.

In summary, the paper addresses how to ensure that MLLMs behave safely when handling image and text inputs, and how to evaluate, prevent, and mitigate potential safety risks.