Can We Edit Multimodal Large Language Models?

Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, Ningyu Zhang
2024-04-18
Abstract: In this paper, we focus on editing Multimodal Large Language Models (MLLMs). Compared to editing single-modal LLMs, multimodal model editing is more challenging, which demands a higher level of scrutiny and careful consideration in the editing process. To facilitate research in this area, we construct a new benchmark, dubbed MMEdit, for editing multimodal LLMs and establish a suite of innovative metrics for evaluation. We conduct comprehensive experiments involving various model editing baselines and analyze the impact of editing different components for multimodal LLMs. Empirically, we notice that previous baselines can implement editing multimodal LLMs to some extent, but the effect is still barely satisfactory, indicating the potential difficulty of this task. We hope that our work can provide the NLP community with insights. Code and dataset are available at https://github.com/zjunlp/EasyEdit.
Computation and Language, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning, Multimedia
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily focuses on the editing issues of Multimodal Large Language Models (MLLMs). Compared to unimodal language models, editing multimodal models is more challenging because they need to handle information from different modalities (such as text and images), and erroneous outputs may result from the synergistic effects of multiple modalities. Specifically, the paper attempts to address the following key issues:

1. **Reliability of Multimodal Model Editing**: How to ensure that after editing, the model can accurately update its understanding of specific inputs while maintaining consistent interpretation for other unrelated inputs.
2. **Locality of Multimodal Model Editing**: How to minimize unintended side effects on the overall knowledge base of the model during the editing process, ensuring the model's stability and consistency.
3. **Generality of Multimodal Model Editing**: How to make the edited model not only correct specific erroneous inputs but also produce consistent correct outputs when encountering similar inputs.

To investigate these issues, the authors constructed a new benchmark dataset **MMEdit** and proposed a set of innovative evaluation metrics to assess the effectiveness of multimodal model editing. Through comprehensive experiments, the authors analyzed the impact of different editing methods on various components of multimodal models, finding that existing editing methods perform well on text modules but are still unsatisfactory on visual modules. This indicates the potential difficulty of the multimodal model editing task and opportunities for future research.
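
The three criteria above (reliability, locality, generality) can be made concrete with a small scoring sketch. This is an illustration only, not the MMEdit implementation or the EasyEdit API: it assumes a model can be treated as a function from an (image, prompt) pair to an answer string, and that scoring is by exact match. The names `accuracy`, `locality`, and `make_model`, along with the toy data, are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Assumption: a model maps (image_id, prompt) to an answer string.
Model = Callable[[str, str], str]


def accuracy(model: Model, cases: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of (image, prompt, target) cases answered with the target.

    Applied to the edit cases this gives a reliability-style score; applied
    to rephrased or similar inputs it gives a generality-style score.
    """
    hits = sum(model(image, prompt) == target for image, prompt, target in cases)
    return hits / len(cases)


def locality(pre: Model, post: Model, cases: Sequence[Tuple[str, str]]) -> float:
    """Fraction of unrelated inputs whose answer is unchanged by the edit."""
    same = sum(pre(image, prompt) == post(image, prompt) for image, prompt in cases)
    return same / len(cases)


def make_model(table: Dict[Tuple[str, str], str], default: str = "unknown") -> Model:
    """Toy stand-in for an MLLM: a lookup table over exact inputs."""
    def model(image: str, prompt: str) -> str:
        return table.get((image, prompt), default)
    return model


# Toy data: the pre-edit model mislabels the cat image.
base = {
    ("img_cat", "What animal is this?"): "dog",
    ("img_sky", "What color is the sky?"): "blue",
}
pre_edit = make_model(base)

# A naive "edit" that overwrites exactly one entry.
edited = dict(base)
edited[("img_cat", "What animal is this?")] = "cat"
post_edit = make_model(edited)

edit_cases = [("img_cat", "What animal is this?", "cat")]
rephrased_cases = [("img_cat", "Which animal is shown?", "cat")]
unrelated_cases = [("img_sky", "What color is the sky?")]

reliability_score = accuracy(post_edit, edit_cases)
generality_score = accuracy(post_edit, rephrased_cases)
locality_score = locality(pre_edit, post_edit, unrelated_cases)
```

In this toy setup the overwrite-one-entry edit is reliable and local but does not generalize to the rephrased prompt, which is the kind of trade-off the three criteria are meant to expose. A real evaluation would replace the lookup tables with actual pre-edit and post-edit model calls and would likely use a softer match than string equality.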