Can We Edit Multimodal Large Language Models?

Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, Ningyu Zhang

2024-04-18

Abstract: In this paper, we focus on editing Multimodal Large Language Models (MLLMs). Compared to editing single-modal LLMs, multimodal model editing is more challenging and demands a higher level of scrutiny and careful consideration in the editing process. To facilitate research in this area, we construct a new benchmark, dubbed MMEdit, for editing multimodal LLMs and establish a suite of innovative metrics for evaluation. We conduct comprehensive experiments involving various model editing baselines and analyze the impact of editing different components of multimodal LLMs. Empirically, we notice that previous baselines can edit multimodal LLMs to some extent, but the effect is still barely satisfactory, indicating the potential difficulty of this task. We hope that our work can provide the NLP community with insights. Code and dataset are available at https://github.com/zjunlp/EasyEdit.

Computation and Language, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning, Multimedia

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper focuses on editing Multimodal Large Language Models (MLLMs). Compared to unimodal language models, multimodal models are harder to edit because they handle information from different modalities (such as text and images), and an erroneous output may result from the combined effect of several modalities. Specifically, the paper attempts to address the following key issues:

1. **Reliability of Multimodal Model Editing**: How to ensure that, after editing, the model accurately updates its output for the specific edited input.
2. **Locality of Multimodal Model Editing**: How to minimize unintended side effects on the rest of the model's knowledge during editing, so that outputs for unrelated inputs stay unchanged and the model remains stable and consistent.
3. **Generality of Multimodal Model Editing**: How to make the edited model not only correct the specific erroneous input but also produce consistent, correct outputs for similar inputs.

To investigate these issues, the authors constructed a new benchmark dataset, **MMEdit**, and proposed a set of evaluation metrics to assess the effectiveness of multimodal model editing. Through comprehensive experiments, they analyzed the impact of applying different editing methods to various components of multimodal models, finding that existing editing methods perform well on text modules but remain unsatisfactory on visual modules. This indicates the potential difficulty of the multimodal model editing task and leaves opportunities for future research.
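The three criteria above can be made concrete with a small sketch. This is an illustration only: it does not reproduce the paper's exact metric definitions or the EasyEdit API. The `Model` callable, the helper `make_model`, the exact-match scoring, and the toy image names and prompts are all assumptions introduced here.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interfaces (not from the paper or EasyEdit):
# a model maps an (image identifier, text prompt) pair to an answer string.
Example = Tuple[str, str]
Model = Callable[[str, str], str]


def make_model(table: Dict[Example, str]) -> Model:
    """Build a toy lookup-table 'model' for demonstration purposes."""
    return lambda image, prompt: table.get((image, prompt), "unknown")


def reliability(edited: Model, edits: Sequence[Tuple[Example, str]]) -> float:
    """Fraction of edited inputs for which the edited model gives the target."""
    hits = sum(edited(img, prompt) == target for (img, prompt), target in edits)
    return hits / len(edits)


def generality(edited: Model, similar: Sequence[Tuple[Example, str]]) -> float:
    """Same exact-match score, computed on rephrased or similar inputs."""
    hits = sum(edited(img, prompt) == target for (img, prompt), target in similar)
    return hits / len(similar)


def locality(base: Model, edited: Model, unrelated: Sequence[Example]) -> float:
    """Fraction of unrelated inputs whose answer is unchanged by the edit."""
    same = sum(base(img, prompt) == edited(img, prompt) for img, prompt in unrelated)
    return same / len(unrelated)


# Toy data: the base model mislabels a cat image as "dog".
base_table = {
    ("cat.jpg", "What animal is this?"): "dog",
    ("cat_alt.jpg", "Which animal is shown?"): "dog",
    ("sky.jpg", "What color is the sky?"): "blue",
    ("tree.jpg", "What is this?"): "tree",
}
# The toy edit fixes the target input, fails on the similar input,
# and accidentally changes one unrelated answer.
edited_table = dict(base_table)
edited_table[("cat.jpg", "What animal is this?")] = "cat"
edited_table[("tree.jpg", "What is this?")] = "bush"

base, edited = make_model(base_table), make_model(edited_table)

rel = reliability(edited, [(("cat.jpg", "What animal is this?"), "cat")])
gen = generality(edited, [(("cat_alt.jpg", "Which animal is shown?"), "cat")])
loc = locality(base, edited, [("sky.jpg", "What color is the sky?"),
                              ("tree.jpg", "What is this?")])
print(rel, gen, loc)
```

In this toy setup the edit is fully reliable, does not generalize, and preserves only half of the unrelated answers. An edit can therefore succeed on its target while still failing the other two criteria, which is why all three are tracked separately.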
