Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models

Zhen Zeng, Leijiang Gu, Xun Yang, Zhangling Duan, Zenglin Shi, Meng Wang

2024-11-19

Abstract: Knowledge editing aims to efficiently and cost-effectively correct inaccuracies and update outdated information. Recently, there has been growing interest in extending knowledge editing from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs), which integrate both textual and visual information, introducing additional editing complexities. Existing multimodal knowledge editing works primarily focus on text-oriented, coarse-grained scenarios, failing to address the unique challenges posed by multimodal contexts. In this paper, we propose a visual-oriented, fine-grained multimodal knowledge editing task that targets precise editing in images with multiple interacting entities. We introduce the Fine-Grained Visual Knowledge Editing (FGVEdit) benchmark to evaluate this task. Moreover, we propose a Multimodal Scope Classifier-based Knowledge Editor (MSCKE) framework. MSCKE leverages a multimodal scope classifier that integrates both visual and textual information to accurately identify and update knowledge related to specific entities within images. This approach ensures precise editing while preserving irrelevant information, overcoming the limitations of traditional text-only editing methods. Extensive experiments on the FGVEdit benchmark demonstrate that MSCKE outperforms existing methods, showcasing its effectiveness in solving the complex challenges of multimodal knowledge editing.
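The abstract does not spell out MSCKE's internals, so the following is only a minimal sketch of the general scope-classifier idea (as used by memory-based editors such as SERAC), not the paper's method. It assumes image and question embeddings are already available; `Edit`, `ScopeClassifierEditor`, `alpha`, `threshold`, and the toy vectors are all hypothetical, and a fixed cosine-similarity rule stands in for the learned multimodal classifier. The point it illustrates is why the scope decision needs both modalities.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Edit:
    image_emb: List[float]  # embedding of the edited image (hypothetical encoder output)
    text_emb: List[float]   # embedding of the question that names the target entity
    new_answer: str         # corrected answer for that entity


class ScopeClassifierEditor:
    """Hypothetical editor: a query uses a stored edit only if image AND text match it."""

    def __init__(self, base_model: Callable[[Vector, Vector], str],
                 alpha: float = 0.5, threshold: float = 0.8) -> None:
        self.base_model = base_model  # the unedited MLLM, treated as a black box
        self.alpha = alpha            # weight on visual similarity (assumed value)
        self.threshold = threshold    # in-scope decision boundary (assumed value)
        self.memory: List[Edit] = []

    def add_edit(self, edit: Edit) -> None:
        self.memory.append(edit)

    def scope_score(self, image_emb: Vector, text_emb: Vector, edit: Edit) -> float:
        # Fuse visual and textual similarity into a single in-scope score.
        return (self.alpha * cosine(image_emb, edit.image_emb)
                + (1 - self.alpha) * cosine(text_emb, edit.text_emb))

    def answer(self, image_emb: Vector, text_emb: Vector) -> str:
        best: Optional[Edit] = None
        best_score = -math.inf
        for edit in self.memory:
            score = self.scope_score(image_emb, text_emb, edit)
            if score > best_score:
                best, best_score = edit, score
        if best is not None and best_score >= self.threshold:
            return best.new_answer                   # in scope: apply the edit
        return self.base_model(image_emb, text_emb)  # out of scope: base model unchanged


if __name__ == "__main__":
    editor = ScopeClassifierEditor(base_model=lambda img, txt: "base answer")
    street = [1.0, 0.0]          # toy embedding of one multi-entity image
    ask_left = [1.0, 0.0, 0.0]   # toy embedding: "Who is the person on the left?"
    ask_right = [0.0, 1.0, 0.0]  # toy embedding: "Who is the person on the right?"
    editor.add_edit(Edit(street, ask_left, "Alice"))
    print(editor.answer(street, ask_left))      # Alice (the edited entity)
    print(editor.answer(street, ask_right))     # base answer (other entity, same image)
    print(editor.answer([0.0, 1.0], ask_left))  # base answer (same question, other image)
```

In this toy setup neither modality alone reaches the threshold (each contributes at most 0.5), so only the targeted entity in the targeted image is edited. A text-only scope check would wrongly apply the edit to the third query, and an image-only check would wrongly apply it to the second.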

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the challenge of fine-grained visual knowledge editing in Multimodal Large Language Models (MLLMs). Existing multimodal knowledge editing methods focus mainly on text-oriented, coarse-grained scenarios and cannot handle the editing requirements that arise when an image contains multiple interacting entities: an edit about one entity should not change what the model says about the others. The paper therefore proposes a visual-oriented, fine-grained multimodal knowledge editing task and introduces the Fine-Grained Visual Knowledge Editing (FGVEdit) benchmark to evaluate it. The main contributions are:

1. **A new visual-oriented, fine-grained knowledge editing task**: it highlights the challenges specific to knowledge editing in a multimodal setting, which differ from those addressed by traditional text-based methods.
2. **The Multimodal Scope Classifier-based Knowledge Editor (MSCKE) framework**: designed for fine-grained knowledge editing in MLLMs, it combines visual and textual information to identify and update the knowledge tied to a specific entity while preserving unrelated knowledge.
3. **The FGVEdit benchmark**: it contains images with multiple entities and evaluates how precisely multimodal knowledge editing methods edit the targeted entity without affecting irrelevant information.

Extensive experiments on the FGVEdit benchmark show that MSCKE outperforms existing methods on this task.
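The summary does not list FGVEdit's metrics. As an assumption, here is a minimal sketch of two measures commonly used in knowledge-editing evaluation: reliability (does the edited entity now get the new answer?) and locality (are answers about other entities left unchanged?). The function names, the `(image id, question)` model interface, and the toy data are hypothetical.

```python
from typing import Callable, List, Tuple

Model = Callable[[str, str], str]  # (image id, question) -> answer; hypothetical interface


def reliability(post: Model, edit_queries: List[Tuple[str, str, str]]) -> float:
    """Share of in-scope queries (image, question, target) that return the new target."""
    hits = sum(post(img, q) == target for img, q, target in edit_queries)
    return hits / len(edit_queries)


def locality(pre: Model, post: Model, probes: List[Tuple[str, str]]) -> float:
    """Share of out-of-scope probes whose answer is unchanged by the edit."""
    kept = sum(pre(img, q) == post(img, q) for img, q in probes)
    return kept / len(probes)


if __name__ == "__main__":
    before = {("img1", "left person?"): "Bob", ("img1", "right person?"): "Carol"}
    after = {("img1", "left person?"): "Alice", ("img1", "right person?"): "Carol"}
    pre = lambda img, q: before[(img, q)]
    precise = lambda img, q: after[(img, q)]  # edits only the left person
    coarse = lambda img, q: "Alice"           # over-broad edit: every answer changes

    edit_set = [("img1", "left person?", "Alice")]
    probes = [("img1", "right person?")]
    print(reliability(precise, edit_set), locality(pre, precise, probes))  # 1.0 1.0
    print(reliability(coarse, edit_set), locality(pre, coarse, probes))    # 1.0 0.0
```

Both editors succeed on the edited entity, so reliability alone cannot tell them apart; only the locality probe on another entity in the same image exposes the coarse edit.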