MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing

Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi, Haiyun Jiang, Siyuan Cheng, Bozhong Tian
DOI: https://doi.org/10.18653/v1/2024.findings-acl.298
2024-01-01
Abstract: Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs). Despite its potential, current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored. This gap presents a notable challenge, as FG entity recognition is pivotal for the practical deployment and effectiveness of MLLMs in diverse real-world scenarios. To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for the FG multimodal entity knowledge editing. MIKE encompasses a suite of tasks tailored to assess different perspectives, including Vanilla Name Answering, Entity-Level Caption, and Complex-Scenario Recognition. In addition, a new form of knowledge editing, Multi-step Editing, is introduced to evaluate the editing efficiency. Through our extensive evaluations, we demonstrate that the current state-of-the-art methods face significant challenges in tackling our proposed benchmark, underscoring the complexity of FG knowledge editing in MLLMs. Our findings spotlight the urgent need for novel approaches in this domain, setting a clear agenda for future research and development efforts within the community.