PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition

Xuwu Wang, Junfeng Tian, Min Gui, Zhixu Li, Jiabo Ye, Ming Yan, Yanghua Xiao
DOI: https://doi.org/10.1007/978-3-031-00129-1_24
2022-01-01
Abstract: Multimodal named entity recognition (MNER) is an emerging task that incorporates visual and textual inputs to detect named entities and predict their corresponding entity types. However, existing MNER methods often fail to capture visual clues that are related to the entities but only loosely related to the text, which may introduce task-irrelevant noise or even errors. To address this problem, we propose to utilize entity-related prompts for extracting proper visual clues with a pre-trained vision-language model. To better integrate the modalities and address the well-known semantic gap problem, we further propose a modality-aware attention mechanism for cross-modal fusion. Experimental results on two benchmarks show that our approach outperforms state-of-the-art MNER approaches by a large margin.
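The idea of using entity-related prompts to pull visual clues out of an image can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the prompt templates, the vision-language model, or the scoring rule. The sketch assumes that an image and each prompt have already been embedded into a shared space by some pre-trained vision-language model, and it uses hand-made toy vectors in place of real embeddings. The prompt strings, the function names, and the temperature value are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_prompts(image_vec, prompt_vecs, temperature=0.1):
    """Weight each entity-related prompt by its similarity to the image.

    Assumption: image_vec and the values of prompt_vecs come from the
    image and text encoders of the same vision-language model.
    """
    names = list(prompt_vecs)
    sims = [cosine(image_vec, prompt_vecs[n]) for n in names]
    weights = softmax(sims, temperature)
    return dict(zip(names, weights))

# Toy embeddings standing in for real encoder outputs (hypothetical values).
image_vec = [0.9, 0.1, 0.0]
prompt_vecs = {
    "an image of a person": [1.0, 0.0, 0.0],
    "an image of a location": [0.0, 1.0, 0.0],
    "an image of an organization": [0.0, 0.0, 1.0],
}

clues = score_prompts(image_vec, prompt_vecs)
best = max(clues, key=clues.get)
print(best)
```

In a setup like this, the resulting weights could serve as a compact visual clue that is passed to the text-side tagger. How the paper fuses such clues with the text (its modality-aware attention) is not specified in the abstract and is not modeled here.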