CAT-MNER: Multimodal Named Entity Recognition with Knowledge-Refined Cross-Modal Attention

Xuwu Wang, Jiabo Ye, Zhixu Li, Junfeng Tian, Yong Jiang, Ming Yan, Ji Zhang, Yanghua Xiao
DOI: https://doi.org/10.1109/icme52920.2022.9859972
2022-01-01
Abstract: Multimodal named entity recognition (MNER) aims to detect and classify named entities in multimodal scenarios. It requires bridging the gap between natural language and visual context, which poses two challenges: cross-modal alignment is diverse, and cross-modal interaction is sometimes implicit. Existing MNER methods are vulnerable to such implicit interactions and tend to overlook the significant features involved. To tackle this problem, we propose a novel approach that refines cross-modal attention by identifying and highlighting task-salient features. The saliency of each feature is measured by its correlation with expanded entity label words derived from external knowledge bases. We further propose an end-to-end Transformer-based MNER framework, which has a simpler architecture yet achieves better performance than previous methods. Extensive experiments validate the merits of our method. Moreover, our method shows a significant advantage in data efficiency and generalization ability.
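The idea of refining cross-modal attention with knowledge-derived saliency can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: a key feature's saliency is taken as its maximum cosine similarity to any expanded label-word embedding (clipped at zero). That saliency is then added, scaled by a hypothetical weight `alpha`, to the logits of ordinary scaled dot-product attention. The names `saliency`, `refined_cross_attention`, and `alpha` are hypothetical illustrations and do not come from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def saliency(features: List[Vector], label_word_embs: List[Vector]) -> List[float]:
    """Hypothetical saliency: max cosine similarity to any expanded
    label word, clipped at 0. Assumes label_word_embs is non-empty."""
    return [max(0.0, max(cosine(f, w) for w in label_word_embs)) for f in features]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def refined_cross_attention(
    query: Vector,
    keys: List[Vector],
    values: List[Vector],
    label_word_embs: List[Vector],
    alpha: float = 1.0,
) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention whose logits get an additive bonus
    (alpha * saliency) for task-salient keys. This is an assumed
    refinement scheme, not the paper's published formula."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    sal = saliency(keys, label_word_embs)
    refined = [s + alpha * s_i for s, s_i in zip(scores, sal)]
    weights = softmax(refined)
    # Weighted sum of value vectors, dimension by dimension.
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return weights, out


if __name__ == "__main__":
    # A text-token query attends over two visual region features; only the
    # first region correlates with the (toy) label-word embedding.
    weights, out = refined_cross_attention(
        query=[1.0, 1.0],
        keys=[[1.0, 0.0], [0.0, 1.0]],
        values=[[1.0, 0.0], [0.0, 1.0]],
        label_word_embs=[[1.0, 0.0]],
    )
    print(weights, out)
```

With `alpha=0.0` the function reduces to standard attention, where both regions get equal weight in this toy example. With `alpha=1.0` the region aligned with the label word receives more weight. An additive bias on the logits is one common way to inject a prior into attention. Whether CAT-MNER uses this form or a different one cannot be determined from the abstract alone.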