MAFN: Multi-Level Attention Fusion Network for Multimodal Named Entity Recognition

Xiaoying Zhou, Yijia Zhang, Zhuang Wang, Mingyu Lu, Xiaoxia Liu
DOI: https://doi.org/10.1007/s11042-023-17376-5
IF: 2.577
2023-01-01
Multimedia Tools and Applications
Abstract: Multimodal named entity recognition (MNER) aims to use the modality information of images and text to identify named entities in free text and classify them into predefined types, such as Person, Location, and Organization. However, most existing MNER methods rely on simple splicing and attention mechanisms and fail to fully exploit the modal information to capture intra-modal and inter-modal interactions. This simple fusion operation may bias the prediction of named entities. In this paper, we propose a novel Multi-level Attention Fusion Network (MAFN) to address this problem. Specifically, we introduce a multi-level attention mechanism that learns intra-modal and inter-modal interactions to obtain a multimodal representation for each word. Furthermore, we introduce a visual filter gate that filters out words that cannot be aligned with any visual block, dynamically controlling the contribution of visual features. Experimental results on two publicly available Twitter datasets demonstrate that our method outperforms other state-of-the-art baseline methods.
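The abstract does not give the equations behind the inter-modal attention or the visual filter gate, so the following is only a minimal illustrative sketch of the general idea, not the authors' formulation. It assumes scaled dot-product attention from each word vector over the visual block vectors, and a hypothetical gate defined as the sigmoid of the best word-block alignment score. The names `cross_modal_attention`, `visual_gate`, `fuse`, and the `bias` parameter are all invented for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(word, blocks):
    """Attend from one word vector over the visual block vectors.

    Returns the attention-weighted visual context and the raw scores.
    """
    scale = math.sqrt(len(word))
    scores = [dot(word, b) / scale for b in blocks]
    weights = softmax(scores)
    context = [
        sum(w * b[i] for w, b in zip(weights, blocks))
        for i in range(len(word))
    ]
    return context, scores


def visual_gate(scores, bias=0.0):
    """Hypothetical gate (an assumption, not from the paper):
    sigmoid of the best word-block alignment score plus a bias."""
    return 1.0 / (1.0 + math.exp(-(max(scores) + bias)))


def fuse(words, blocks, bias=0.0):
    """Add the gated visual context to each word vector."""
    fused = []
    for word in words:
        context, scores = cross_modal_attention(word, blocks)
        g = visual_gate(scores, bias)
        fused.append([w + g * c for w, c in zip(word, context)])
    return fused


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 0.0]]   # two toy word vectors
    blocks = [[4.0, 0.0], [0.0, 4.0]]  # two toy visual block vectors
    for vec in fuse(words, blocks):
        print(vec)
```

In this sketch, a word that aligns poorly with every visual block receives a gate value near zero, so its fused representation stays close to the text-only vector. A learned model would replace the fixed `bias` and the max-score heuristic with trained parameters.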