A Token-wise Graph-based Framework for Multimodal Named Entity Recognition

Zhengxuan Zhang, Weixing Mai, Haoliang Xiong, Chuhan Wu, Yun Xue
DOI: https://doi.org/10.1109/ICME55011.2023.00368
2023-01-01
Abstract: Multimodal Named Entity Recognition (MNER) on social media posts is a leading but challenging task. However, most existing MNER methods fail to effectively exploit the visual information from the image. Besides, multimodal interaction and alignment remain unsettled. In this paper, we propose a novel token-wise graph-based framework to deal with the MNER task. Specifically, a token-wise image processing scheme is established. A multi-modal graph is constructed from the textual tokens derived from BERT and the visual tokens derived from SwinT. Then, the multi-modal graph is fed into a multi-layer Transformer-based module for intra- and inter-modal information fusion. In addition, multiple contrastive learning objectives are devised to perform global and local alignment between textual and visual nodes. Experimental results on two benchmark multimodal datasets indicate that our model achieves state-of-the-art performance in MNER tasks.
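The abstract does not give the form of the contrastive objectives. As a rough illustration of what global alignment between paired textual and visual representations can look like, the sketch below implements a generic symmetric InfoNCE loss in plain Python. This is an assumption made for illustration, not the loss used in the paper: the function names `l2_normalize` and `info_nce`, the temperature value, and the use of one pooled vector per modality are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(text_embs, image_embs, temperature=0.07):
    """Generic symmetric InfoNCE loss over paired (text, image) embeddings.

    Row i of text_embs is assumed to be paired with row i of image_embs;
    every other row in the batch acts as a negative. The temperature is an
    illustrative default, not a value taken from the paper.
    """
    t = [l2_normalize(v) for v in text_embs]
    v = [l2_normalize(u) for u in image_embs]
    n = len(t)

    # Cosine similarity between every text/image pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(t[i], v[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Text-to-image uses the rows; image-to-text uses the columns.
    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(text, aligned) < info_nce(text, swapped))
```

Under these assumptions the loss is small when each text embedding is closest to its own image embedding and large when the pairing is broken. A local (node-level) variant would apply the same idea to individual textual and visual nodes rather than pooled vectors.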