On development of multimodal named entity recognition using part-of-speech and mixture of experts

Jianying Chen, Yun Xue, Haolan Zhang, Weiping Ding, Zhengxuan Zhang, Jiehai Chen
DOI: https://doi.org/10.1007/s13042-022-01754-w
2022-12-25
International Journal of Machine Learning and Cybernetics
Abstract: Multimodal Named Entity Recognition (MNER) is a fundamental task in natural language processing for social media posts. Current MNER models fail to handle the relation between text and image entities, which introduces textual noise, image noise, and even multimodal noise during processing. In this paper, we first introduce Part-of-Speech (POS) information, which is used to eliminate non-entity words and filter textual noise. A POS-based gated cross-modal attention network is established to learn precise textual and visual representations and remove image noise. Then, a Mixture-of-Experts (MOE) is proposed for multimodal integration, which improves named entity identification and filters multimodal noise. We evaluate the proposed model on the Twitter dataset, and the experimental results provide strong evidence of state-of-the-art performance.
computer science, artificial intelligence
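The abstract names a Mixture-of-Experts for multimodal integration but does not give its form. As a rough illustration of the general idea only, the sketch below shows a generic softmax-gated mixture: a gate scores each expert for a given token vector, and the expert outputs are averaged with those scores as weights. The function names, the linear gate, and the toy experts are all assumptions made for this example; they are not taken from the paper and should not be read as the authors' architecture.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moe_fuse(token_vec, experts, gate_weights):
    """Generic softmax-gated mixture (hypothetical helper, not the paper's model).

    token_vec    : list of floats, e.g. a fused text+image feature for one token
    experts      : list of callables mapping a vector to a vector
    gate_weights : one weight vector per expert; the gate is a linear scorer
    """
    gate = softmax([dot(w, token_vec) for w in gate_weights])
    outputs = [expert(token_vec) for expert in experts]
    dim = len(outputs[0])
    # Weighted sum of expert outputs, one coordinate at a time.
    fused = [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(dim)]
    return fused, gate

# Toy usage with two made-up experts and a made-up gate.
experts = [lambda x: [2.0 * v for v in x], lambda x: [-v for v in x]]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]
fused, gate = moe_fuse([1.0, 1.0], experts, gate_weights)
```

With this toy input both experts receive the same gate score, so the result is the plain average of the two expert outputs. In a trained model the gate weights would be learned, which lets the mixture down-weight an expert whose input modality is noisy for a given token.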