Dynamic Graph Construction Framework for Multimodal Named Entity Recognition in Social Media

Weixing Mai, Zhengxuan Zhang, Kuntao Li, Yun Xue, Fenghuan Li
DOI: https://doi.org/10.1109/tcss.2023.3303027
2024-01-01
IEEE Transactions on Computational Social Systems
Abstract: Multimodal named entity recognition (MNER) aims to detect named entities and identify their types based on texts and attached images. Its output also serves as input to other comprehensive tasks, such as multimodal machine translation, visual dialog, and multimodal sentiment analysis. Existing studies have limitations in text-image matching and in reducing the semantic disparity between modalities. First, current methods fail to resolve both overall and local text-image matching issues in a self-guided way. Second, the static graphs constructed in MNER models struggle to bridge the semantic gap between different modalities. In this work, a dynamic graph construction framework (DGCF) is proposed to address these limitations. A similarity vector-based text-image matching inference strategy is designed to obtain the overall and local matching relations between text and image; the overall matching relation determines the proportion of visual information that is retained. Then, a multimodal dynamic graph interaction module is developed. Within each layer of the module, the local matching relations and part-of-speech (POS)-based multihead attention are integrated to construct a dynamic cross-modal graph and a semantic graph. Lastly, a CRF layer is used to predict entity labels. Extensive experiments are performed on two benchmark datasets. The experimental results show that the model is a competitive alternative and achieves state-of-the-art performance.
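The idea of an overall matching score that controls how much visual information is retained, alongside local text-image matching, can be illustrated with a minimal sketch. This is not the DGCF implementation: the abstract does not specify how the similarity vectors are computed, so the sketch assumes cosine similarity over mean-pooled vectors for the overall score and a token-by-region cosine matrix for the local relations. The names `cosine`, `mean_vector`, and `match_and_gate` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def match_and_gate(token_vecs, region_vecs):
    """Illustrative overall/local matching with a visual gate (assumed design)."""
    # Overall matching (assumption): cosine between mean-pooled text and image
    # vectors, rescaled from [-1, 1] to [0, 1].
    overall = (cosine(mean_vector(token_vecs), mean_vector(region_vecs)) + 1) / 2
    # Local matching (assumption): token-by-region cosine similarity matrix.
    local = [[cosine(t, r) for r in region_vecs] for t in token_vecs]
    # The overall score scales how much of each visual feature is retained.
    gated = [[overall * x for x in r] for r in region_vecs]
    return overall, local, gated
```

In this sketch, a well-matched text-image pair yields an overall score near 1 and keeps the visual features almost unchanged, while a mismatched pair yields a score near 0 and suppresses them.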