Multimodal Named Entity Recognition Model Based on Cross-modal Feature Enhancement Mechanism

Yijia Zhang, Xiaoying Zhou, Jianyuan Yuan, Zhuang Wang, YiLin Pan
DOI: https://doi.org/10.1109/clnlp64123.2024.00015
2024-01-01
Abstract: Multimodal named entity recognition (MNER) for social media is challenging. Most recent methods help the model learn more semantic information by adding data samples, for example by replacing entities with synonyms to generate new samples. Because synonym replacement operates at the word level rather than on context, it may alter the contextual semantic relationships of entities. This paper therefore proposes an MNER model with a cross-modal feature enhancement mechanism. First, we introduce the CLIP (Contrastive Language-Image Pre-training) model to enhance the extraction of useful information and enrich multimodal feature representations. Second, we introduce a hierarchical attention framework to model the interaction between feature elements of different modalities. Applied to two popular Twitter datasets, our method yields state-of-the-art results, notably surpassing those of other proposed models.