Abstract: Chinese Named Entity Recognition (NER) is an important subtask of information extraction. Its purpose is to locate named entities in text and classify them into predetermined categories. The key to NER is learning a high-quality representation of tokens. Recently, representation learning techniques have been introduced into NER because they perform well at mining the semantics of texts and capturing their structure. For Chinese, many studies introduce multimodal feature extraction schemes that enrich token representations with auxiliary features such as radicals and words. However, the learning schemes for these auxiliary features are relatively complex, and fusing the features by concatenation or an MLP makes it difficult to learn the interactions between them. To address these challenges, a multimodal Chinese NER model based on a self-attention mechanism, named MNER, is proposed. It consists of a multimodal feature fusion module and an entity classification module. To learn informative token representations, the multimodal feature fusion module exploits radical, character, and word information. Within this module, a self-attention mechanism integrates the multimodal features according to the correlations between them, which addresses the difficulty existing methods have in exploiting interaction information between features. A semantic-aware category modifier is proposed to enhance the performance of the CRF classification layer. It increases entity discrimination by adjusting the feature embeddings according to the similarity between the token embeddings and the embedding of each entity category, which widens the encoding gap between different entities and narrows the search scope for entity classification. Finally, the proposed MNER is compared with ten state-of-the-art methods on the Weibo and Resume datasets, and the results show that the proposed model is superior on three metrics.
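
The two ideas described above, correlation-based fusion of radical, character, and word features and similarity-based adjustment of token embeddings, can be illustrated with a minimal sketch. This is not the MNER implementation. It assumes unprojected scaled dot-product self-attention (no learned query, key, or value matrices), mean pooling of the attended vectors, and a hypothetical modifier that shifts a token embedding toward a similarity-weighted mix of category embeddings. The function names and the `strength` and `temperature` parameters are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse_with_self_attention(features):
    """Fuse the feature vectors of one token (e.g. radical, character, word).

    Assumption: queries, keys, and values are the raw vectors themselves;
    a trained model would apply learned projections first.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Attention weights reflect how strongly this feature correlates
        # with every feature, including itself.
        weights = softmax([dot(query, key) / scale for key in features])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, features))
             for i in range(dim)]
        )
    # Assumption: mean pooling yields a single fused token vector.
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]


def apply_category_modifier(token, category_embeddings, strength=0.5,
                            temperature=0.1):
    """Hypothetical modifier: move the token toward the categories it resembles.

    `strength` and `temperature` are illustrative, not taken from the paper.
    """
    labels = list(category_embeddings)
    sims = [cosine(token, category_embeddings[label]) for label in labels]
    weights = softmax([s / temperature for s in sims])
    mix = [
        sum(w * category_embeddings[label][i] for w, label in zip(weights, labels))
        for i in range(len(token))
    ]
    return [t + strength * m for t, m in zip(token, mix)]


if __name__ == "__main__":
    radical, character, word = [0.2, 0.9], [0.8, 0.6], [0.7, 0.7]
    fused = fuse_with_self_attention([radical, character, word])
    categories = {"PER": [1.0, 0.0], "LOC": [0.0, 1.0]}  # toy embeddings
    print(apply_category_modifier(fused, categories))
```

In this toy setting, two tokens that lean toward different categories end up further apart after the modifier is applied, which mirrors the widening of the encoding gap described in the abstract. A low `temperature` makes the shift concentrate on the closest category.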
