Abstract: Named Entity Recognition (NER) on social media refers to discovering and classifying entities in unstructured free-form content, and it plays an important role in applications such as intention understanding and user recommendation. As social media posts are increasingly multimodal, Multimodal Named Entity Recognition (MNER), which processes a text together with its accompanying image, is attracting growing attention, since some textual components can only be understood in combination with visual information. However, existing approaches have two drawbacks: 1) The meanings of the text and its accompanying image do not always match, so the text still plays the major role. Yet social media posts are usually shorter and more informal than conventional text, which easily leads to incomplete semantic descriptions and data sparsity. 2) Although visual representations of whole images or of objects are already used, existing methods ignore either the fine-grained semantic correspondence between objects in images and words in text, or the fact that some images contain misleading objects or no objects at all. In this work, we address both problems by introducing multi-granularity cross-modality representation learning. To resolve the first problem, we enhance the representation of each word in the text through semantic augmentation. For the second, we perform cross-modality semantic interaction between text and vision at different visual granularities to obtain the most effective multimodal guidance representation for every word. Experiments show that the proposed approach achieves state-of-the-art (SOTA) or near-SOTA performance on two benchmark datasets of tweets. The code, data, and best-performing models are available at https://github.com/LiuPeiP-CS/IIE4MNER

Multimodal Named Entity Recognition Model Based on Cross-modal Feature Enhancement Mechanism

Multi-granularity cross-modal representation learning for named entity recognition on social media

Multi-Granularity Cross-Modality Representation Learning for Named Entity Recognition on Social Media

On development of multimodal named entity recognition using part-of-speech and mixture of experts

GNN-Based Multimodal Named Entity Recognition

CAT-MNER: Multimodal Named Entity Recognition with Knowledge-Refined Cross-Modal Attention

Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition

mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning

Hierarchical Aligned Multimodal Learning for NER on Tweet Posts

MAF - A General Matching and Alignment Framework for Multimodal Named Entity Recognition.

In vitro activity of scorpiand-like azamacrocycle derivatives in promastigotes and intracellular amastigotes of Leishmania infantum and Leishmania braziliensis.

CMNER: A Chinese Multimodal NER Dataset based on Social Media

PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition

Multimodal heterogeneous graph entity-level fusion for named entity recognition with multi-granularity visual guidance

CSMA-CNER: Multi-modal Chinese NER Task with Cross- and Self-Modality Attention

2M-NER: Contrastive Learning for Multilingual and Multimodal NER with Language and Modal Fusion

MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding

Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion

CRISP: A cross-modal integration framework based on the surprisingly popular algorithm for multimodal named entity recognition

MVPN: Multi-granularity visual prompt-guided fusion network for multimodal named entity recognition

Object-Aware Multimodal Named Entity Recognition in Social Media Posts With Adversarial Learning