Multimodal Named Entity Recognition with Bottleneck Fusion and Contrastive Learning.

Peng Wang, Xiaohang Chen, Ziyu Shang, Wenjun Ke
DOI: https://doi.org/10.1587/transinf.2022edp7116
2023-01-01
IEICE Transactions on Information and Systems
Abstract: Multimodal named entity recognition (MNER) is the task of recognizing named entities in multimodal context. Existing methods focus on using co-attention mechanisms to discover the relationships between multiple modalities. However, they still have two deficiencies. First, current methods fail to fuse the multimodal representations in a fine-grained way, which may introduce noise from the visual modality. Second, current methods ignore bridging the semantic gap between heterogeneous modalities. To solve these issues, we propose a novel MNER method with bottleneck fusion and contrastive learning (BFCL). Specifically, we first incorporate a transformer-based bottleneck fusion mechanism, so that information between different modalities can only be exchanged through several bottleneck tokens, thus reducing noise propagation. Then we propose two decoupled image-text contrastive losses to align the unimodal representations, pulling the representations of semantically similar modalities closer while pushing the representations of semantically different modalities farther apart. Experimental results demonstrate that our method is competitive with the state-of-the-art models, achieving F1-scores of 74.54% and 85.70% on the Twitter-2015 and Twitter-2017 datasets, respectively.
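
The abstract does not give the exact form of the two decoupled image-text contrastive losses. As a rough illustration of the general idea only, the sketch below computes a generic InfoNCE-style loss in each direction (text-to-image and image-to-text) over a small batch of paired embeddings. The function names, the temperature value of 0.07, and the use of cosine similarity are assumptions made for this example and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy where queries[i] is expected to match keys[i].

    All other keys in the batch act as negatives for queries[i].
    """
    queries = [l2_normalize(q) for q in queries]
    keys = [l2_normalize(k) for k in keys]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def image_text_contrastive_losses(text_emb, image_emb, temperature=0.07):
    """Return the two directional losses separately (hypothetical helper).

    Keeping them apart lets a caller weight each direction independently.
    """
    text_to_image = info_nce(text_emb, image_emb, temperature)
    image_to_text = info_nce(image_emb, text_emb, temperature)
    return text_to_image, image_to_text


if __name__ == "__main__":
    # Toy 2-D embeddings: pair i of text and image point in similar directions.
    text = [[1.0, 0.0], [0.0, 1.0]]
    image = [[0.9, 0.1], [0.1, 0.9]]
    print(image_text_contrastive_losses(text, image))
```

With matched pairs the losses are small, and they grow when the image embeddings are shuffled relative to the text embeddings, which is the behavior the abstract describes as drawing semantically similar representations together and separating dissimilar ones.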