Multimodal Representation with Embedded Visual Guiding Objects for Named Entity Recognition in Social Media Posts.

Zhiwei Wu, Changmeng Zheng, Yi Cai, Junying Chen, Ho-fung Leung, Qing Li
DOI: https://doi.org/10.1145/3394171.3413650
2020-01-01
Abstract: Visual context often helps to recognize named entities more precisely in short texts such as tweets or Snapchat posts. For example, one can identify "Charlie" as the name of a dog from the user's posts. Previous work on multimodal named entity recognition ignores the correspondence between visual objects and entities. Visual objects can be regarded as fine-grained image representations. For a sentence with multiple entity types, the objects in the relevant image can be used to capture information about the different entities. In this paper, we propose a neural network that combines object-level image information and character-level text information to predict entities. Vision and language are bridged by using object labels as embeddings, and a dense co-attention mechanism is introduced for fine-grained interactions. Experimental results on a Twitter dataset demonstrate that our method outperforms state-of-the-art methods.
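The idea of letting text tokens and object labels attend to each other can be illustrated with a minimal sketch. This is a generic, simplified co-attention written for illustration only, not the paper's dense co-attention network: the function names, the dot-product affinity, and the hand-picked toy vectors are all assumptions, whereas in the paper the text and object-label representations are learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Combine vectors using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)]


def co_attention(text_vecs, object_vecs):
    """Simplified co-attention (assumed dot-product affinity).

    affinity[i][j] scores token i against object label j. Rows are
    normalized so each token attends over objects; columns are
    normalized so each object attends over tokens.
    """
    affinity = [[dot(t, o) for o in object_vecs] for t in text_vecs]
    text_to_obj = [softmax(row) for row in affinity]
    obj_to_text = [
        softmax([affinity[i][j] for i in range(len(text_vecs))])
        for j in range(len(object_vecs))
    ]
    attended_objects = [weighted_sum(w, object_vecs) for w in text_to_obj]
    attended_text = [weighted_sum(w, text_vecs) for w in obj_to_text]
    return text_to_obj, attended_objects, attended_text


# Toy inputs: hypothetical 3-dimensional embeddings chosen by hand.
tokens = ["Charlie", "runs"]
text_vecs = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]
object_labels = ["dog", "grass"]
object_vecs = [[2.0, 0.0, 1.0], [0.0, 0.5, 0.0]]

weights, attended_objects, attended_text = co_attention(text_vecs, object_vecs)
for token, row in zip(tokens, weights):
    best = object_labels[row.index(max(row))]
    print(f"{token!r} attends most to object label {best!r}")
```

With these toy vectors, the token "Charlie" places most of its attention on the object label "dog", which is the kind of entity-to-object correspondence the abstract describes. A tagger could then concatenate each token vector with its attended object vector before predicting entity labels; that fusion step is likewise an assumption here, not something the abstract specifies.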