Entity Linking Supported Multimodal Data: Fusing Text and Image features for Higher Accuracy

Xing Zhao, Yuke Chen, Yang Dai, Peng Wu
DOI: https://doi.org/10.1109/ICICML60161.2023.10424744
2023-11-03
Abstract: To address the low accuracy of traditional text-only entity linking methods, this paper proposes a new multimodal entity linking model that exploits the richness and complementarity of multimodal information by integrating text and image features. The proposed method uses a BERT model and a CNN-RNN model to extract features from the text and the images that contain the mentions, respectively. A co-attention mechanism and gated fusion are then introduced to learn the correlation between text and images automatically, adjusting the weight and importance of the features to achieve accurate alignment and interaction between the two modalities. Finally, cosine similarity is used to measure the similarity between candidate entities and mentions. Experiments are conducted on the RMEL and WMEL multimodal entity linking datasets. The results show that the proposed method outperforms other entity linking models.
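The last two stages named in the abstract, gated fusion and cosine-similarity scoring, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the vector size, the random stand-in features and the gate parameters `w_gate` and `b_gate` are all assumptions made for illustration, and the BERT, CNN-RNN and co-attention components are omitted.

```python
import numpy as np


def gated_fusion(text_vec, image_vec, w_gate, b_gate):
    """Blend two same-sized feature vectors with a sigmoid gate (illustrative form)."""
    concat = np.concatenate([text_vec, image_vec])
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ concat + b_gate)))  # each entry in (0, 1)
    # Element-wise convex combination: the gate decides how much text vs. image to keep.
    return gate * text_vec + (1.0 - gate) * image_vec


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_candidates(mention_vec, candidate_vecs):
    """Return candidate indices sorted by descending similarity, plus the raw scores."""
    scores = [cosine_similarity(mention_vec, c) for c in candidate_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical stand-ins for encoder outputs; real features would come from the encoders.
rng = np.random.default_rng(0)
dim = 8
text_vec = rng.normal(size=dim)
image_vec = rng.normal(size=dim)
w_gate = rng.normal(size=(dim, 2 * dim)) * 0.1  # would be learned in practice
b_gate = np.zeros(dim)

mention = gated_fusion(text_vec, image_vec, w_gate, b_gate)
candidates = [
    mention + 0.01 * rng.normal(size=dim),  # near-duplicate of the mention
    rng.normal(size=dim),                   # unrelated candidate
    -mention,                               # opposite direction
]
order, scores = rank_candidates(mention, candidates)
```

Under these assumptions, the candidate whose vector is closest in direction to the fused mention vector ranks first, which is the linking decision the abstract describes.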
Computer Science