Entity Representation Learning with Multimodal Neighbors for Link Prediction in Knowledge Graph

Wenxuan Liu, Hao Duan, Zeng Li, Jingdong Liu, Hong Huo, Tao Fang
DOI: https://doi.org/10.1109/iccc54389.2021.9674496
2021-01-01
Abstract: Entity representation learning is fundamental to link prediction in knowledge graphs. Existing entity representation learning approaches mainly learn from knowledge triples with single-modal text data, which can describe things or concepts only in words. However, multimodal data is common in the real world and can describe things or concepts from different aspects. Multimodal entities have therefore emerged in knowledge graphs in recent years, greatly enriching them with different kinds of information. This paper focuses on how to learn more robust and comprehensive representations of multimodal entities for better link prediction. A new representation learning method for multimodal entities is proposed, called Graph Attention with Bi-pooling Representation Learning (GBRL), in which multimodal fusion and knowledge embedding are combined to learn the representation of an entity while taking its multimodal neighbors into account. First, feature vectors are obtained for each modality from pre-trained models. Then, bilinear pooling combines the per-modality feature vectors of each multimodal entity into a single multimodal feature vector. Next, a two-layer graph attention module generates the multimodal representation of an entity from its multimodal neighbors. Finally, the inception decoder uses multimodal knowledge embedding to calculate scores for link prediction. The experimental results show that, by taking multimodal entities into account, the proposed method performs well on link prediction compared with traditional methods that use only single-modal text data.
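The fusion and attention steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give dimensions, parameterization, or the attention formula, so the outer-product form of bilinear pooling, the GAT-style scoring, the random stand-in features, the toy neighbor lists, and every name below (`bilinear_pool`, `attention_layer`, `proj`, `W1`, `a1`, and so on) are assumptions made for illustration. The inception decoder is omitted because the abstract does not describe it in enough detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_pool(text_feat, image_feat, proj):
    """Fuse two modality vectors via their outer product, then project (assumed form)."""
    outer = np.outer(text_feat, image_feat).ravel()  # shape (d_t * d_i,)
    return proj @ outer                              # shape (d,)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_layer(h, neighbors, W, a):
    """One GAT-style layer (assumed): each entity attends over itself and its neighbors."""
    z = h @ W.T                                      # linear transform, shape (n, d)
    out = np.zeros_like(z)
    for i, nbrs in enumerate(neighbors):
        idx = [i] + list(nbrs)                       # include a self-loop
        # Score each pair (i, j) as LeakyReLU(a . [z_i || z_j])
        pairs = np.concatenate([np.tile(z[i], (len(idx), 1)), z[idx]], axis=1)
        scores = pairs @ a
        scores = np.where(scores > 0, scores, 0.2 * scores)
        alpha = softmax(scores)                      # attention weights sum to 1
        out[i] = np.tanh(alpha @ z[idx])             # weighted neighbor aggregate
    return out

# Toy setup: 4 entities; random vectors stand in for pre-trained text/image features.
n, d_t, d_i, d = 4, 5, 6, 8
text = rng.normal(size=(n, d_t))
image = rng.normal(size=(n, d_i))
proj = rng.normal(size=(d, d_t * d_i)) * 0.1         # hypothetical projection matrix

# Step 1: bilinear pooling gives one multimodal feature vector per entity.
fused = np.stack([bilinear_pool(text[k], image[k], proj) for k in range(n)])

# Step 2: two stacked attention layers over a hypothetical neighbor list.
neighbors = [[1, 2], [0], [0, 3], [2]]
W1, a1 = rng.normal(size=(d, d)), rng.normal(size=2 * d)
W2, a2 = rng.normal(size=(d, d)), rng.normal(size=2 * d)
hidden = attention_layer(fused, neighbors, W1, a1)
entity_repr = attention_layer(hidden, neighbors, W2, a2)
```

In this sketch `entity_repr` holds one vector per entity; in a full model such vectors would be passed to a decoder that scores candidate triples, and all weights would be learned rather than sampled at random.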