Simple Yet Effective: Structure Guided Pre-trained Transformer for Multi-modal Knowledge Graph Reasoning
Ke Liang, Lingyuan Meng, Yue Liu, Meng Liu, Wei, Suyuan Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu
DOI: https://doi.org/10.1145/3664647.3681112
2024-01-01
Abstract: Multi-modal knowledge graphs (MKGs) organize information from different modalities in an intuitive way and are used in various downstream tasks, such as recommendation. However, most MKGs are still far from complete, which has motivated a growing body of MKG reasoning models. Recently, with the development of general artificial intelligence, pre-trained transformers have drawn increasing attention, especially in multi-modal scenarios. However, research on multi-modal pre-trained transformers (MPTs) for knowledge graph reasoning (KGR) is still at an early stage. The rich structural information underlying an MKG is the biggest difference between MKGs and other multi-modal data, yet previous MPTs do not fully exploit it. Most of them use the graph structure only as a retrieval map for matching images and texts connected to the same entity, which limits their reasoning performance. To this end, the graph Structure Guided Multi-modal Pre-trained Transformer (SGMPT) is proposed for knowledge graph reasoning. Specifically, a graph structure encoder is adopted for structural feature encoding. Then, a structure-guided fusion module with two simple yet effective strategies, i.e., weighted summation and alignment constraint, is designed to inject the structural information into both the textual and visual features. To the best of our knowledge, SGMPT is the first MPT for multi-modal KGR that mines the structural information underlying MKGs. Extensive experiments on FB15k-237-IMG and WN18-IMG demonstrate that SGMPT outperforms existing state-of-the-art models and confirm the effectiveness of the designed strategies.
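The two fusion strategies named in the abstract, weighted summation and alignment constraint, can be illustrated with a minimal NumPy sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the convex-combination form, the weight `alpha`, the squared-distance alignment loss, and the function names `weighted_sum_fusion` and `alignment_loss` are hypothetical and are not taken from the SGMPT implementation.

```python
import numpy as np


def weighted_sum_fusion(modal_feat, struct_feat, alpha=0.5):
    """Blend a modality feature with the structural feature.

    Assumed form (not from the paper): a convex combination in which
    `alpha` controls how much structural information is injected.
    """
    return (1.0 - alpha) * modal_feat + alpha * struct_feat


def alignment_loss(modal_feat, struct_feat):
    """Penalize distance between modality and structural features.

    Assumed form (not from the paper): mean over entities of the
    squared Euclidean distance between the two feature vectors.
    """
    return float(np.mean(np.sum((modal_feat - struct_feat) ** 2, axis=1)))


# Toy features for 4 entities with 8 dimensions each (random stand-ins
# for the outputs of the structure, text, and image encoders).
rng = np.random.default_rng(0)
num_entities, dim = 4, 8
struct = rng.normal(size=(num_entities, dim))
text = rng.normal(size=(num_entities, dim))
image = rng.normal(size=(num_entities, dim))

# Strategy 1: inject structure into both modalities by weighted summation.
fused_text = weighted_sum_fusion(text, struct, alpha=0.3)
fused_image = weighted_sum_fusion(image, struct, alpha=0.3)

# Strategy 2: an auxiliary loss that pulls both modalities toward the structure.
loss = alignment_loss(text, struct) + alignment_loss(image, struct)
```

In this sketch the weighted summation changes the features directly, while the alignment term would be added to the training objective so that the textual and visual encoders learn representations consistent with the graph structure.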