IMF: Interactive Multimodal Fusion Model for Link Prediction

Xinhang Li, Xiangyu Zhao, Jiaxing Xu, Yong Zhang, Chunxiao Xing
DOI: https://doi.org/10.1145/3543507.3583554
2023-03-20
Abstract: Link prediction aims to identify potential missing triples in knowledge graphs. To get better results, some recent studies have introduced multimodal information to link prediction. However, these methods utilize multimodal information separately and neglect the complicated interaction between different modalities. In this paper, we aim at better modeling the inter-modality information and thus introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities. To this end, we propose a two-stage multimodal fusion framework to preserve modality-specific knowledge as well as take advantage of the complementarity between different modalities. Instead of directly projecting different modalities into a unified space, our multimodal fusion module keeps the representations of different modalities independent while leveraging bilinear pooling for fusion and incorporating contrastive learning as an additional constraint. Furthermore, the decision fusion module delivers a learned weighted average over the predictions of all modalities to better incorporate the complementarity of different modalities. Our approach has been demonstrated to be effective through empirical evaluations on several real-world datasets. The implementation code is available online at https://github.com/HestiaSky/IMF-Pytorch.
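The abstract names two ingredients of the multimodal fusion module: bilinear pooling to fuse modality-specific representations, and contrastive learning as an extra constraint. The following is a minimal plain-Python sketch of both ideas under stated assumptions: the low-rank (elementwise-product) factorization of bilinear pooling and the InfoNCE form of the contrastive loss are common choices, not necessarily the exact formulation used in IMF, and the helper names `low_rank_bilinear`, `info_nce`, `matvec`, and `cosine` are hypothetical.

```python
import math


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def low_rank_bilinear(x, y, u, v, p):
    """Low-rank bilinear pooling: z = P ((U x) * (V y)), * = elementwise product.

    Assumption: U and V project each modality's embedding into a shared
    rank-r space; P maps the r-dimensional interaction to the output size.
    """
    hx = matvec(u, x)  # projection of the first modality (e.g. structural)
    hy = matvec(v, y)  # projection of the second modality (e.g. visual)
    interaction = [a * b for a, b in zip(hx, hy)]  # multiplicative cross-modal term
    return matvec(p, interaction)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(s * t for s, t in zip(a, b))
    return dot / (math.sqrt(sum(s * s for s in a)) * math.sqrt(sum(t * t for t in b)))


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchors[i] should be closer to positives[i] than to positives[j], j != i.

    Assumption: anchors and positives are two modality views of the same
    entities, so the other entities in the batch act as negatives.
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(anchors)


# Toy usage: identity projections, so the fused value is 1*3 + 2*1 = 5.
print(low_rank_bilinear([1.0, 2.0], [3.0, 1.0], [[1, 0], [0, 1]], [[1, 0], [0, 1]], [[1, 1]]))  # [5.0]
```

Factorizing the bilinear interaction through an elementwise product keeps the parameter count linear in the rank r, instead of requiring a full weight tensor of size d_x × d_y × d_out for every pair of modalities.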
Artificial Intelligence, Information Retrieval
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the link prediction problem in knowledge graphs (KGs). Specifically, it proposes the **Interactive Multimodal Fusion (IMF)** model, which integrates information from different modalities to improve the accuracy of link prediction.

#### Main Problems and Challenges

1. **Link prediction**: Because knowledge is complex, diverse, and constantly changing, knowledge graphs contain a large number of missing triples. Traditional link prediction methods (such as translation-based and neural network methods) are limited in effectiveness by structural bias.
2. **Insufficient use of multimodal information**: Some existing studies have introduced multimodal information (such as structural, visual, and textual information), but they either use each modality separately or project all modalities into a unified space. In both cases they neglect the complex interactions between modalities and therefore fail to capture complementary information effectively.

#### Solution

IMF addresses these problems with a two-stage fusion framework:

1. **Multimodal fusion (first stage)**: The representations of each modality are kept independent so that modality-specific knowledge is preserved, and the interactions between modalities are modeled through bilinear pooling, with contrastive learning as an additional constraint.
2. **Decision fusion (second stage)**: The predictions of the individual modalities are combined through a learned weighted average, so the final prediction exploits the complementarity between modalities.

In this way, IMF makes fuller use of multimodal information, captures both the commonality and the complementarity between modalities, and improves link prediction performance, as shown by empirical evaluations on several real-world datasets.
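The decision fusion stage described above can be sketched as a softmax-weighted average of per-modality scores. This is a minimal illustration assuming that each modality produces a score for every candidate entity and that the learned weights are scalar logits normalized with a softmax; the modality names, the candidate scores, and the `decision_fusion` helper are hypothetical and are not taken from the IMF code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decision_fusion(modality_scores, weight_logits):
    """Weighted average of per-modality candidate scores.

    modality_scores: modality name -> scores over the same candidate entities.
    weight_logits: modality name -> scalar parameter (hypothetical); the softmax
    turns these into non-negative weights that sum to 1.
    """
    names = sorted(modality_scores)
    weights = softmax([weight_logits[n] for n in names])
    fused = [0.0] * len(modality_scores[names[0]])
    for w, name in zip(weights, names):
        for j, s in enumerate(modality_scores[name]):
            fused[j] += w * s
    return fused, dict(zip(names, weights))


scores = {
    "structural": [0.9, 0.1, 0.3],
    "visual": [0.2, 0.7, 0.1],
    "textual": [0.6, 0.2, 0.4],
}
fused, weights = decision_fusion(scores, {"structural": 0.0, "visual": 0.0, "textual": 0.0})
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # 0: with equal weights, candidate 0 has the highest average score
```

In a trained model the weight logits would be learned jointly with the rest of the network by backpropagation; the softmax keeps the weights non-negative and summing to 1, so the fused score remains a weighted average of the modality scores.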