IMF: Interactive Multimodal Fusion Model for Link Prediction

Xinhang Li, Xiangyu Zhao, Jiaxing Xu, Yong Zhang, Chunxiao Xing

DOI: https://doi.org/10.1145/3543507.3583554

2023-03-20

Abstract: Link prediction aims to identify potential missing triples in knowledge graphs. To get better results, some recent studies have introduced multimodal information to link prediction. However, these methods utilize multimodal information separately and neglect the complicated interaction between different modalities. In this paper, we aim at better modeling the inter-modality information and thus introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities. To this end, we propose a two-stage multimodal fusion framework to preserve modality-specific knowledge as well as take advantage of the complementarity between different modalities. Instead of directly projecting different modalities into a unified space, our multimodal fusion module keeps the representations of different modalities independent while leveraging bilinear pooling for fusion and incorporating contrastive learning as additional constraints. Furthermore, the decision fusion module delivers the learned weighted average over the predictions of all modalities to better incorporate the complementarity of different modalities. Our approach has been demonstrated to be effective through empirical evaluations on several real-world datasets. The implementation code is available online at https://github.com/HestiaSky/IMF-Pytorch.
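The decision fusion step mentioned in the abstract (a learned weighted average over per-modality predictions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax normalization of the weights, and the toy scores are all assumptions made for illustration, and only the three modalities named on this page are included.

```python
import math


def softmax(logits):
    """Turn arbitrary real-valued logits into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decision_fusion(modality_scores, weight_logits):
    """Weighted average of per-modality candidate scores.

    modality_scores: dict mapping a modality name to a list of scores,
        one score per candidate entity (hypothetical layout).
    weight_logits: dict mapping the same names to a scalar that a model
        would learn during training (here they are simply given).
    """
    names = sorted(modality_scores)
    weights = softmax([weight_logits[name] for name in names])
    num_candidates = len(modality_scores[names[0]])
    fused = [0.0] * num_candidates
    for name, weight in zip(names, weights):
        scores = modality_scores[name]
        if len(scores) != num_candidates:
            raise ValueError("all modalities must score the same candidates")
        for i, score in enumerate(scores):
            fused[i] += weight * score
    return fused


# Toy example: three candidate tail entities scored by three modalities.
scores = {
    "structural": [0.2, 0.7, 0.1],
    "visual": [0.3, 0.3, 0.4],
    "textual": [0.1, 0.8, 0.1],
}
# Equal logits give every modality the same weight.
logits = {"structural": 0.0, "visual": 0.0, "textual": 0.0}
fused = decision_fusion(scores, logits)
best = max(range(len(fused)), key=fused.__getitem__)
```

In a trained model the logits would differ per modality, so a modality that is more reliable for the task would contribute more to the final ranking.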

Artificial Intelligence, Information Retrieval

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address the link prediction problem in Knowledge Graphs (KG). Specifically, the paper proposes an **Interactive Multimodal Fusion Model (IMF)** to integrate information from different modalities to improve the accuracy of link prediction.

#### Main Problems and Challenges

1. **Link Prediction Problem**: In Knowledge Graphs, due to the complexity, diversity, and variability of knowledge, there are a large number of missing triples. Traditional link prediction methods (such as translation-based methods and neural network methods) are limited in effectiveness due to structural bias issues.
2. **Insufficient Utilization of Multimodal Information**: Although some existing studies have introduced multimodal information (such as structural information, visual information, and textual information), these methods usually project all modal information into a unified space, ignoring the complex interactions between different modalities, thus failing to effectively capture complementary information.

#### Solution

IMF addresses the above problems through the following approaches:

1. **Two-Stage Fusion Framework**: First, specific features of each modality are extracted separately, and then the complex interactions between different modalities are modeled through bilinear fusion and contrastive learning.
2. **Decision Fusion Module**: In the final decision fusion stage, the prediction results of different modalities are integrated, and complementary information is utilized for the final prediction, thereby improving prediction accuracy.

In this way, IMF can more comprehensively utilize multimodal information, better capture the commonality and complementarity between modalities, and significantly enhance the effectiveness of link prediction.
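The first stage described above (bilinear fusion plus a contrastive constraint) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the method from the paper or its repository: it uses a plain full bilinear form rather than any particular factorization, an InfoNCE-style loss with cosine similarity as a stand-in for the contrastive objective, and hypothetical names (`bilinear_pool`, `info_nce`, `temperature`) and toy dimensions.

```python
import math
import random


def bilinear_pool(x, y, W):
    """Bilinear fusion of two vectors: z[k] = sum_i sum_j x[i] * W[k][i][j] * y[j].

    Every output unit sees all pairwise products of the two inputs, which is
    what lets the fused vector capture cross-modal interactions.
    """
    return [
        sum(x[i] * W_k[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))
        for W_k in W
    ]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.5):
    """InfoNCE-style loss: anchors[i] should match positives[i] and no other row.

    Here the rows would be embeddings of the same entities in two modalities.
    """
    loss = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(anchors)


# Toy setup: 4-dim "visual" and 3-dim "textual" embeddings fused into 2 dims.
rng = random.Random(0)
visual = [rng.uniform(-1, 1) for _ in range(4)]
textual = [rng.uniform(-1, 1) for _ in range(3)]
W = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)] for _ in range(2)]
fused = bilinear_pool(visual, textual, W)

# Contrastive check: aligned pairs should give a lower loss than shuffled pairs.
entities_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(entities_a, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(entities_a, [[0.0, 1.0], [1.0, 0.0]])
```

A full bilinear tensor grows with the product of the input and output dimensions, so practical systems usually replace `W` with a low-rank or otherwise factorized form; the plain version is kept here only because it makes the pairwise interaction explicit.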

Dual Low-Rank Multimodal Fusion

Efficient Low-rank Multimodal Fusion with Modality-Specific Factors

MM-Transformer: A Transformer-Based Knowledge Graph Link Prediction Model That Fuses Multimodal Features

Dense Multimodal Fusion for Hierarchically Joint Representation

MIMF: Mutual Information-Driven Multimodal Fusion

Progressive Fusion for Multimodal Integration

Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation

Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis

IMF-MF: Interactive moment localization with adaptive multimodal fusion and self-attention

Multimodal Hyperspectral Image Classification via Interconnected Fusion

Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion

Attention-guided Multi-step Fusion: A Hierarchical Fusion Network for Multimodal Recommendation

Predictive Dynamic Fusion

Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion

Multimodal Language Analysis with Recurrent Multistage Fusion

An Effective Multimodal Representation and Fusion Method for Multimodal Intent Recognition

Multi-Grained Multimodal Interaction Network for Entity Linking

Deep Equilibrium Multimodal Fusion