Abstract: A multi-modal knowledge graph (MKG) is a structured semantic network that accurately represents real-world information by incorporating multiple modalities. Existing research primarily focuses on leveraging multi-modal fusion to enhance the representation capability of entity nodes and on link prediction to address the incompleteness of MKGs. However, the inherent heterogeneity between the structural modality and the semantic modality poses challenges to multi-modal fusion, as noise interference can compromise the effectiveness of the fused representation. In this study, we propose a novel hierarchical Transformer architecture, named MNFormer, which captures structural and semantic information while avoiding heterogeneity issues by fully integrating both multi-hop neighbor paths and image-text embeddings. In the encoding stage of MNFormer, we design multiple layers of Multi-hop Neighbor Fusion (MNF) modules that employ attention to merge the image and text features. These MNF modules progressively fuse the information of neighboring entities hop by hop along the neighbor paths of the source entity. In the decoding stage, a Transformer integrates the outputs of all MNF modules, and its output is then used to match target entities and accomplish MKG completion. Moreover, we develop a semantic direction loss to enhance the fitting performance of MNFormer. Experimental results on four datasets demonstrate that MNFormer is competitive with state-of-the-art models. Additionally, ablation studies show that MNFormer effectively combines structural and semantic information, and that the two kinds of information complement each other to improve performance.
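The hop-by-hop fusion described in the abstract can be illustrated with a small, dependency-free sketch. It is not the MNFormer implementation: the function names (`attend`, `fuse_hops`, `rank_candidates`), the use of single-query scaled dot-product attention without learned projections, and the toy two-dimensional embeddings are all assumptions made for illustration. In particular, the Transformer decoder is replaced here by plain mean pooling over the per-hop outputs, and the semantic direction loss is not modeled.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, vectors):
    # Scaled dot-product attention of one query over a list of vectors
    # (assumption: no learned projection matrices in this sketch).
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, v) / scale for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def fuse_hops(source, hops):
    # hops[k] holds (image_vec, text_vec) pairs for the entities k+1 hops
    # away from the source entity (hypothetical input layout).
    state = source
    outputs = []
    for neighbors in hops:
        # Merge each neighbor's image and text features with attention.
        fused = [attend(state, [img, txt]) for img, txt in neighbors]
        # Aggregate the fused neighbors into the representation for this hop.
        state = attend(state, fused)
        outputs.append(state)
    return outputs

def rank_candidates(hop_outputs, candidates):
    # Stand-in for the decoding stage: mean pooling instead of a Transformer,
    # followed by dot-product matching against candidate target entities.
    dim = len(hop_outputs[0])
    pooled = [sum(o[i] for o in hop_outputs) / len(hop_outputs) for i in range(dim)]
    scores = {name: dot(pooled, vec) for name, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: one 1-hop neighbor and two 2-hop neighbors of the source entity.
source = [1.0, 0.0]
hops = [
    [([1.0, 0.0], [0.8, 0.2])],
    [([0.9, 0.1], [1.0, 0.0]), ([0.0, 1.0], [0.1, 0.9])],
]
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0]}

hop_outputs = fuse_hops(source, hops)
ranking = rank_candidates(hop_outputs, candidates)
print(ranking)
```

Each hop produces one fused vector, so the decoding step sees one representation per hop. A real model would learn the attention projections and let a Transformer attend across these per-hop outputs instead of averaging them.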

Noise-powered Multi-modal Knowledge Graph Representation Framework

The Power of Noise: Toward a Unified Multi-modal Knowledge Graph Representation Framework

Multi-modal knowledge graphs representation learning via multi-headed self-attention

MACO: A Modality Adversarial and Contrastive Framework for Modality-missing Multi-modal Knowledge Graph Completion

NativE: Multi-modal Knowledge Graph Completion in the Wild

NeuralKG: an Open Source Library for Diverse Representation Learning of Knowledge Graphs

Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation

MoSE: Modality Split and Ensemble for Multimodal Knowledge Graph Completion

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Building Multimodal Knowledge Bases with Multimodal Computational Sequences and Generative Adversarial Networks

Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion

MMKRL: A robust embedding approach for multi-modal knowledge graph representation learning

Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion

Multi-Modal Siamese Network for Few-Shot Knowledge Graph Completion

Contrastive Multi-modal Knowledge Graph Representation Learning

MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning

Multi-hop neighbor fusion enhanced hierarchical transformer for multi-modal knowledge graph completion

Multi-modal Graph Convolutional Network for Knowledge Graph Entity Alignment

Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition

Knowledge Graph Completion with Pre-trained Multimodal Transformer and Twins Negative Sampling

MERGE: A Modal Equilibrium Relational Graph Framework for Multi-Modal Knowledge Graph Completion