Abstract: Relational learning is an essential task in the domain of knowledge representation, particularly in knowledge graph completion (KGC). While relational learning in traditional single-modal settings has been extensively studied, exploring it within a multimodal KGC context presents distinct challenges and opportunities. One of the major challenges is inference on newly discovered relations without any associated training data. This zero-shot relational learning scenario poses unique requirements for multimodal KGC, i.e., utilizing multimodality to facilitate relational learning. However, existing works fail to support the leverage of multimodal information and leave the problem unexplored. In this paper, we propose a novel end-to-end framework, consisting of three components, i.e., multimodal learner, structure consolidator, and relation embedding generator, to integrate diverse multimodal information and knowledge graph structures to facilitate the zero-shot relational learning. Evaluation results on two multimodal knowledge graphs demonstrate the superior performance of our proposed method.
What problem does this paper attempt to address?
This paper addresses **zero-shot relational learning in multimodal knowledge graphs**. Specifically, the authors focus on how to infer newly emerging relations (i.e., zero-shot relations) for which no training data is available. The problem is particularly relevant to multimodal knowledge graphs (MMKGs) because these graphs contain rich multimodal information (such as images, text, and structural information) that can help infer new relations.
### Background and Challenges of the Problem
1. **Long-Tail Distribution Problem**: In multimodal knowledge graphs, relations often follow a long-tail distribution: a small number of relations have many entity-pair samples, while most relations have few samples or none at all. This imbalance makes it difficult for existing methods to learn accurate representations of the rare relations.
2. **Zero-Shot Scenario**: When new relations emerge, traditional relation reasoning methods cannot handle them effectively because no training samples exist. For example, in a movie-related knowledge graph, newly emerging relations such as "costume designer" or "nominated work" may not have any known entity pairs, which makes it difficult for existing methods to infer them.
3. **Utilization of Multimodal Information**: Although some works attempt to use multimodal information to improve relationship learning, they usually focus only on a single modality or fail to fully utilize the potential associations between multimodal information.
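The long-tail and zero-shot settings described above can be illustrated with a small sketch. The triples, entity names, and the helper functions `relation_frequencies` and `split_zero_shot` are hypothetical and are not taken from the paper or its datasets; the sketch only shows how a relation can end up with no training triples at all.

```python
from collections import Counter

# Hypothetical toy triples (head, relation, tail); placeholders, not real data.
triples = [
    ("movie_a", "directed_by", "person_x"),
    ("movie_b", "directed_by", "person_x"),
    ("movie_c", "directed_by", "person_y"),
    ("movie_a", "starring", "person_z"),
    ("movie_c", "starring", "person_z"),
    ("movie_c", "costume_designer", "person_w"),
]


def relation_frequencies(triples):
    """Count how many triples each relation appears in."""
    return Counter(relation for _, relation, _ in triples)


def split_zero_shot(triples, unseen_relations):
    """Keep unseen relations out of training entirely (the zero-shot setting)."""
    train = [t for t in triples if t[1] not in unseen_relations]
    test = [t for t in triples if t[1] in unseen_relations]
    return train, test


freqs = relation_frequencies(triples)
train, test = split_zero_shot(triples, unseen_relations={"costume_designer"})

# The frequency table is head-heavy: one relation dominates, one is rare.
print(freqs.most_common())
# No training triple mentions the unseen relation.
print({t[1] for t in train}, {t[1] for t in test})
```

In this toy split, a model trained on `train` never sees `costume_designer`, so it can only reason about that relation through side information such as text or images.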
### Solutions in the Paper
To solve the above problems, the authors propose a new end-to-end framework named **MRE (Multimodal Relation Extrapolation)**, which consists of three main modules:
1. **Multimodal Learner**: This module encodes multimodal information and models the potential associations between different modalities. In this way, it can capture more fine-grained semantic information about entities and relations.
2. **Structure Consolidator**: This module incorporates the structural information of the knowledge graph into the multimodal fusion process, further optimizing the representation of multimodal information.
3. **Relation Embedding Generator**: This module generates embeddings for new relations using a generative adversarial network (GAN), so that it can learn representations of new relations in zero-shot scenarios.
Through the synergy of these three modules, MRE can infer new relations without training samples, and experimental results on two real-world multimodal knowledge graph datasets show that MRE outperforms existing methods.
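The data flow through the three modules can be sketched roughly as follows. This is not the authors' implementation: every function here is a hand-written stand-in, and the averaging-based fusion, the neighbor blending with `alpha`, the fixed linear "generator" weights, and the translation-style scoring function are all assumptions made for illustration. In particular, a real GAN generator would be trained against a discriminator, which this sketch omits.

```python
import math
import random


def fuse_modalities(text_vec, image_vec):
    """Stand-in for the multimodal learner: element-wise mean of two modality vectors."""
    return [(t + i) / 2.0 for t, i in zip(text_vec, image_vec)]


def consolidate_structure(entity_vec, neighbor_vecs, alpha=0.5):
    """Stand-in for the structure consolidator: blend an entity with its graph neighbors."""
    if not neighbor_vecs:
        return list(entity_vec)
    dim = len(entity_vec)
    mean = [sum(v[k] for v in neighbor_vecs) / len(neighbor_vecs) for k in range(dim)]
    return [alpha * e + (1 - alpha) * m for e, m in zip(entity_vec, mean)]


def generate_relation_embedding(relation_features, weights, noise_scale=0.0, rng=None):
    """Stand-in for the generator: linear map of relation side information plus noise.

    The weights are fixed by hand here; adversarial training is not shown.
    """
    rng = rng or random.Random(0)
    return [
        sum(w * x for w, x in zip(row, relation_features)) + noise_scale * rng.gauss(0.0, 1.0)
        for row in weights
    ]


def score(head, relation, tail):
    """Translation-style score (an assumption): higher means a more plausible triple."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-dimensional features for one head entity and one unseen relation.
head = consolidate_structure(fuse_modalities([1.0, 0.0], [0.0, 1.0]), [[0.5, 0.5]])
relation = generate_relation_embedding([1.0, 0.0], [[0.5, 0.0], [0.0, 0.5]])

# Rank hypothetical candidate tails for the unseen relation.
candidates = {"tail_a": [1.0, 0.5], "tail_b": [-1.0, 0.0]}
best = max(candidates, key=lambda name: score(head, relation, candidates[name]))
print(best)
```

The point of the sketch is the ordering of the steps: modality features are fused first, refined with graph structure second, and only then combined with a generated embedding for a relation that has no training triples.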
### Summary
The main contribution of this paper is a method that, for the first time, uses multimodal information to address zero-shot relational learning in multimodal knowledge graphs. By introducing the multimodal learner, structure consolidator, and relation embedding generator, MRE can not only capture the potential associations between different modalities but also accurately infer new relations in zero-shot scenarios.