Vinh Van Tong, Thanh Trung Huynh, Thanh Tam Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen, Quyet Thang Huynh
Abstract: Knowledge graph (KG) alignment - the task of recognizing entities referring to the same thing in different KGs - is recognized as one of the most important operations in the field of KG construction and completion. However, existing alignment techniques often assume that the input KGs are complete and isomorphic, which is not true due to the real-world heterogeneity in the domain, size, and sparsity. In this work, we address the problem of aligning incomplete KGs with representation learning. Our KG embedding framework exploits two feature channels: transitivity-based and proximity-based. The former captures the consistency constraints between entities via translation paths, while the latter captures the neighbourhood structure of KGs via an attention-guided, relation-aware graph neural network. The two feature channels are jointly learned to exchange important features between the input KGs while enforcing the output representations of the input KGs in the same embedding space. Also, we develop a missing links detector that discovers and recovers the missing links in the input KGs during the training process, which helps mitigate the incompleteness issue and thus improve the compatibility of the learned representations. The embeddings are then fused to generate the alignment result, and the high-confidence matched node pairs are added to the pre-aligned supervision data to improve the embeddings gradually. Empirical results show that our model is more accurate than the SOTA and is robust against different levels of incompleteness.
What problem does this paper attempt to address?
The paper addresses the problem of aligning knowledge graphs (KGs) that are incomplete. Existing alignment techniques typically assume that the input KGs are complete and isomorphic, which does not hold in the real world because KGs differ in domain, size, and sparsity. The paper therefore proposes a representation learning framework that captures transitivity and proximity features between entities. It also develops a missing link detector that discovers and recovers missing links in the input KGs, which improves the accuracy of the alignment results.
### Main Contributions of the Paper:
1. **Proposing the IKAMI Framework**: This framework uses multi-channel feature exchange to address entity alignment and knowledge completion simultaneously, which the authors present as the first attempt to combine these two tasks.
2. **Translation-Based Embedding**: This method uses translation constraints to encode entities and relationships, which helps align both and mitigates the information dilution that weakens GCN-based embeddings.
3. **Graph Convolutional Attention Network**: This network efficiently captures relational triples, including entity names, relationship types, and relationship directions. Its attention mechanism learns an importance weight per relationship type, so the model emphasizes more common relationships and ignores those that appear in only one of the two KGs.
4. **Missing Triple Detector**: This detector utilizes the learned translation-based features to jointly discover and complete missing links in the two input knowledge graphs.
5. **Joint Training Scheme**: A joint training scheme for the two embedding models is designed so that their objectives support each other. A similarity matrix is then computed for each channel, and the matrices are combined by weighted fusion to produce the final alignment result.
6. **Experimental Validation**: Experiments conducted on real-world and synthetic knowledge graph datasets show that the framework outperforms other baseline methods in both entity alignment and knowledge completion tasks, with improvements of 15.2% and 3.5%, respectively.
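
To make contributions 2 and 4 more concrete, the following is a minimal sketch of how translation-based features can flag missing links. It assumes a TransE-style constraint, where a triple (h, r, t) is plausible when the vector h + r lies close to t, and a fixed distance threshold. The function names, the toy 2-d embeddings, and the threshold value are hypothetical illustrations and are not taken from the paper, whose detector may be built differently.

```python
import math


def translation_score(h, r, t):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def detect_missing_triples(entities, relations, known, threshold):
    """Return (triple, score) for unseen triples scoring below `threshold`.

    Assumption: an exhaustive scan over all (head, relation, tail) candidates,
    which is only practical for toy-sized graphs.
    """
    found = []
    for h_name, h in entities.items():
        for r_name, r in relations.items():
            for t_name, t in entities.items():
                triple = (h_name, r_name, t_name)
                if h_name == t_name or triple in known:
                    continue
                score = translation_score(h, r, t)
                if score < threshold:
                    found.append((triple, score))
    # Most plausible candidates first.
    return sorted(found, key=lambda item: item[1])


# Toy embeddings, invented for illustration only.
entities = {
    "Hanoi": [0.0, 0.0],
    "Vietnam": [1.0, 0.0],
    "Asia": [1.0, 1.0],
}
relations = {
    "capital_of": [1.0, 0.0],
    "located_in": [0.0, 1.0],
}
known = {("Hanoi", "capital_of", "Vietnam")}

missing = detect_missing_triples(entities, relations, known, threshold=0.1)
for triple, score in missing:
    print(triple, round(score, 3))
```

In this toy setup the only candidate that satisfies the translation constraint is (Vietnam, located_in, Asia), so it is the one link proposed for recovery. In a trained model the embeddings would be learned rather than hand-set, and the threshold would need tuning.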
### Main Challenges:
1. **Domain Gap**: The incompleteness of input knowledge graphs leads to inconsistencies between cross-lingual knowledge graphs (e.g., incompatible individual information, non-equivalent neighbor sets, different numbers of nodes in each graph).
2. **Task Gap**: Although incompleteness can introduce inconsistencies, entity consistency and relationship consistency should still generally be respected, because these constraints help match entities accurately. Handling knowledge completion and alignment at the same time is therefore challenging.
3. **Model Gap**: The neighborhood structure of knowledge graphs provides rich information in various forms (e.g., relational triples, relationship directions). Recent works exploit this information by stacking GCNs, but because the stacked GCN layers share a similar structure, they are susceptible to the same weaknesses.
### Solution Overview:
- **Representation Learning**: Embedding knowledge graph entities into different low-dimensional vector spaces through two feature channels (a transitivity-based channel and a proximity-based channel).
- **Alignment Computation**: Fusing the learned representations to compute the final alignment results.
- **Missing Triple Recovery**: Developing a two-step module to recover all possible missing triples in the input knowledge graphs using the learned representations.
Together, these components address the key issues in incomplete knowledge graph alignment, and the reported experiments show the framework outperforming the baseline methods.
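
The alignment computation step can be sketched as follows. This is a minimal illustration under stated assumptions: cosine similarity for each channel, a single fusion weight, a per-row argmax for matching, and a confidence threshold for deciding which matched pairs would be fed back as supervision. The helper names, the toy embeddings, the weight of 0.5, and the thresholds are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(src, tgt):
    """Rows index source entities, columns index target entities."""
    return [[cosine(u, v) for v in tgt] for u in src]


def fuse(sim_a, sim_b, weight):
    """Weighted fusion: weight * sim_a + (1 - weight) * sim_b, element-wise."""
    return [
        [weight * a + (1 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(sim_a, sim_b)
    ]


def align(fused, min_confidence):
    """Match each source entity to its best target; keep only confident pairs."""
    pairs = []
    for i, row in enumerate(fused):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= min_confidence:
            pairs.append((i, j))
    return pairs


# Toy embeddings for two source and two target entities, one set per channel.
src_transitivity = [[1.0, 0.0], [0.0, 1.0]]
tgt_transitivity = [[0.0, 1.0], [1.0, 0.0]]
src_proximity = [[1.0, 0.0], [1.0, 1.0]]
tgt_proximity = [[0.0, 1.0], [1.0, 0.0]]

fused = fuse(
    similarity_matrix(src_transitivity, tgt_transitivity),
    similarity_matrix(src_proximity, tgt_proximity),
    weight=0.5,
)

all_pairs = align(fused, min_confidence=0.8)   # final alignment
new_seeds = align(fused, min_confidence=0.9)   # pairs confident enough to reuse
print(all_pairs, new_seeds)
```

Using a stricter threshold for `new_seeds` than for the final alignment reflects the idea in the abstract that only high-confidence matches are added to the pre-aligned supervision data. A per-row argmax does not enforce one-to-one matching, so a real system would likely need a more careful matching step.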