What Makes Entities Similar? A Similarity Flooding Perspective for Multi-sourced Knowledge Graph Embeddings

Zequn Sun, Jiacheng Huang, Xiaozhou Xu, Qijin Chen, Weijun Ren, Wei Hu
2023-06-05
Abstract: Joint representation learning over multi-sourced knowledge graphs (KGs) yields transferable and expressive embeddings that improve downstream tasks. Entity alignment (EA) is a critical step in this process. Despite recent considerable research progress in embedding-based EA, how it works remains to be explored. In this paper, we provide a similarity flooding perspective to explain existing translation-based and aggregation-based EA models. We prove that the embedding learning process of these models actually seeks a fixpoint of pairwise similarities between entities. We also provide experimental evidence to support our theoretical analysis. We propose two simple but effective methods inspired by the fixpoint computation in similarity flooding, and demonstrate their effectiveness on benchmark datasets. Our work bridges the gap between recent embedding-based models and the conventional similarity flooding algorithm. It would improve our understanding of and increase our faith in embedding-based EA.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper asks how embedding-based entity alignment (EA) models over multi-source knowledge graphs (KGs) come to assign similar embeddings to the same entity in different KGs, and why those embeddings are effective for alignment.

### Core Problems of the Paper

1. **Reasons for Entity Embedding Similarity**
   - Although embedding-based EA techniques have made significant progress, a key question remains unanswered: **what factors make entity embeddings similar in EA models?**
2. **Connection between Theory and Traditional Methods**
   - The connection between embedding-based EA models and traditional symbolic methods has not been fully explored.

### Solutions

To address these problems, the paper adopts the perspective of **Similarity Flooding (SF)** to explain and improve embedding-based EA models. SF is an algorithm widely used in structured data matching; its core idea is to propagate similarities iteratively until they reach a fixed point. The paper proves that existing translation-based and aggregation-based EA models are in fact seeking a fixed point of the pairwise similarities between entities.

### Main Contributions

1. **Theoretical Analysis**
   - Provides the first theoretical analysis of embedding-based EA techniques, revealing how these models work.
   - Unifies basic translation-based and aggregation-based EA models from the perspective of similarity flooding.
   - Establishes a close connection between embedding-based and traditional symbolic methods through the unified fixed-point computation view.
2. **Proposing New Methods**
   - Proposes two simple but effective methods to improve EA:
     - **Similarity-Flooding-Based Variant**: computes the fixed point of similarity from entity combinations induced by TransE or GCN, without learning KG embeddings.
     - **Self-Propagating Connection**: introduces self-propagating connections in neighborhood aggregation, giving entity embeddings the opportunity to propagate back to themselves and thereby improving alignment.
3. **Experimental Verification**
   - Conducts experiments on benchmark datasets such as DBP15K and OpenEA, verifies the effectiveness of the proposed methods, and provides experimental evidence that supports the theoretical conclusions.

### Formula Summary

- **Similarity Matrix**:
  \[ \Omega = (x_1; x_2; \ldots; x_n)^\top (y_1; y_2; \ldots; y_m) \in \mathbb{R}^{n \times m} \]
- **Fixed-Point Formula**:
  \[ \Omega = \text{normalize}\left(\Omega_0 + \Lambda \Omega (\Lambda')^\top\right) \]
- **Self-Propagating Aggregation Function**:
  \[ e_{i+1} = (1 - \alpha) \oplus_{z \in N(e)}(z) + \alpha f(e_i) \]

Through these methods, the paper deepens our understanding of embedding-based EA techniques and offers a new perspective for improving the performance of these models.
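The fixed-point formula and the self-propagating aggregation function in the Formula Summary can be illustrated with a small sketch in plain Python. It is not the authors' implementation, and several choices are assumptions made only for illustration: \(\Lambda\) and \(\Lambda'\) are read as square propagation matrices (for example, adjacency matrices) of the two KGs, \(\Omega_0\) as the initial seed similarities, `normalize` as division by the largest absolute entry, \(\oplus\) as a mean over neighbors, and \(f\) as the identity. All function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def normalize(m):
    """Assumed normalization: divide by the largest absolute entry."""
    peak = max(abs(v) for row in m for v in row)
    if peak == 0:
        return [list(row) for row in m]
    return [[v / peak for v in row] for row in m]


def similarity_fixpoint(omega0, lam, lam_prime, max_iter=100, tol=1e-9):
    """Iterate Omega <- normalize(Omega0 + Lam @ Omega @ Lam'^T).

    Returns the final similarity matrix and the number of iterations run.
    """
    omega = normalize(omega0)
    lam_prime_t = transpose(lam_prime)
    for step in range(1, max_iter + 1):
        propagated = matmul(matmul(lam, omega), lam_prime_t)
        updated = normalize(
            [[o0 + p for o0, p in zip(r0, rp)] for r0, rp in zip(omega0, propagated)]
        )
        # Largest entry-wise change between consecutive iterates.
        delta = max(
            abs(u - o) for ru, ro in zip(updated, omega) for u, o in zip(ru, ro)
        )
        omega = updated
        if delta < tol:
            return omega, step
    return omega, max_iter


def self_propagating_step(embeddings, neighbors, alpha=0.5):
    """One update e_{i+1} = (1 - alpha) * mean(neighbors) + alpha * e_i.

    Assumes mean aggregation and an identity f; entities without
    neighbors aggregate to the zero vector.
    """
    updated = {}
    for entity, vec in embeddings.items():
        nbrs = neighbors.get(entity, [])
        if nbrs:
            agg = [
                sum(embeddings[z][d] for z in nbrs) / len(nbrs)
                for d in range(len(vec))
            ]
        else:
            agg = [0.0] * len(vec)
        updated[entity] = [(1 - alpha) * a + alpha * v for a, v in zip(agg, vec)]
    return updated


if __name__ == "__main__":
    # Toy setting: two KGs with two entities each, one edge per KG,
    # and a single seed alignment between the first entities.
    lam = [[0.0, 1.0], [1.0, 0.0]]
    lam_prime = [[0.0, 1.0], [1.0, 0.0]]
    omega0 = [[1.0, 0.0], [0.0, 0.0]]
    omega, steps = similarity_fixpoint(omega0, lam, lam_prime)
    print(omega, steps)

    embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
    neighbors = {"a": ["b"], "b": ["a"]}
    print(self_propagating_step(embeddings, neighbors, alpha=0.25))
```

In the toy setting, the seed similarity of the first entity pair floods to the second pair through the single edge in each KG. Under the assumed max-normalization, the second pair's similarity \(b\) satisfies \(b = 1/(1+b)\), so it settles near 0.618 while the seed pair stays at 1. In the aggregation step, \(\alpha\) controls how much of an entity's own embedding is kept relative to its neighborhood.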