Hierarchical Blockmodelling for Knowledge Graphs

Marcin Pietrasik, Marek Reformat, Anna Wilbik
2024-08-28
Abstract: In this paper, we investigate the use of probabilistic graphical models, specifically stochastic blockmodels, for the purpose of hierarchical entity clustering on knowledge graphs. These models, seldom used in the Semantic Web community, decompose a graph into a set of probability distributions. The parameters of these distributions are then inferred, allowing for their subsequent sampling to generate a random graph. In a non-parametric setting, this allows for the induction of hierarchical clusterings without prior constraints on the hierarchy's structure. Specifically, this is achieved by the integration of the Nested Chinese Restaurant Process and the Stick Breaking Process into the generative model. In this regard, we propose a model leveraging such integration and derive a collapsed Gibbs sampling scheme for its inference. To aid in understanding, we describe the steps in this derivation and provide an implementation for the sampler. We evaluate our model on synthetic and real-world datasets and quantitatively compare against benchmark models. We further evaluate our results qualitatively and find that our model is capable of inducing coherent cluster hierarchies in small scale settings. The work presented in this paper provides the first step for the further application of stochastic blockmodels for knowledge graphs on a larger scale. We conclude the paper with potential avenues for future work on more scalable inference schemes.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the problem of clustering entities in a knowledge graph and organizing those clusters into a hierarchy. Specifically, the authors propose a method based on probabilistic graphical models, in particular the stochastic block model, for hierarchical entity clustering on knowledge graphs. The method decomposes the knowledge graph into a set of probability distributions whose parameters are inferred from the data. In the non-parametric setting, this inference induces a hierarchical clustering without prior constraints on the hierarchy's structure.

### Main Issues

1. **Entity Clustering**: How to group entities in a knowledge graph based on similarity.
2. **Hierarchy Induction**: How to organize the resulting clusters into a hierarchical structure.

### Solution

- **Stochastic Block Model**: The authors adopt a non-parametric stochastic block model, using the Nested Chinese Restaurant Process (nCRP) and Stick Breaking Process (SBP) to generate hierarchical clustering structures.
- **Inference Method**: A collapsed Gibbs sampling scheme is derived for inferring the model parameters.

### Background

- **Knowledge Graphs**: In recent years, using graph structures to model and store data has become increasingly popular. A knowledge graph is a multi-layer graph where entities interact through different types of relationships.
- **Existing Methods**: Existing methods for entity clustering and hierarchy induction mainly focus on statistical pattern discovery, association rule mining, and embedding-based methods.

### Contributions of the Paper

- **First Application**: This is the first application of the stochastic block model to hierarchical clustering in knowledge graphs.
- **Non-Parametric Method**: The method does not require constraints on the hierarchical structure to be set in advance and can induce the hierarchical clustering automatically.
- **Experimental Validation**: Experiments were conducted on synthetic and real-world datasets, evaluating the model both quantitatively against benchmark models and qualitatively.

### Potential Applications

- **Knowledge Graph Analysis**: Can help researchers better understand the implicit structure in knowledge graphs.
- **Data Organization**: Can be used to organize large-scale knowledge graph data, making it easier to understand and use.
- **Downstream Tasks**: Can serve as a preprocessing step for other tasks such as link prediction and entity classification.

In summary, this paper proposes a novel method for hierarchical entity clustering in knowledge graphs using the stochastic block model, addressing both entity clustering and hierarchy induction and pointing to new directions for future research.
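
To make the two generative ingredients named under Solution more concrete, the following is a minimal Python sketch of a truncated stick-breaking draw and a nested Chinese Restaurant Process path draw. It is not the authors' model or their sampler. The function names, the fixed `depth`, the hyperparameters `gamma` and `alpha`, and the choice to place each entity at a level along its path using stick-breaking weights are all illustrative assumptions about how the two processes are commonly combined.

```python
import random
from collections import defaultdict


def stick_breaking(alpha, depth, rng):
    """Truncated stick-breaking: weights over `depth` levels that sum to 1."""
    weights, remaining = [], 1.0
    for _ in range(depth - 1):
        frac = rng.betavariate(1.0, alpha)  # share of the remaining stick to break off
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # the last level absorbs whatever is left
    return weights


def ncrp_path(children, counts, gamma, depth, rng):
    """Draw one root-to-leaf path of `depth` nodes from a nested CRP.

    children: node -> list of child nodes created so far
    counts:   node -> number of earlier entities whose path visits the node
    A node is identified by the tuple of child indices leading to it.
    """
    path = [()]  # the root is the empty tuple
    counts[()] += 1
    for _ in range(depth - 1):
        node = path[-1]
        kids = children[node]
        # Existing child with probability proportional to its count,
        # a brand-new child with probability proportional to gamma.
        weights = [counts[k] for k in kids] + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(kids):
            child = node + (len(kids),)
            kids.append(child)
        else:
            child = kids[choice]
        counts[child] += 1
        path.append(child)
    return path


def sample_hierarchy(n_entities, depth=3, gamma=1.0, alpha=1.0, seed=0):
    """Assign each entity to a tree node (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    children, counts = defaultdict(list), defaultdict(int)
    assignments = []
    for _ in range(n_entities):
        path = ncrp_path(children, counts, gamma, depth, rng)
        level_weights = stick_breaking(alpha, depth, rng)
        level = rng.choices(range(depth), weights=level_weights)[0]
        assignments.append(path[level])  # the entity sits at this node of its path
    return assignments, children


if __name__ == "__main__":
    nodes, tree = sample_hierarchy(10, depth=3, seed=42)
    for entity, node in enumerate(nodes):
        print(f"entity {entity} -> node {node}")
```

In this sketch the tree grows only as entities require it, which is the sense in which the hierarchy needs no width fixed in advance. The depth is truncated here purely to keep the example short. Inference, such as the collapsed Gibbs scheme the paper derives, would run in the opposite direction, from an observed graph back to these assignments, and is not shown.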