Bottom-up Anytime Discovery of Generalised Multimodal Graph Patterns for Knowledge Graphs

Xander Wilcke, Rick Mourits, Auke Rijpma, Richard Zijdeman
2024-10-08
Abstract: Vast amounts of heterogeneous knowledge are becoming publicly available in the form of knowledge graphs, often linking multiple sources of data that have never been together before, and thereby enabling scholars to answer many new research questions. It is often not known beforehand, however, which questions the data might have the answers to, potentially leaving many interesting and novel insights to remain undiscovered. To support scholars during this scientific workflow, we introduce an anytime algorithm for the bottom-up discovery of generalized multimodal graph patterns in knowledge graphs. Each pattern is a conjunction of binary statements with (data-) type variables, constants, and/or value patterns. Upon discovery, the patterns are converted to SPARQL queries and presented in an interactive facet browser together with metadata and provenance information, enabling scholars to explore, analyse, and share queries. We evaluate our method from a user perspective, with the help of domain experts in the humanities.
Artificial Intelligence, Databases
What problem does this paper attempt to address?
This paper addresses how to automatically discover generalised multimodal graph patterns in knowledge graphs, so that scholars can identify potentially interesting patterns early in the research process. Such patterns can suggest new research questions and provide supporting evidence for existing research directions. Specifically, the method aims to:

1. **Support scholars in the early stage of the scientific workflow**: Reveal potentially interesting patterns in the data to help scholars form new research questions.
2. **Integrate multiple unstructured regularities**: Combine regularities in numerical, temporal, and textual attributes with structural regularities to generate richer graph patterns.
3. **Improve the interpretability and usability of patterns**: Convert the discovered patterns into SPARQL queries and present them in an interactive interface for easy exploration, analysis, and sharing.

### Main contributions

- A bottom-up anytime algorithm for discovering generalised multimodal graph patterns in knowledge graphs.
- Natural integration of various unstructured regularities into generalised graph patterns.
- An evaluation of the method from a user perspective, carried out with domain experts in the humanities.

### Method overview

The algorithm works bottom-up, generating and extending graph patterns step by step, so that potentially interesting patterns are available after every iteration. The main steps are:

1. **Basic pattern generation**: Start from single-clause patterns and keep only those that meet the minimum support requirement.
2. **Pattern expansion**: Extend existing graph patterns with matching base patterns, keeping only extensions that still have sufficient support.
3. **Optimize the search space**: Prune invalid or low-support patterns and clauses, and apply further optimization techniques, to reduce the search space.
4. **Pattern browser**: Provide an interactive interface to help users filter, save, and share the discovered patterns.

### Application scenarios

The method is especially suited to knowledge graphs that hold large amounts of heterogeneous knowledge, such as historical archives and museum collections in the humanities and social sciences. By combining structured and unstructured regularities, it can mine the information in the data more comprehensively and so support interdisciplinary research.

### Example

When processing civil records, for example, the algorithm can discover the distribution of the age at death of unemployed women (as shown in Figure 1), which helps scholars further explore the impact of socio-economic factors on health.

Figure 1: An example of a subgraph in the civil registry domain (left) and a possible graph pattern (right).

In this way, the algorithm not only reveals structural regularities in the data, but also captures unstructured regularities in numerical, temporal, textual, and other attributes, giving scholars a more fine-grained view of the data.
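The first three steps of the method overview can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several simplifying assumptions: patterns are star-shaped around one typed subject variable, support is counted as the number of distinct matching subjects, entity objects are generalised to a type variable while literals stay constants, and value patterns (such as numeric or date ranges) are left out. The toy triples, the `TYPES` mapping, the function names, and the `http://example.org/` namespace are all hypothetical.

```python
from collections import defaultdict
from itertools import islice

# Toy civil-registry graph as (subject, predicate, object) triples.
# All identifiers and values here are hypothetical.
TRIPLES = [
    ("p1", "gender", "female"), ("p1", "occupation", "none"), ("p1", "diedIn", "c1"),
    ("p2", "gender", "female"), ("p2", "occupation", "none"), ("p2", "diedIn", "c2"),
    ("p3", "gender", "male"), ("p3", "occupation", "farmer"), ("p3", "diedIn", "c1"),
    ("p4", "gender", "female"), ("p4", "occupation", "farmer"),
]
TYPES = {"p1": "Person", "p2": "Person", "p3": "Person", "p4": "Person",
         "c1": "City", "c2": "City"}


def base_patterns(triples, types, min_support):
    """Step 1: single-clause patterns with at least `min_support` subjects."""
    matches = defaultdict(set)
    for s, p, o in triples:
        # Assumption: entities are generalised to their type, literals stay constants.
        term = ("type", types[o]) if o in types else ("const", o)
        matches[(types[s], (p, term))].add(s)
    return {k: frozenset(v) for k, v in matches.items() if len(v) >= min_support}


def discover(triples, types, min_support, max_size=3):
    """Yield (pattern, supporting subjects) pairs, smallest patterns first.

    Being a generator, it can be stopped at any time; everything yielded
    so far is already a valid result (the 'anytime' property).
    """
    base = base_patterns(triples, types, min_support)
    frontier = {(t, frozenset([c])): subs for (t, c), subs in base.items()}
    size = 1
    while frontier and size <= max_size:
        yield from frontier.items()
        extended = {}
        for (stype, clauses), subs in frontier.items():
            for (btype, clause), bsubs in base.items():
                if btype != stype or clause in clauses:
                    continue
                # Step 2: extend with a base pattern on the same subject variable.
                joined = subs & bsubs
                # Step 3: prune extensions that fall below the support threshold.
                if len(joined) >= min_support:
                    extended[(stype, clauses | {clause})] = joined
        frontier = extended
        size += 1


def to_sparql(pattern):
    """Render a pattern as a SPARQL SELECT query (hypothetical namespace)."""
    stype, clauses = pattern
    lines = [f"?s a :{stype} ."]
    for i, (pred, (kind, value)) in enumerate(sorted(clauses)):
        if kind == "const":
            lines.append(f'?s :{pred} "{value}" .')
        else:
            lines.append(f"?s :{pred} ?o{i} .")
            lines.append(f"?o{i} a :{value} .")
    body = "\n  ".join(lines)
    return f"PREFIX : <http://example.org/>\nSELECT ?s WHERE {{\n  {body}\n}}"


# Anytime use: take only the first few patterns, then stop.
for pattern, subjects in islice(discover(TRIPLES, TYPES, min_support=2), 5):
    print(len(subjects), to_sparql(pattern).replace("\n", " "))
```

Under this definition of support, adding a clause can only shrink the set of matching subjects, so discarding a pattern below the threshold never discards a larger pattern that would have passed. That is what makes the pruning in step 3 safe in this sketch; whether the paper uses the same support measure is not stated in the summary above.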