Extracting Top-$k$ Frequent and Diversified Patterns in Knowledge Graphs

Jian Zeng, Leong Hou U, Xiao Yan, Yan Li, Mingji Han, Bo Tang
DOI: https://doi.org/10.1109/tkde.2022.3233594
IF: 9.235
2023-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: A knowledge graph contains many real-world facts that can support various analytical tasks, e.g., exceptional fact discovery and claim checking. In this work, we extract top-$k$ frequent and diversified patterns from a knowledge graph while capturing user interest. Specifically, we first formalize the core-based top-$k$ frequent pattern discovery problem, which finds the $k$ patterns with the highest frequency among those extended from a core pattern specified by a user query. In addition, to diversify the top-$k$ frequent patterns, we define a distance function that measures the dissimilarity between two patterns, and return top-$k$ patterns in which the pairwise diversity of any two resultant patterns exceeds a given threshold. As the search space of candidate patterns is exponential w.r.t. the number of nodes and edges in the knowledge graph, discovering frequent and diversified patterns is computationally challenging. To achieve high efficiency, we propose a suite of techniques: (1) a meta-index that avoids generating invalid candidate patterns; (2) an upper bound on the frequency score (i.e., MNI) of a candidate pattern, which is used to prune unqualified candidates early and to prioritize the enumeration order of patterns; (3) an advanced join-based approach that computes the MNI of candidate patterns efficiently; and (4) a lower bound for the distance function, together with incremental computation of the pairwise diversity among patterns. Using real-world knowledge graphs, we experimentally verify the efficiency and effectiveness of the proposed techniques. We also demonstrate the utility of the extracted patterns through case studies.
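To make the two central notions in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes MNI refers to the standard minimum image-based support from single-graph pattern mining (the smallest number of distinct graph nodes that any one pattern node is mapped to). The distance function here is a placeholder (Jaccard distance over edge-label sets), since the abstract does not state the paper's definition, and the greedy selection is a naive baseline that omits the meta-index, bounds, and join-based computation. All function names and the pattern representation are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

# A match maps each pattern node to the graph node it is bound to.
Match = Dict[str, Hashable]
# Hypothetical pattern representation: (set of edge labels, frequency score).
Candidate = Tuple[FrozenSet[str], int]


def mni_support(matches: List[Match]) -> int:
    """Minimum image-based support: the minimum, over pattern nodes, of the
    number of distinct graph nodes that node is mapped to across all matches."""
    if not matches:
        return 0
    images: Dict[str, set] = {}
    for match in matches:
        for pattern_node, graph_node in match.items():
            images.setdefault(pattern_node, set()).add(graph_node)
    return min(len(nodes) for nodes in images.values())


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Placeholder distance (an assumption, not the paper's function):
    1 minus the Jaccard similarity of the two edge-label sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def diversified_top_k(
    candidates: List[Candidate], k: int, threshold: float
) -> List[Candidate]:
    """Naive greedy baseline: scan candidates by descending frequency and keep
    one only if its distance to every already kept pattern exceeds threshold."""
    selected: List[Candidate] = []
    if k <= 0:
        return selected
    for edges, freq in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(jaccard_distance(edges, kept) > threshold for kept, _ in selected):
            selected.append((edges, freq))
            if len(selected) == k:
                break
    return selected
```

For example, a pattern with nodes `x` and `y` whose three matches bind `x` to three distinct graph nodes but `y` to only two has an MNI of 2. In the selection step, a candidate that overlaps heavily with an already chosen, more frequent pattern is skipped in favor of a less frequent but more distant one.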
computer science, information systems, artificial intelligence, engineering, electrical & electronic