Fast Algorithms for Semantic Association Search and Pattern Mining
Gong Cheng, Daxin Liu, Yuzhong Qu
DOI: https://doi.org/10.1109/tkde.2019.2942031
IF: 9.235
2021-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Given a large graph representing relations between entities, searching for complex relationships (called semantic associations, or SAs for short) among a set of entities is a common information need in many domains. Further, numerous SAs are often abstracted into a few frequent high-level conceptual graph patterns (called SA patterns, or SAPs for short), which organize SAs into interpretable subgroups. Whereas the quality and usefulness of SAs and SAPs have been extensively studied in the literature, in this article we aim to develop faster algorithms for SA search and frequent SAP mining. For the former problem, we leverage distances to prune the search space, and implement a distance oracle to balance the time and space for distance calculation. For the latter problem, we exploit both graph structure and labels to induce fine-grained skeleton-based partitions of SAs, which may be pruned to reduce SAP enumeration. In addition, we generate canonical codes for SAs, which not only enable result deduplication but are also reused in SAP mining to improve the overall performance. We extensively evaluate the efficiency of our algorithms on four large graphs, using both random queries and simulated queries that reproduce the extreme case of finding numerous SAs.
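The idea of using distances to prune the search space can be illustrated with a minimal sketch. It is not the authors' algorithm: it is simplified to two query entities, simple paths, and an unlabeled undirected graph, and it assumes a plain breadth-first search in place of the distance oracle mentioned in the abstract. The names `bfs_distances`, `bounded_paths`, and `max_len` are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def bounded_paths(graph, start, goal, max_len):
    """Enumerate simple paths from start to goal with at most max_len edges.

    A partial path is abandoned as soon as its next vertex is too far
    from goal to finish within the remaining edge budget.
    """
    # Assumption: the graph is undirected, so distances from goal equal
    # distances to goal. A distance oracle would replace this lookup table.
    dist_to_goal = bfs_distances(graph, goal)
    unreachable = float("inf")
    results = []

    def extend(path, visited):
        u = path[-1]
        if u == goal:
            results.append(list(path))
            return
        remaining = max_len - (len(path) - 1)  # edges still available
        for v in graph[u]:
            if v in visited:
                continue
            # Distance-based pruning: after stepping to v, only
            # remaining - 1 edges are left to reach goal.
            if dist_to_goal.get(v, unreachable) > remaining - 1:
                continue
            path.append(v)
            visited.add(v)
            extend(path, visited)
            path.pop()
            visited.remove(v)

    if dist_to_goal.get(start, unreachable) <= max_len:
        extend([start], {start})
    return results
```

In this sketch the pruning check costs one dictionary lookup per candidate vertex, at the price of storing one distance per vertex; trading such storage against on-demand computation is the kind of time and space balance a distance oracle is meant to address.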