Heterogeneous Network Crawling: Reaching Target Nodes by Motif-Guided Navigation

Changyu Wang, Kevin Chang, Pinghui Wang, Tao Qin, Xiaohong Guan
DOI: https://doi.org/10.1109/tkde.2020.3038458
IF: 9.235
2020-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: With numerous nodes on online heterogeneous networks, how to reach and extract the target nodes of specific interest is a pressing problem. In this paper, we propose a novel heterogeneous network crawler, MCrawl. It addresses the problem by iteratively crawling an online heterogeneous network through its available APIs, starting from a set of target nodes, i.e., seed nodes. Addressing the problem poses two challenges. First, to navigate within a vast network, how do we start from a small set of target nodes? In other words, which nodes in the "current frontier" should we expand, and in which direction, to reach promising target nodes quickly? We propose motif-based crawling to exploit the complex structures and rich semantics of heterogeneous networks. Second, in many scenarios, we do not have a classifier to assess the quality of the harvested nodes, and thus of the motifs to expand. We develop a probabilistic inference framework to estimate the yield and harvest rates of motifs, achieving principled bootstrapping for crawling. In experiments on real networks, MCrawl outperforms baselines by significant margins.
computer science, information systems, artificial intelligence, engineering, electrical & electronic