Efficient structural graph clustering: an index-based approach

Dong Wen, Lu Qin, Ying Zhang, Lijun Chang, Xuemin Lin
DOI: https://doi.org/10.1007/s00778-019-00541-4
2019-05-08
The VLDB Journal
Abstract: Graph clustering is a fundamental problem with a wide range of applications. The structural graph clustering (<span>\(\mathsf {SCAN}\)</span>) method obtains not only clusters but also hubs and outliers. However, the clustering results depend heavily on two parameters, <span>\(\epsilon \)</span> and <span>\(\mu \)</span>, and the optimal parameter setting varies with the graph's properties and the user's requirements. In addition, all existing <span>\(\mathsf {SCAN}\)</span> solutions need to scan at least the whole graph, even if only a small number of vertices belong to clusters. In this paper, we propose an index-based method for <span>\(\mathsf {SCAN}\)</span>. Based on our index, we cluster the graph for any <span>\(\epsilon \)</span> and <span>\(\mu \)</span> in <span>\(O(\sum _{C\in \mathbb {C}}|E_C|)\)</span> time, where <span>\(\mathbb {C} \)</span> is the result set of all clusters and <span>\(|E_C|\)</span> is the number of edges in a specific cluster <span>\(C\)</span>. In other words, the time spent on computing the structural clustering depends only on the result size, not on the size of the original graph. The space complexity of our index is <em>O</em>(<em>m</em>), where <em>m</em> is the number of edges in the graph. To handle dynamic graph updates, we propose algorithms and several optimization techniques for maintaining our index. We also design an index for I/O-efficient query processing. We conduct extensive experiments to evaluate the performance of all our proposed algorithms on 10 real-world networks, the largest of which contains more than 1 billion edges. The experimental results demonstrate that our approaches significantly outperform existing solutions.
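The abstract does not spell out how the two parameters enter the clustering. As a rough illustration only, the sketch below assumes the commonly used SCAN definitions: the structural similarity of two vertices is the size of the intersection of their closed neighborhoods divided by the geometric mean of the two neighborhood sizes, and a vertex is a core when at least mu vertices in its closed neighborhood have similarity at least eps to it. The function names and the toy graph are hypothetical, and this is not the index-based method proposed in the paper.

```python
import math

def closed_neighborhood(adj, v):
    """Neighbors of v plus v itself (assumed SCAN convention)."""
    return adj[v] | {v}

def structural_similarity(adj, u, v):
    """Assumed SCAN similarity: |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|)."""
    nu = closed_neighborhood(adj, u)
    nv = closed_neighborhood(adj, v)
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def is_core(adj, v, eps, mu):
    """v is a core if at least mu vertices in N[v] are eps-similar to v."""
    similar = sum(
        1
        for w in closed_neighborhood(adj, v)
        if structural_similarity(adj, v, w) >= eps
    )
    return similar >= mu

# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} with a pendant vertex 4.
adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3},
}

if __name__ == "__main__":
    eps, mu = 0.8, 4
    for v in sorted(adj):
        print(v, "core" if is_core(adj, v, eps, mu) else "non-core")
```

Changing eps or mu changes which vertices qualify as cores, and therefore the clusters, which is why a naive approach has to recompute similarities for each new parameter pair.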