Cluster-preserving Sampling Algorithm for Large-Scale Graphs.

Jianpeng Zhang, Hongchang Chen, Dingjiu Yu, Yulong Pei, Yingjun Deng
DOI: https://doi.org/10.1007/s11432-021-3370-4
2023-01-01
Abstract: Graph sampling is an effective way to address scalability issues when analyzing large-scale graphs. Many sampling algorithms have been proposed, and sampling quality has been quantified using explicit properties of the sample (e.g., degree distribution). However, existing sampling techniques are inadequate for one particular task: sampling the clustering structure, which is a crucial property of today's networks. In this paper, two novel top-leader sampling methods, TLS-e and TLS-i, which differ in their expansion strategies, are proposed to obtain representative samples that effectively preserve the clustering structure. The rationale behind them is to select the top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes using specific expansion strategies. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. The empirical results show that the proposed sampling algorithms preserve the population's clustering structure well and provide feasible solutions for sampling the clustering structure of large-scale graphs.
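The two-phase rationale in the abstract (pick leader nodes, then grow the sample around them) can be illustrated with a minimal sketch. The abstract does not specify how top-leaders are scored or how TLS-e and TLS-i expand, so everything concrete below is an assumption made for illustration: leaders are taken to be high-degree nodes that are not adjacent to an earlier leader, and expansion adds, for each leader in turn, the unsampled neighbor with the most links into that leader's group. The function names `build_graph`, `pick_leaders`, and `expand_sample` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pick_leaders(adj: Graph, num_leaders: int) -> List[Hashable]:
    """Assumed leader rule: highest degree first, skipping any node that
    is an earlier leader's neighbor, so leaders spread across clusters."""
    leaders: List[Hashable] = []
    blocked: Set[Hashable] = set()
    for node in sorted(adj, key=lambda n: (-len(adj[n]), str(n))):
        if len(leaders) == num_leaders:
            break
        if node in blocked:
            continue
        leaders.append(node)
        blocked.add(node)
        blocked.update(adj[node])
    return leaders


def expand_sample(adj: Graph, leaders: List[Hashable], sample_size: int) -> Set[Hashable]:
    """Assumed expansion rule: round-robin over leaders, each time adding
    the unsampled neighbor with the most links into that leader's group
    (ties go to the lower-degree node, i.e. the more peripheral one)."""
    groups = {leader: {leader} for leader in leaders}
    sample: Set[Hashable] = set(leaders)
    grew = True
    while len(sample) < sample_size and grew:
        grew = False
        for group in groups.values():
            if len(sample) >= sample_size:
                break
            frontier = {v for u in group for v in adj[u]} - sample
            if not frontier:
                continue
            best = max(
                frontier,
                key=lambda v: (len(adj[v] & group), -len(adj[v]), str(v)),
            )
            group.add(best)
            sample.add(best)
            grew = True
    return sample


if __name__ == "__main__":
    # Toy graph: two 4-node cliques joined by a single bridge edge (3, 4).
    clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    clique_b = [(4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
    graph = build_graph(clique_a + clique_b + [(3, 4)])
    leaders = pick_leaders(graph, num_leaders=2)
    print(leaders, sorted(expand_sample(graph, leaders, sample_size=6)))
```

Growing each leader's group separately is a design choice of this sketch: it keeps the sample from draining into a single dense region, which is the kind of behavior a cluster-preserving sampler is meant to avoid. The paper's actual scoring and expansion rules may differ substantially.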