Distributed computing connected components with linear communication cost
Xing Feng,Lijun Chang,Xuemin Lin,Lu Qin,Wenjie Zhang,Long Yuan
DOI: https://doi.org/10.1007/s10619-018-7232-6
IF: 0.974
2018-01-01
Distributed and Parallel Databases
Abstract: The paper studies three fundamental problems in graph analytics: computing the connected components (CCs), biconnected components (BCCs), and 2-edge-connected components (ECCs) of a graph. With the recent advent of big data, developing efficient distributed algorithms for computing the CCs, BCCs, and ECCs of a big graph has received increasing interest. As with existing research efforts, we focus on the Pregel programming model, although the techniques may be extended to other programming models, including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur O(m × #supersteps) total costs for both data communication and computation, where m is the number of edges in the graph and #supersteps is the number of supersteps. Since network communication is usually much slower than computation, communication costs dominate the total running time of the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to compute CCs and BCCs with O(m) total communication cost. The total computation costs of our techniques are also smaller than those of the existing techniques in practice, though theoretically almost the same. Moreover, we study the distributed computation of ECCs. We are the first to study this problem, and we propose an approach with O(m) total communication cost. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude in total running time.
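To make the O(m × #supersteps) communication bound concrete, the following is a minimal single-machine sketch of a Pregel-style connected-components computation that counts messages. It assumes min-label propagation (often called hash-min) as a representative baseline; the abstract does not name the baseline algorithms, so this choice, the function name `hash_min_cc`, and the message-counting scheme are illustrative assumptions, and this is not the paper's decomposition-based method.

```python
from collections import defaultdict


def hash_min_cc(num_vertices, edges):
    """Simulate Pregel-style min-label propagation on an undirected graph.

    Hypothetical helper for illustration only. Returns a tuple
    (labels, supersteps, messages), where labels[v] is the smallest
    vertex id in v's connected component.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    label = list(range(num_vertices))   # each vertex starts with its own id
    active = set(range(num_vertices))   # vertices whose label changed last round
    supersteps = 0
    messages = 0

    while active:
        supersteps += 1
        inbox = defaultdict(list)
        # Every active vertex sends its current label along each incident edge.
        for u in active:
            for v in adj[u]:
                inbox[v].append(label[u])
                messages += 1
        # A vertex stays active only if it adopted a strictly smaller label.
        active = set()
        for v, received in inbox.items():
            smallest = min(received)
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)

    return label, supersteps, messages


if __name__ == "__main__":
    # A path 0-1-2-3 plus a separate edge 4-5 (m = 4 edges).
    labels, supersteps, messages = hash_min_cc(6, [(0, 1), (1, 2), (2, 3), (4, 5)])
    print(labels, supersteps, messages)
```

In this sketch each superstep sends at most 2m messages (one per direction of each edge), so the total is bounded by 2m × #supersteps. On long paths the number of supersteps grows with the diameter, which is the kind of cost that an O(m) total communication bound avoids.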