Abstract: We study distributed algorithms for large-scale graphs, focusing on the fundamental problems of connectivity and minimum spanning tree (MST). We consider the k-machine model, a well-studied model for large-scale distributed graph computation, in which k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n ≫ k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds (denoted Tc) of the computation. While communication is a significant factor in the time needed for large-scale computations, the computation cost incurred by the individual machines also contributes to the overall time complexity of a distributed algorithm. We posit a complexity measure called the local computation cost (denoted Tℓ), which measures the worst-case local computation cost among the machines. A lower bound for Tℓ in our model is Ω((m + n)/k + Δ + k), while a lower bound on Tc is Ω(n/k²) [Klauck et al., SODA 2015], where m is the number of edges and Δ is the maximum degree. Prior algorithms for connectivity and MST in the k-machine model [Klauck et al., SODA 2015; Pandurangan et al., SPAA 2016] do not take local computation into account; a straightforward local implementation of these algorithms is not optimal with respect to local computation. In this paper, we study several distributed algorithms for connectivity and MST and analyze their performance with respect to both computation and communication cost. In particular, we analyze a well-studied flooding algorithm for connectivity and connected components that takes rounds and local computation time. We then present a deterministic filtering algorithm that has an improved round complexity of but local computation complexity of . Next, we present two deterministic algorithms that are increasingly sophisticated implementations of the classical Borůvka's algorithm, the last of which has round complexity and local computation complexity . Finally, we present a randomized algorithm to find connected components with round complexity and local computation complexity that are both essentially optimal (up to polylogarithmic factors).
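
For readers unfamiliar with the classical Borůvka's algorithm mentioned above, the following is a minimal sequential sketch in Python. It is not the paper's k-machine implementation and says nothing about round or local computation complexity; it only illustrates the phase structure (every component picks its cheapest outgoing edge, then components merge). The function name `boruvka_mst` and the `(weight, u, v)` edge format are assumptions made for this illustration.

```python
def boruvka_mst(n, edges):
    """Sequential Boruvka sketch (illustrative only, not the distributed version).

    n     -- number of nodes, labelled 0..n-1
    edges -- list of (weight, u, v) tuples for an undirected graph
    Returns (forest_edges, component_count).
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    components = n
    while True:
        # Phase step 1: each component records its cheapest outgoing edge.
        # Comparing whole tuples breaks weight ties consistently, which
        # keeps the selected edges cycle-free.
        cheapest = {}
        for edge in edges:
            _, u, v = edge
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge is internal to a component
            for root in (ru, rv):
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            break  # no component has an outgoing edge left

        # Phase step 2: merge components along the selected edges.
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((w, u, v))
                components -= 1
    return forest, components


if __name__ == "__main__":
    example = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
    tree, count = boruvka_mst(4, example)
    print(sorted(tree), count)
```

The returned component count also answers the connectivity question: a graph is connected exactly when the count is 1, and on a disconnected graph the edges form a minimum spanning forest.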

Distributed Algorithms for Connectivity and MST in Large Graphs with Efficient Local Computation
