Abstract: Complex data mining algorithms run in multiple iterations, where the output of one iteration serves as the input to the next. Existing parallel programming frameworks, e.g., MapReduce, Pregel, and Spark, adopt a breadth-first search (BFS) strategy to process such iterative jobs: they invoke the user-defined functions on every key-value pair or vertex to produce all possible intermediate results for the next iteration. This BFS strategy incurs high I/O overhead because the size of the intermediate search results is normally exponential in the size of the original data, making it impossible to keep those results in memory. In this paper, we present a new type of parallel programming model, the stack-centric model, in which all computations are defined over a stack maintained in distributed shared memory. The stack can be adaptively split into multiple stacks and disseminated to different compute nodes for parallel processing. The most distinctive feature of the stack-centric model is its support for depth-first search (DFS) algorithms, which incur much less memory overhead than their BFS counterparts. The maximal memory usage of a DFS algorithm is determined by the height of its search tree, and hence it is possible to run a DFS algorithm mostly in memory. Our stack-centric model is not a pure DFS framework: it also supports hybrid BFS and DFS algorithms by tuning the trade-off between memory usage and parallelism. To show the advantages of the stack-centric model, we implement two algorithms, a frequent pattern mining algorithm and a DNA sequence matching algorithm, on both the stack-centric model and Spark. The stack-centric model uses 10 times less memory than Spark, resulting in a significant performance improvement.
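The memory argument in the abstract can be illustrated with a small single-machine sketch. It is not the paper's implementation or API: the set-enumeration search tree, the function names (children, bfs_peak, dfs, split_stack), and the round-robin split policy are all assumptions chosen for illustration. The sketch enumerates every itemset over n items, as a naive frequent pattern miner might, and records the largest number of pending nodes held at once under each strategy.

```python
def children(node, n):
    """Expand one node of a set-enumeration tree over items 0..n-1.
    A node is a tuple of strictly increasing item indices (hypothetical encoding)."""
    start = node[-1] + 1 if node else 0
    return [node + (i,) for i in range(start, n)]


def bfs_peak(n):
    """Level-by-level expansion: every node of a level is materialized
    before the next level starts. Returns (nodes visited, peak frontier size)."""
    frontier = [()]
    visited, peak = 0, 0
    while frontier:
        peak = max(peak, len(frontier))
        visited += len(frontier)
        frontier = [c for node in frontier for c in children(node, n)]
    return visited, peak


def dfs(stack, n):
    """Stack-driven expansion: pop one node, push its children.
    Returns (nodes visited, peak stack size)."""
    stack = list(stack)  # work on a copy so the caller's list is untouched
    visited, peak = 0, 0
    while stack:
        peak = max(peak, len(stack))
        node = stack.pop()
        visited += 1
        stack.extend(children(node, n))
    return visited, peak


def split_stack(stack, parts):
    """Deal pending entries round-robin into `parts` independent stacks.
    The split policy is an assumption; the abstract only says the stack
    can be split adaptively and sent to different compute nodes."""
    return [stack[i::parts] for i in range(parts)]


if __name__ == "__main__":
    n = 10
    print("BFS (visited, peak):", bfs_peak(n))
    print("DFS (visited, peak):", dfs([()], n))
    # Simulate two workers, each running DFS on its share of the root's children.
    for part in split_stack(children((), n), 2):
        print("worker (visited, peak):", dfs(part, n))
```

With n = 10, both strategies visit all 1024 itemsets, but the BFS frontier peaks at 252 nodes (the widest level of the tree) while the DFS stack never holds more than 10. Splitting the stack gives each worker a disjoint set of subtrees, so the workers need no coordination in this toy setting; a hybrid of the two strategies would expand a few levels breadth-first to create enough stacks for the available workers before switching to DFS.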

Edge Cluster Based Large Graph Partitioning and Iterative Processing in BSP

A Distributed Graph-Parallel Computing System with Lightweight Communication Overhead

A Simple Yet Effective Balanced Edge Partition Model for Parallel Computing

Scalable Edge Partitioning

An Efficient Graph Processing System

Hybrid Edge Partitioner: Partitioning Large Power-Law Graphs under Memory Constraints

Local Graph Edge Partitioning

Superblock: An Application-Aware Dynamic Partition Strategy for Large-Scale Graph

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning

Enhancing Balanced Graph Edge Partition with Effective Local Search

Evaluation and Analysis of Distributed Graph-Parallel Processing Frameworks

Tuning the granularity of parallelism for distributed graph processing

VEBO: A Vertex- and Edge-Balanced Ordering Heuristic to Load Balance Parallel Graph Processing

Partitioning Trillion Edge Graphs on Edge Devices

A Stack-Centric Processing Model for Iterative Processing

A Feasible Graph Partition Framework for Parallel Computing of Big Graph

Accelerating Large-Scale Prioritized Graph Computations by Hotness Balanced Partition

DHPV: a distributed algorithm for large-scale graph partitioning

How to Partition a Billion-Node Graph

3-D Partitioning for Large-Scale Graph Processing

xDGP: A Dynamic Graph Processing System with Adaptive Partitioning