Abstract: A graph is a fundamental data structure that captures relationships between data entities. In practice, graphs are widely used to model complex data in application domains such as social networks, protein networks, transportation networks, bibliographical networks, knowledge bases and many more. Graphs with millions or even billions of nodes and edges have become common, and graph analytics is an important technique for big data discovery. With the increasing abundance of large graphs, designing scalable systems for processing and analyzing them has therefore become one of the most timely problems facing the big data research community. Scalable processing of big graphs is challenging because of their size and the inherently irregular structure of graph computations. As a result, recent years have seen unprecedented interest in building big graph processing systems that attempt to tackle these challenges. In this article, we provide a comprehensive survey of the state of the art in large scale graph processing platforms. In addition, we present an extensive experimental study of five popular systems in this domain, namely GraphChi, Apache Giraph, GPS, GraphLab and GraphX. In particular, we report and analyze the performance characteristics of these systems using five common graph processing algorithms and seven large graph datasets. Finally, we identify a set of current open research challenges and discuss some promising directions for future research in the domain of large scale graph processing.

Large scale graph processing systems: survey and an experimental evaluation
