A Parallel Graph Data Analysis System Based on Spark

Wang Hongxu, Wu Bin, Liu Yang
DOI: https://doi.org/10.3778/j.issn.1673-9418.1411045
2014-01-01
Abstract: This paper proposes a parallel data analysis system built on the Spark cloud computing platform. The system primarily targets large-scale graph data analysis tasks, also supports analysis of non-graph data, and integrates sets of both graph and non-graph data analysis algorithms. The paper describes the design and implementation of the system, including its workflow engine, its dynamic component update technology, and some of its parallel data analysis algorithms. Tests on datasets of multiple scales, together with a performance comparison against the traditional MapReduce platform, show that the system completes computing tasks more efficiently than the previous graph data mining system and can also analyze non-graph data efficiently.