An Elastic Data Persisting Solution with High Performance for Spark.

Zhipeng Jiang, Haopeng Chen, Huan Zhou, Jenny Wu
DOI: https://doi.org/10.1109/smartcity.2015.144
2015-01-01
Abstract: With the increasing popularity of in-memory computing, Spark [1] has been highly successful in implementing large-scale, data-intensive applications, especially those that reuse data across multiple parallel operations. However, because Moore's Law has slowed down and memory resources are still costly, we present an elastic data persisting solution for Spark that uses data compression to save JVM heap space and to reduce disk I/O for faster data access. We mathematically derive the criteria for selecting the optimal data compression and persisting plan. Our evaluation of a preliminary prototype of this solution shows that it can provide resource management recommendations by accounting for input data type, memory space, and CPU resources, and that it consistently yields high performance, accelerating Spark by up to 6x.