Auto-Tuning Spark Configurations Based On Neural Network

Jing Gu, Ying Li, Hongyan Tang, Zhonghai Wu
DOI: https://doi.org/10.1109/ICC.2018.8422658
2018-01-01
Abstract: For massive data processing platforms such as Spark, configuration tuning is a necessary step. Configuration is closely tied to task parallelism, resource allocation and fault tolerance, all of which strongly influence performance. However, tuning Spark's more than 190 interrelated configuration parameters for performance is a challenging job. In this paper, a neural-network-based configuration tuning approach is proposed. In this approach, a neural network model is trained to predict whether each configuration value should be increased or decreased, and this prediction determines the next search space. In addition, a performance model based on random forest improves search efficiency by predicting the running time of jobs instead of actually running them. We evaluated the approach with four typical Spark applications. Experimental results show that, compared to the default configuration, our approach reduces the execution time of Spark applications by 42.8% on average. Moreover, the proposed approach outperforms related approaches, finding a better configuration in less search time.
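The abstract combines two models: one predicts the direction in which each parameter should move, and one predicts running time so candidate configurations can be scored without launching Spark jobs. The Python sketch below shows one way such a search loop could fit together. It is an assumption, not the authors' algorithm. Both models are passed in as plain callables. The rule that turns a predicted direction into the next search space (halving the interval on the predicted side) is a hypothetical choice. Every parameter is treated as continuous. The toy stand-ins (`toy_direction`, `toy_runtime`, the hidden `OPTIMUM`, and the default and range values) are made up for illustration; they are not a trained neural network or random forest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Config = Dict[str, float]  # parameter name -> numeric value (treated as continuous for simplicity)


@dataclass
class Interval:
    low: float
    high: float


def tune(
    default: Config,
    space: Dict[str, Interval],
    predict_direction: Callable[[Config, str], int],  # stand-in for the neural network: +1 increase, -1 decrease
    predict_runtime: Callable[[Config], float],       # stand-in for the random-forest performance model
    rounds: int = 10,
) -> Tuple[Config, float]:
    """Direction-guided interval search scored by a surrogate model; no Spark job is run."""
    best = dict(default)
    best_time = predict_runtime(best)
    # Copy the intervals so the caller's search space is not mutated.
    space = {name: Interval(r.low, r.high) for name, r in space.items()}
    for _ in range(rounds):
        for name, r in space.items():
            old = best[name]
            increase = predict_direction(best, name) > 0
            # Hypothetical rule: the predicted direction selects which side of the current value to probe.
            candidate_value = (old + r.high) / 2 if increase else (r.low + old) / 2
            candidate = dict(best)
            candidate[name] = candidate_value
            t = predict_runtime(candidate)  # predicted running time, not a measured one
            if t < best_time:
                best, best_time = candidate, t
                # Accepted: the old value becomes the bound on the side we moved away from.
                if increase:
                    r.low = old
                else:
                    r.high = old
            elif increase:
                r.high = candidate_value  # rejected: shrink the probed side of the interval
            else:
                r.low = candidate_value
    return best, best_time


# Hypothetical toy stand-ins with a hidden optimum, for illustration only.
OPTIMUM = {"spark.executor.cores": 4.0, "spark.default.parallelism": 200.0}


def toy_runtime(cfg: Config) -> float:
    # Made-up quadratic cost surface with its minimum at OPTIMUM.
    return (100.0
            + (cfg["spark.executor.cores"] - OPTIMUM["spark.executor.cores"]) ** 2
            + ((cfg["spark.default.parallelism"] - OPTIMUM["spark.default.parallelism"]) / 10) ** 2)


def toy_direction(cfg: Config, name: str) -> int:
    # Oracle-style stand-in: points toward the hidden optimum.
    return 1 if OPTIMUM[name] > cfg[name] else -1


DEFAULT = {"spark.executor.cores": 1.0, "spark.default.parallelism": 8.0}
SPACE = {"spark.executor.cores": Interval(1, 8), "spark.default.parallelism": Interval(8, 512)}
best, best_time = tune(DEFAULT, SPACE, toy_direction, toy_runtime)
print(best, round(best_time, 3))
```

In a real tuner, `predict_direction` would be the trained neural network and `predict_runtime` the random-forest model; integer-valued parameters such as `spark.executor.cores` would also need rounding before the configuration is used.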