SPM: Modeling Spark Task Execution Time from the Sub-stage Perspective.

Wei Li, Shengjie Hu, Di Wang, Tianba Chen, Yunchun Li
DOI: https://doi.org/10.1007/978-3-030-38961-1_1
2019-01-01
Abstract: Tasks are the basic unit of Spark application scheduling, and their execution is affected by various configurations of the Spark cluster. Therefore, predicting task execution time is a challenging job. In this paper, we analyze the features of the task execution procedure in different stages and propose a method for predicting the execution time of each sub-stage. Moreover, the associated time overheads of GC and shuffle spill are analyzed in detail. As a result, we propose SPM, a task-level execution time prediction model. SPM can be used to predict the task execution time of each stage according to the input data size and the parallelism configuration. We further apply SPM to the Spark network emulation tool SNemu, which can effectively determine the start time of each shuffle procedure for emulation. Experimental results show that the prediction method achieves high accuracy on a variety of Spark benchmarks in HiBench.