A Spark Scheduling Strategy for Heterogeneous Cluster

Xuewen Zhang, Zhonghao Li, Gongshen Liu, Jiajun Xu, Tiankai Xie, Jan Pan Nees
DOI: https://doi.org/10.3970/cmc.2018.02527
2018-01-01
Abstract: As a major distributed computing system, Spark is used to solve increasingly complex tasks. However, Spark's native scheduling strategy assumes a homogeneous cluster, so it is less effective on a heterogeneous cluster. The aim of this study is to find a more effective task scheduling strategy and add it to the Spark source code. After investigating Spark's scheduling principles and mechanisms, we propose a stratifying algorithm and a node scheduling algorithm to optimize the native scheduling strategy. The new strategy calculates the static level of each node and comprehensively considers dynamic factors such as the length of running tasks and the CPU usage of worker nodes. In a series of comparative experiments on a heterogeneous cluster, the new strategy achieves shorter running time and a lower CPU usage rate than the original Spark strategy, which indicates that the new scheduling strategy is more effective.
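The idea of combining a static node level with dynamic load factors can be illustrated with a minimal sketch. This is not the algorithm from the paper: the `Node` fields, the equal weights, the scoring formula, and the reading of "length of running tasks" as the number of tasks currently running on a node are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a worker node (all fields are assumptions)."""
    name: str
    cores: int
    memory_gb: float
    running_tasks: int   # assumed meaning of "length of running tasks"
    cpu_usage: float     # fraction in [0.0, 1.0]


def static_level(node, max_cores, max_memory_gb):
    # Static capability relative to the strongest node in the cluster.
    # The 0.5/0.5 weights are an arbitrary illustrative choice.
    return 0.5 * node.cores / max_cores + 0.5 * node.memory_gb / max_memory_gb


def dynamic_load(node):
    # Current load: tasks per core (capped at 1.0) blended with CPU usage.
    queue_pressure = min(node.running_tasks / node.cores, 1.0)
    return 0.5 * queue_pressure + 0.5 * node.cpu_usage


def pick_node(nodes):
    # Prefer nodes that are both capable (high static level) and lightly loaded.
    max_cores = max(n.cores for n in nodes)
    max_memory_gb = max(n.memory_gb for n in nodes)

    def score(n):
        return static_level(n, max_cores, max_memory_gb) * (1.0 - dynamic_load(n))

    return max(nodes, key=score)


nodes = [
    Node("strong-busy", cores=16, memory_gb=64, running_tasks=16, cpu_usage=0.95),
    Node("strong-idle", cores=16, memory_gb=64, running_tasks=2, cpu_usage=0.10),
    Node("weak-idle", cores=4, memory_gb=8, running_tasks=0, cpu_usage=0.05),
]
chosen = pick_node(nodes)
print(chosen.name)
```

In this toy cluster the lightly loaded high-capacity node is selected over both the saturated high-capacity node and the idle low-capacity node, which is the kind of trade-off a heterogeneity-aware scheduler has to make.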