Task Scheduling Strategy for Heterogeneous Spark Clusters

Yu Liang, Yu Tang, Xun Zhu, Xiaoyuan Guo, Chenyao Wu, Di Lin
DOI: https://doi.org/10.1007/978-981-15-0187-6_15
2020-01-01
Abstract: As a primary data processing and computing framework, Spark supports in-memory computing, interactive computing, and querying over huge amounts of data. It also provides data mining, machine learning, stream computing, and other services. However, its strategy of allocating resources as if all processors were homogeneous cannot adapt to a heterogeneous cluster environment, because it lacks load-based task scheduling. Therefore, we propose a dynamic load scheduling algorithm for heterogeneous Spark clusters that regularly collects load information from each cluster node. The algorithm can dramatically reduce the load allocated to nodes that are already heavily loaded and in turn allocate more tasks to idle nodes, thereby speeding up job allocation in Spark. The experimental results show that the proposed algorithm can substantially improve computation efficiency by dynamically balancing load among the nodes in a heterogeneous cluster.
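The abstract does not specify which load metrics, weights, or assignment rule the algorithm uses, so the following is only a minimal sketch of the general idea of load-based scheduling, not the authors' method. It assumes each node periodically reports CPU and memory utilization, combines them into a single score, and greedily sends each task to the currently least-loaded node. The names `NodeLoad`, `load_score`, `assign_tasks`, the equal weights, and the per-task cost are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Hypothetical load report collected periodically from one node."""
    name: str
    cpu: float  # CPU utilization in [0, 1]
    mem: float  # memory utilization in [0, 1]


def load_score(node, cpu_weight=0.5, mem_weight=0.5):
    """Combine metrics into one score (weights are an assumption)."""
    return cpu_weight * node.cpu + mem_weight * node.mem


def assign_tasks(nodes, num_tasks, task_cost=0.05):
    """Greedily give each task to the node with the lowest current score.

    task_cost is an assumed estimate of how much one task raises a
    node's score; it keeps a single idle node from absorbing every task
    once its load catches up with the others.
    """
    heap = [(load_score(n), n.name) for n in nodes]
    heapq.heapify(heap)
    assignment = {n.name: 0 for n in nodes}
    for _ in range(num_tasks):
        score, name = heapq.heappop(heap)
        assignment[name] += 1
        heapq.heappush(heap, (score + task_cost, name))
    return assignment


if __name__ == "__main__":
    cluster = [NodeLoad("busy", 0.9, 0.9), NodeLoad("idle", 0.1, 0.1)]
    print(assign_tasks(cluster, num_tasks=4))
```

In this sketch the heavily loaded node receives no tasks until the idle node's estimated score rises to meet it, which mirrors the behavior the abstract describes: less work for loaded nodes, more for idle ones.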