An Online Algorithm for Scheduling Big Data Analysis Jobs in Cloud Environments

Youyou Kang, Li Pan, Shijun Liu
DOI: https://doi.org/10.1016/j.knosys.2022.108628
IF: 8.139
2022-01-01
Knowledge-Based Systems
Abstract: Cloud computing has become a popular platform for processing big data analysis jobs because of its high availability, elasticity, and cost-efficiency. Many big data analysis service providers use cloud instances to process users’ job execution requests, and they need efficient scheduling algorithms to improve both job execution efficiency and economic benefits. This paper addresses the problem of minimizing the execution time of a batch of big data analysis jobs without changing the number of cloud instances. Solving this problem not only improves job execution efficiency and user satisfaction in cloud environments, but also brings higher economic benefits to big data analysis service providers. The paper proposes an online scheduling algorithm that exploits the parallelism of big data analysis jobs to optimize scheduling decisions under the premise that job execution times cannot be accurately known. To evaluate the proposed algorithm, a traditional two-phase scheduling algorithm is introduced as a benchmark for comparison. Theoretical analysis and extensive simulation experiments based on real datasets show that the proposed online scheduling algorithm achieves more stable performance than the benchmark two-phase scheduling algorithm.