Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms

Hongjian Li, Wei Luo, Wenbin Xie, Huaqing Ye, Xiaolin Duan
DOI: https://doi.org/10.1007/s10723-024-09756-4
2024-03-12
Journal of Grid Computing
Abstract: Spark Streaming is one of the mainstream stream processing frameworks; it processes real-time stream data using a micro-batch approach. However, its default task scheduling process has some issues, such as high cluster usage cost caused by an inappropriate executor placement strategy in heterogeneous cluster environments. Meanwhile, most current scheduling studies focus on improving cluster processing performance while ignoring cost efficiency and service quality assurance. In this paper, we propose a low-cost executor placement method for heterogeneous clusters, based on machine-learning prediction of resource demand, called Cost-Efficient and Best-Fit Decrease (CEBFD). First, a cost-efficient model is constructed for the Spark Streaming framework; then the Sparrow Search Algorithm (SSA) and the eXtreme Gradient Boosting (XGBoost) algorithm are combined to predict the resources required by streaming tasks; finally, the executor placement method for heterogeneous Spark Streaming clusters is designed based on the cost-efficient model and the resource demand prediction. Furthermore, the proposed method improves Service Level Agreement (SLA) compliance in terms of cost minimization and job deadline guarantees for stream processing. Experimental results show that the proposed approach reduces cluster usage cost by 6.89% to 52.24% and effectively optimizes SLA compared to existing algorithms.
computer science, information systems, theory & methods
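The abstract does not spell out the CEBFD procedure, so the following is only a minimal sketch of the general idea its name suggests: a best-fit-decreasing placement that also prefers cheaper nodes. It is not the authors' algorithm. The `Node` and `Executor` types, the `place_executors` function, the node prices, and the executor sizes are all hypothetical; in the paper the resource demands would come from the SSA and XGBoost predictor, whereas here they are hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical description of one worker in a heterogeneous cluster.
    name: str
    free_cores: int
    free_mem_gb: float
    cost_per_core_hour: float  # assumed pricing unit, for illustration only


@dataclass
class Executor:
    # Hypothetical executor request; demands are assumed to be predicted upstream.
    name: str
    cores: int
    mem_gb: float


def place_executors(executors, nodes):
    """Assign each executor to a node and return {executor name: node name}.

    Executors are handled in decreasing order of demand. Among the nodes that
    can still host an executor, the cheapest one is chosen; ties are broken by
    the smallest number of leftover cores (the best-fit rule).
    Note: this mutates the free capacity stored on the Node objects.
    """
    placement = {}
    for ex in sorted(executors, key=lambda e: (e.cores, e.mem_gb), reverse=True):
        candidates = [
            n for n in nodes
            if n.free_cores >= ex.cores and n.free_mem_gb >= ex.mem_gb
        ]
        if not candidates:
            raise ValueError(f"no node can host executor {ex.name}")
        best = min(
            candidates,
            key=lambda n: (n.cost_per_core_hour, n.free_cores - ex.cores),
        )
        best.free_cores -= ex.cores
        best.free_mem_gb -= ex.mem_gb
        placement[ex.name] = best.name
    return placement


# Illustrative data only: two node types with made-up prices.
nodes = [
    Node("cheap", free_cores=4, free_mem_gb=16.0, cost_per_core_hour=0.04),
    Node("pricey", free_cores=8, free_mem_gb=32.0, cost_per_core_hour=0.10),
]
executors = [
    Executor("e1", cores=2, mem_gb=4.0),
    Executor("e2", cores=2, mem_gb=4.0),
    Executor("e3", cores=4, mem_gb=8.0),
]
placement = place_executors(executors, nodes)
print(placement)
```

In this toy run the largest executor is placed first and fills the cheaper node, so the two smaller ones spill over to the more expensive node. A real scheduler would also need to account for job deadlines, which this sketch ignores.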
What problem does this paper attempt to address?