Adaptive priority-based data placement and multi-task scheduling in geo-distributed cloud systems

Chunlin Li,Jun Liu,Weigang Li,Youlong Luo
DOI: https://doi.org/10.1016/j.knosys.2021.107050
2021-07-01
Abstract:With the rapid development and the widespread use of cloud computing in various applications, the number of users distributed in different regions has grown exponentially. Therefore, the Geo-distributed cloud systems have become a research hotspot and big data processing technology has also emerged. Nowadays, the most widely used big data processing framework is Spark. However, massive amounts of data are generated every moment, and the processing procedure becomes more and more complex, the execution efficiency of Spark has been greatly affected. In the Spark frame of geo-distributed cloud systems, aiming at the data placement problem, the data placement strategy based on RDD dynamic weight is introduced. The target node is selected with a strong computation capacity to place the data. Aiming at the problems of multi-task scheduling, a task will be scheduled to a node whose computation capacity can satisfy the requirement of this task. And then considering job classification and computing node performance, the optimized task scheduling strategy is in traduced. Experiments show that our algorithms can effectively adjust the weight of node data placement according to the actual task execution information, and shorten the task execution time.
computer science, artificial intelligence
What problem does this paper attempt to address?