Two-Timescale Joint Optimization of Task Scheduling and Resource Scaling in Multi-Data Center System Based on Multi-Agent Deep Reinforcement Learning
Shuangwu Chen,Jiangming Li,Qifeng Yuan,Huasen He,Sen Li,Jian Yang
DOI: https://doi.org/10.1109/tpds.2024.3467212
IF: 5.3
2024-10-18
IEEE Transactions on Parallel and Distributed Systems
Abstract:As a new computing paradigm, multi-data center computing enables service providers to deploy their applications close to the users. However, due to the spatio-temporal changes in workloads, it is challenging to coordinate multiple distributed data centers to provide high-quality services while reducing service operation costs. To address this challenge, this article studies the joint optimization problem of task scheduling and resource scaling in multi-data center systems. Since the task scheduling and the resource scaling are usually performed in different timescales, we decompose the joint optimization problem into two sub-problems and propose a two-timescale optimization framework. The short-timescale task scheduling can promptly relieve the bursty arrivals of computing tasks, and the long-timescale resource scaling can adapt well to the long-term changes in workloads. To address the distributed optimization problem, we propose a two-timescale multi-agent deep reinforcement learning algorithm. In order to characterize the graph-structured states of connected data centers, we develop a directed graph convolutional network based global state representation model. The evaluation indicates that the proposed algorithm is able to reduce both the task makespan and the task timeout while maintaining a reasonable cost.
computer science, theory & methods,engineering, electrical & electronic