DRS: A deep reinforcement learning enhanced Kubernetes scheduler for microservice‐based system

Zhaolong Jian, Xueshuo Xie, Yaozheng Fang, Yibing Jiang, Ye Lu, Ankan Dash, Tao Li, Guiling Wang
DOI: https://doi.org/10.1002/spe.3284
2023-10-27
Software Practice and Experience
Abstract: Kubernetes, the best-known container orchestration framework, is now widely used to manage and schedule resources for microservices in cloud‐native distributed applications. However, its native scheduler, Kube‐scheduler, preferentially places microservices on individual nodes with abundant and balanced CPU and memory resources, which may cause resource fragmentation and decrease resource utilization. In this paper, we propose a deep reinforcement learning enhanced Kubernetes scheduler named DRS. We first formulate the Kubernetes scheduling problem as a Markov decision process with carefully designed state, action, and reward structures, aiming to increase resource utilization and decrease load imbalance. Then, we design and implement a DRS monitor that perceives six parameters concerning resource utilization and builds a global view of all available resources. Finally, DRS automatically learns the scheduling policy through interaction with the Kubernetes cluster, without relying on expert knowledge about workload and cluster status. We implement a prototype of DRS in a Kubernetes cluster with five nodes and evaluate its performance. Experimental results show that DRS overcomes the shortcomings of Kube‐scheduler and achieves the expected scheduling target under three workloads. With only 3.27% CPU overhead and 0.648% communication delay, DRS outperforms Kube‐scheduler by 27.29% in resource utilization and reduces load imbalance by 2.90 times on average.
computer science, software engineering