Fast and Fine-grained Autoscaler for Streaming Jobs with Reinforcement Learning.

Mingzhe Xing,Hangyu Mao,Zhen Xiao
DOI: https://doi.org/10.24963/ijcai.2022/80
2022-01-01
Abstract:On computing clusters, the autoscaler is responsible for allocating resources for jobs or fine-grained tasks to ensure their Quality of Service. Due to a more precise resource management, fine-grained autoscaling can generally achieve better performance. However, the fine-grained autoscaling for streaming jobs needs intensive computation to model the complicated running states of tasks, and has not been adequately studied previously. In this paper, we propose a novel fine-grained autoscaler for streaming jobs based on reinforcement learning. We first organize the running states of streaming jobs as spatio-temporal graphs. To efficiently make autoscaling decisions, we propose a Neural Variational Subgraph Sampler to sample spatio-temporal subgraphs. Furthermore, we propose a mutual-information-based objective function to explicitly guide the sampler to extract more representative subgraphs. After that, the autoscaler makes decisions based on the learned subgraph representations. Experiments conducted on real-world datasets demonstrate the superiority of our method over six competitive baselines.
What problem does this paper attempt to address?