PREACT: Predictive Resource Allocation for Bursty Workloads in a Co-located Data Center

Dingyu Yang, Ziyang Xiao, Dongxiang Zhang, Shuhao Zhang, Jian Cao, Gang Chen
DOI: https://doi.org/10.1145/3673038.3673135
2024-01-01
Abstract: Co-locating online latency-critical (LC) services with best-effort (BE) batch jobs on the same server has been widely adopted by modern data centers to improve resource utilization. Various approaches have been proposed to maximize the resources allocated to BE jobs without violating the service level objective (SLO). However, existing solutions perform poorly under bursty workloads because they cannot react promptly to sudden, sharp increases in LC service requests. Consequently, these methods result in either a high SLO violation rate or low resource utilization caused by conservative allocation strategies. In this paper, we propose PREACT, a predictive and agile resource allocation manager that supports bursty workloads in a co-located data center. We devise an accurate and lightweight predictor based on a decomposable time series model to estimate the QPS (queries per second) of LC services in the next time window. Given the predicted QPS, we propose an SLO profiling model based on queuing theory and optimize it with multilayer perceptrons. The model determines the maximum amount of resources that can be allocated to BE jobs without any SLO violation. We conduct extensive experiments using real trace logs of multiple LC services with bursty workload patterns from a major e-commerce promotion campaign in 2021. The results establish the superiority of PREACT when handling bursty workloads: it incurs the lowest SLO violation rate and achieves comparable or higher CPU utilization than prior resource managers in a co-located data center.
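To make the step from predicted QPS to a BE allocation concrete, here is a minimal sketch. It is not PREACT's SLO profiling model, which the abstract describes only as queuing-theory based and optimized with multilayer perceptrons. The sketch assumes a textbook M/M/c queue in which each core serves requests at a fixed rate, and it uses the standard Erlang C formula for the queueing-delay tail. All function names, parameters, and numeric values are hypothetical.

```python
import math


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving request has to queue in an M/M/c system."""
    if offered_load >= servers:
        return 1.0  # unstable regime: every request waits
    # Erlang B via the standard recursion, then converted to Erlang C.
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / servers
    return b / (1.0 - rho * (1.0 - b))


def min_lc_cores(predicted_qps: float, service_rate: float,
                 wait_budget_s: float, violation_prob: float,
                 total_cores: int) -> int:
    """Smallest core count c with P(queueing delay > budget) <= violation_prob.

    Assumption: M/M/c with per-core service rate `service_rate` (requests/s).
    Falls back to all cores if no smaller allocation meets the target.
    """
    offered = predicted_qps / service_rate
    for c in range(1, total_cores + 1):
        if offered >= c:
            continue  # this allocation cannot keep up with the arrival rate
        # M/M/c waiting-time tail: P(Wq > t) = C(c, a) * exp(-(c*mu - lambda) * t)
        tail = erlang_c(c, offered) * math.exp(
            -(c * service_rate - predicted_qps) * wait_budget_s)
        if tail <= violation_prob:
            return c
    return total_cores


def be_cores(predicted_qps: float, service_rate: float,
             wait_budget_s: float, violation_prob: float,
             total_cores: int) -> int:
    """Cores left over for best-effort jobs after reserving for the LC service."""
    return total_cores - min_lc_cores(
        predicted_qps, service_rate, wait_budget_s, violation_prob, total_cores)


if __name__ == "__main__":
    # Hypothetical numbers: 32 cores, 50 req/s per core, 10 ms wait budget, 1% target.
    for qps in (100, 400, 800):
        print(qps, be_cores(qps, 50.0, 0.01, 0.01, 32))
```

Under these assumptions, a higher predicted QPS never lowers the LC reservation, so the BE share shrinks ahead of a predicted burst instead of after the burst has already caused violations.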