PUVAR: Minimize Idle Resource SLO Violations by Uncertainty-Aware Scheduling in Cloud Platforms

Jiawei Li, Han Zhang, Jilong Wang
DOI: https://doi.org/10.1109/icws60048.2023.00094
2023-01-01
Abstract: Idle resources make up a non-negligible fraction of datacenter capacity in mainstream cloud platforms. Cloud platforms offer idle resources with low service level objectives (SLOs) at low prices to attract cost-sensitive users. Although these users tolerate faults, they still want some SLO guarantees for idle resources. Cloud platforms have started to provide statistical SLOs for idle resources, but violations of these statistical SLOs have not been modeled or minimized. In this paper, we propose PUVAR, a scheduling policy based on prediction+optimization, to explicitly model and minimize statistical SLO violations for idle resources in cloud platforms. The design of PUVAR has two major innovations: (1) it explicitly quantifies the prediction uncertainty of idle resource capacity and iteratively reduces its impact on scheduling decisions; (2) it treats the SLO of idle resources as a soft constraint and minimizes its violation through two-stage scheduling of regular requests and idle requests. We provide a theoretical convergence rate for the parameter optimization of PUVAR. An ablation analysis and comparison with popular baseline algorithms on a trace from a real cloud platform indicate that PUVAR can significantly reduce SLO violations of idle resources with little additional cost to the current scheduling of regular requests.
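To make the idea of uncertainty-aware, two-stage scheduling concrete, the following is a minimal illustrative sketch and not PUVAR itself. It assumes the forecast error of idle capacity is Gaussian, treats the statistical SLO as a hard quantile rather than the soft constraint the abstract describes, and omits the iterative parameter optimization. All function names and parameters (`admissible_idle_capacity`, `violation_probability`, `two_stage_schedule`, `pred_std`, `slo_target`) are hypothetical.

```python
from statistics import NormalDist


def admissible_idle_capacity(pred_mean, pred_std, slo_target):
    """Largest idle capacity that can be promised so that the actual idle
    capacity covers it with probability >= slo_target.

    Assumption (not from the paper): actual capacity ~ Normal(pred_mean, pred_std).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if pred_std <= 0:
        # No modeled uncertainty: trust the point prediction.
        return max(pred_mean, 0.0)
    # P(actual >= c) = slo_target  <=>  c is the (1 - slo_target) quantile.
    c = NormalDist(pred_mean, pred_std).inv_cdf(1.0 - slo_target)
    return max(c, 0.0)


def violation_probability(promised, pred_mean, pred_std):
    """Probability that actual idle capacity falls below the promised amount."""
    if pred_std <= 0:
        return 1.0 if promised > pred_mean else 0.0
    return NormalDist(pred_mean, pred_std).cdf(promised)


def two_stage_schedule(total_capacity, regular_demand, idle_requests,
                       pred_std, slo_target):
    """Stage 1 serves regular requests; stage 2 admits idle requests
    (smallest first) within the uncertainty-discounted idle budget."""
    regular_alloc = min(regular_demand, total_capacity)
    predicted_idle = total_capacity - regular_alloc
    budget = admissible_idle_capacity(predicted_idle, pred_std, slo_target)

    admitted, used = [], 0.0
    for size in sorted(idle_requests):
        if used + size <= budget:
            admitted.append(size)
            used += size
    return regular_alloc, admitted
```

In this sketch, a larger `pred_std` shrinks the idle budget, so fewer idle requests are admitted when the capacity forecast is less certain. This is the basic trade-off that an uncertainty-aware policy has to manage.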