Pyxis: Scheduling Mixed Tasks in Disaggregated Datacenters

Sheng Qi, Chao Jin, Mosharaf Chowdhury, Zhenming Liu, Xuanzhe Liu, Xin Jin
DOI: https://doi.org/10.1109/tpds.2024.3418620
IF: 5.3
2024-07-20
IEEE Transactions on Parallel and Distributed Systems
Abstract: Disaggregating compute from storage is an emerging trend in cloud computing. Effectively utilizing resources in both compute and storage pools is the key to high performance. The state-of-the-art scheduler provides optimal scheduling decisions for workloads with homogeneous tasks. However, cloud applications often generate a mix of tasks with diverse compute and IO characteristics, resulting in sub-optimal performance for existing solutions. We present Pyxis, a system that provides optimal scheduling decisions for mixed workloads in disaggregated datacenters with theoretical guarantees. Pyxis is capable of maximizing overall throughput while meeting latency SLOs. Pyxis decouples the scheduling of different tasks. Our insight is that the optimal solution has an "all-or-nothing" structure that can be captured by a single turning point in the spectrum of tasks. Based on task characteristics, the turning point partitions the tasks either all to storage nodes or all to compute nodes (none to storage nodes). We theoretically prove that the optimal solution has such a structure, and design an online algorithm with sub-second convergence. We implement a prototype of Pyxis. Experiments on CloudLab with various synthetic and application workloads show that Pyxis improves the throughput by 3–21× over the state-of-the-art solution.
computer science, theory & methods; engineering, electrical & electronic