Zeal: Rethinking Large-Scale Resource Allocation with "Decouple and Decompose"

Zhiying Xu,Francis Y. Yan,Minlan Yu
2024-12-16
Abstract:Resource allocation is fundamental for cloud systems to ensure efficient resource sharing among tenants. However, the scale of such optimization problems has outgrown the capabilities of commercial solvers traditionally employed in production. To scale up resource allocation, prior approaches either tailor solutions to specific problems or rely on assumptions tied to particular workloads. In this work, we revisit real-world resource allocation problems and uncover a common underlying structure: a vast majority of these problems are inherently separable, i.e., they optimize the aggregate utility of individual resource and demand allocations, under separate constraints for each resource and each demand. Building on this insight, we develop DeDe, a general, scalable, and theoretically grounded framework for accelerating resource allocation through a "decouple and decompose" approach. DeDe systematically decouples entangled resource and demand constraints, thereby decomposing the overall optimization into alternating per-resource and per-demand allocations, which can then be solved efficiently and in parallel. We have implemented DeDe as a library extension to an open-source solver, maintaining a familiar user interface. Experimental results across three prominent resource allocation tasks -- traffic engineering, cluster scheduling, and load balancing -- demonstrate DeDe's substantial speedups and robust allocation quality.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?