EMF: Disaggregated GPUs in Datacenters for Efficiency, Modularity and Flexibility

Anubhav Guleria, J. Lakshmi, C. Padala
DOI: https://doi.org/10.1109/CCEM48484.2019.000-5
2019-09-01
Abstract: Disaggregating expensive, power-hungry GPUs enables a cost-efficient and adaptive ecosystem for cloud deployment, particularly in emerging markets, where AI applications are among the dominant GPU workloads. This paper motivates GPU disaggregation and highlights key properties useful for resource management in disaggregated-resource frameworks. It evaluates current design approaches to GPU disaggregation and analyzes the NVIDIA GPU stack to identify the abstraction layers at which the stack can be disaggregated. Based on this analysis, the paper proposes EMF, a rack-level, open-source-based, backward-compatible GPU disaggregation system. It discusses the key design decisions of EMF and how these choices enable scalability, efficiency, and fault tolerance. The EMF design is evaluated using an analytical model derived from the low-level interactions between the proprietary NVIDIA host driver and NVIDIA GPUs over PCIe. The worst-case latency analysis indicates that the overheads of the proposed design could vary from 7.6% to 20.2% depending on application characteristics, justifying the practicality of this design for cloud setups.
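The abstract reports worst-case overhead as a percentage that depends on application characteristics. The sketch below is only a rough illustration of how such a figure can arise. It is not EMF's analytical model. It assumes a hypothetical model in which every host-GPU interaction that crosses the rack network pays a fixed added latency with no overlap with computation. The names `Workload`, `worst_case_overhead`, `baseline_time_us`, `remote_interactions`, and `added_latency_us`, and all numbers used, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload profile; fields and values are illustrative only."""
    baseline_time_us: float    # run time with a locally attached GPU (assumed)
    remote_interactions: int   # host<->GPU transactions crossing the network (assumed)


def worst_case_overhead(w: Workload, added_latency_us: float) -> float:
    """Fractional slowdown if every interaction pays the full added latency.

    Assumes no overlap between communication and computation, which is
    what makes this a worst-case bound under this simplified model.
    """
    extra_us = w.remote_interactions * added_latency_us
    return extra_us / w.baseline_time_us


# Two hypothetical applications with the same baseline time but different
# numbers of host<->GPU interactions.
compute_bound = Workload(baseline_time_us=100_000.0, remote_interactions=2_000)
io_bound = Workload(baseline_time_us=100_000.0, remote_interactions=8_000)

if __name__ == "__main__":
    for name, w in [("compute-bound", compute_bound), ("io-bound", io_bound)]:
        print(f"{name}: {worst_case_overhead(w, added_latency_us=2.5):.1%}")
```

Under these assumed inputs, the application with more host-GPU interactions sees proportionally more overhead. This mirrors, in simplified form, the abstract's point that overhead depends on application characteristics.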
Computer Science, Engineering