HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions

Jiabin Chen,Fei Xu,Yikun Gu,Li Chen,Fangming Liu,Zhi Zhou
2024-05-09
Abstract:Deep Neural Network (DNN) inference on serverless functions is gaining prominence due to its potential for substantial budget savings. Existing works on serverless DNN inference solely optimize batching requests from one application with a single Service Level Objective (SLO) on CPU functions. However, production serverless DNN inference traces indicate that the request arrival rate of applications is surprisingly low, which inevitably causes a long batching time and SLO violations. Hence, there is an urgent need for batching multiple DNN inference requests with diverse SLOs (i.e., multi-SLO DNN inference) in serverless platforms. Moreover, the potential performance and cost benefits of deploying heterogeneous (i.e., CPU and GPU) functions for DNN inference have received scant attention.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?