AsyFunc

Qiangyu Pei, Yongjie Yuan, Haichuan Hu, Qiong Chen, Fangming Liu
DOI: https://doi.org/10.1145/3620678.3624664
2023-01-01
Abstract: Recent advances in deep learning (DL) have spawned various intelligent cloud services with well-trained DL models. Nevertheless, it is nontrivial to maintain the desired end-to-end latency under bursty workloads, raising critical challenges on high-performance while resource-efficient inference services. To handle burstiness, some inference services have migrated to the serverless paradigm for its rapid elasticity. However, they neglect the impact of the time-consuming and resource-hungry model-loading process when scaling out function instances, leading to considerable resource inefficiency for maintaining high performance under burstiness.