Joint Model and Data Adaptation for Cloud Inference Serving

Jingyan Jiang, Ziyue Luo, Chenghao Hu, Zhaoliang He, Zhi Wang, Shutao Xia, Chuan Wu
DOI: https://doi.org/10.1109/RTSS52674.2021.00034
2021-01-01
Abstract: Real-time deep learning inference serving systems often demand prohibitive resources while having to satisfy diverse user requirements. Existing inference serving system designs mainly focus on computation resource efficiency and largely ignore the trade-off between the computation and bandwidth resources needed. Sub-optimal resource utilization usually leads to substantial wasted serving cost. In this paper, we tackl...