Automating Cloud Deployment for Deep Learning Inference of Real-time Online Services

Yang Li,Zhenhua Han,Quanlu Zhang,Zhenhua Li,Haisheng Tan
DOI: https://doi.org/10.1109/infocom41043.2020.9155267
2020-01-01
Abstract:Real-time online services using pre-trained deep neural network (DNN) models, e.g., Siri and Instagram, require low-latency and cost-efficiency for quality-of-service and commercial competitiveness. When deployed in a cloud environment, such services call for an appropriate selection of cloud configurations (i.e., specific types of VM instances), as well as a considerate device placement plan that places the operations of a DNN model to multiple computation devices like GPUs and CPUs. Currently, the deployment mainly relies on service providers’ manual efforts, which is not only onerous but also far from satisfactory oftentimes (for a same service, a poor deployment can incur significantly more costs by tens of times). In this paper, we attempt to automate the cloud deployment for real-time online DNN inference with minimum costs under the constraint of acceptably low latency. This attempt is enabled by jointly leveraging the Bayesian Optimization and Deep Reinforcement Learning to adaptively unearth the (nearly) optimal cloud configuration and device placement with limited search time. We implement a prototype system of our solution based on TensorFlow and conduct extensive experiments on top of Microsoft Azure. The results show that our solution essentially outperforms the nontrivial baselines in terms of inference speed and cost-efficiency.
What problem does this paper attempt to address?