Accurate Deep Learning Inference Latency Prediction over Dynamic Running Mobile Devices

Junquan Fan, Jiahui Hou, Xiangyang Li
DOI: https://doi.org/10.1109/msn60784.2023.00052
2023-01-01
Abstract: With the increasing number of deep learning applications, optimizing deep learning model performance has become a central focus of research. One critical indicator is model-inference latency, and predicting it rapidly and accurately is essential for effective model design and deployment. However, existing methods, which rely on performance data, often yield inaccurate predictions. To tackle this challenge, we propose a model-inference latency predictor designed specifically for mobile devices. Our predictor rapidly and accurately predicts the inference latency of various Convolutional Neural Network (CNN) models under various device load conditions. The key idea is to combine hardware features, model computational graph features, and convolution latency features to achieve more precise latency prediction. We comprehensively evaluate our system on mobile devices. More than 90% of our predictor's estimates fall within +/- 10% error at different frequencies, CPU utilizations, and bandwidths. Additionally, our predictor can predict model-inference latency under varying conditions, which is not possible with existing methods. As a result, our system offers precise and fast model-inference latency prediction for various CNN models and devices.
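The "+/- 10% error" accuracy figure in the abstract can be read as the fraction of predictions whose relative error against the measured latency is at most 10%. The sketch below illustrates that reading only; the function name, the millisecond units, and the sample values are assumptions made for illustration and are not taken from the paper.

```python
def within_tolerance_accuracy(predicted_ms, measured_ms, tolerance=0.10):
    """Fraction of predictions whose relative error is at most `tolerance`.

    Hypothetical helper: the paper's exact evaluation procedure is not
    given in the abstract, so this is one plausible interpretation.
    """
    if len(predicted_ms) != len(measured_ms):
        raise ValueError("predicted and measured lists must have equal length")
    if not measured_ms:
        raise ValueError("at least one measurement is required")

    # A prediction counts as a hit when |predicted - measured| <= tolerance * measured.
    hits = sum(
        1
        for p, m in zip(predicted_ms, measured_ms)
        if abs(p - m) <= tolerance * m
    )
    return hits / len(measured_ms)


# Made-up latencies in milliseconds, for illustration only.
predicted = [10.5, 20.0, 33.5, 40.0]
measured = [10.0, 25.0, 30.0, 41.0]
accuracy = within_tolerance_accuracy(predicted, measured)
print(f"within +/-10%: {accuracy:.0%}")
```

Under this reading, the reported result would correspond to a value above 0.9 for each tested frequency, CPU utilization, and bandwidth setting.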