Ultra-Low-Latency Distributed Deep Neural Network over Hierarchical Mobile Networks

Jen-I Chang, Jian-Jhih Kuo, Chi-Han Lin, Wen-Tsuen Chen, Jang-Ping Sheu
DOI: https://doi.org/10.1109/globecom38437.2019.9014122
2019-01-01
Abstract: Recently, two ideas have been proposed to shorten inference time: partitioning the Deep Neural Network (DNN) model over multi-level computing units, and making fast inferences with the early-inference technique. Such computing units form a hierarchical mobile network that provides locality-aware computation, and the early-inference technique allows prediction results to exit the model early with some probability. However, an inadequate model partition and misapplied early inference may prolong the response time. Previous studies focus on classifier design for early inference, and thus the optimal model partition with classifier deployment has not been explored. In this paper, we study DEMAND-OPE, which considers both response time and throughput. We first design COLT for the simplified DEMAND-OPE without Optional Exit Points (DEMAND) to carefully balance computing time and data transfer time. Then, an extension termed COLT-OPE is developed to achieve a lower response time. Simulation results show that our algorithm (COLT-OPE) outperforms previous methods by 200%.
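The trade-off described in the abstract can be illustrated with a small sketch. It is not COLT or COLT-OPE, whose details the abstract does not give. It is a hypothetical cost model with an exhaustive search, assuming that each tier runs a contiguous block of layers, that inference exits at the end of a non-final tier with a fixed probability, and that the output of the last layer on a tier is what gets sent to the next tier. All names (`expected_response_time`, `best_partition`, and their parameters) are illustrative assumptions.

```python
from itertools import combinations


def expected_response_time(layer_ops, out_sizes, speeds, bandwidths, cuts, exit_probs):
    """Expected response time of one inference under an assumed cost model.

    layer_ops[i]  : computation required by layer i
    out_sizes[i]  : size of the output of layer i
    speeds[k]     : computing speed of tier k (device, edge, cloud, ...)
    bandwidths[k] : bandwidth of the link from tier k to tier k + 1
    cuts          : strictly increasing layer indices in 1..len(layer_ops)-1;
                    tier k runs layers bounds[k]..bounds[k+1]-1
    exit_probs[k] : assumed probability that inference exits at the end of tier k
    """
    bounds = [0, *cuts, len(layer_ops)]
    reach = 1.0  # probability that the request reaches the current tier
    total = 0.0
    for k in range(len(speeds)):
        start, end = bounds[k], bounds[k + 1]
        # Computing time on this tier, weighted by the chance of getting here.
        total += reach * sum(layer_ops[start:end]) / speeds[k]
        if k < len(speeds) - 1:
            # Only requests that do not exit early pay the transfer cost.
            reach *= 1.0 - exit_probs[k]
            total += reach * out_sizes[end - 1] / bandwidths[k]
    return total


def best_partition(layer_ops, out_sizes, speeds, bandwidths, exit_probs):
    """Exhaustively try every way to cut the layers into one block per tier."""
    n, tiers = len(layer_ops), len(speeds)
    scored = (
        (cuts, expected_response_time(layer_ops, out_sizes, speeds,
                                      bandwidths, cuts, exit_probs))
        for cuts in combinations(range(1, n), tiers - 1)
    )
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy numbers chosen only for illustration: a slow device and a faster edge node.
    cuts, time = best_partition(
        layer_ops=[2.0, 2.0, 4.0],
        out_sizes=[4.0, 1.0, 1.0],
        speeds=[1.0, 4.0],
        bandwidths=[2.0],
        exit_probs=[0.5],
    )
    print(cuts, time)
```

With these toy numbers, cutting after the first layer costs more in transfer but less in slow on-device computation, which is the balance between computing time and data transfer time that the abstract refers to. Exhaustive search is used here only because the example is tiny.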