Abstract: Convolutional Neural Networks (CNNs) have revolutionized research in computer vision owing to their ability to capture complex patterns, which yields high inference accuracy. However, the increasing complexity of these networks means that they are particularly suited to server computers with powerful GPUs. We envision that deep learning applications will eventually be widely deployed on mobile devices, e.g., smartphones, self-driving cars, and drones. Therefore, in this paper, we aim to understand the resource requirements (time, memory) of CNNs on mobile devices. First, by deploying several popular CNNs on mobile CPUs and GPUs, we measure and analyze the performance and resource usage of every layer of the CNNs. Our findings point out potential ways of optimizing performance on mobile devices. Second, we model the resource requirements of the different CNN computations. Finally, based on the measurement, profiling, and modeling, we build and evaluate our modeling tool, Augur, which takes a CNN configuration (descriptor) as input and estimates the compute time and resource usage of the CNN, giving insight into whether, and how efficiently, a CNN can be run on a given mobile platform. In doing so, Augur tackles several challenges: (i) how to overcome profiling and measurement overhead; (ii) how to capture the variance across mobile platforms with different processors, memory, and cache sizes; and (iii) how to account for the variance in the number, type, and size of layers across different CNN configurations.
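The idea of estimating compute and memory from a CNN descriptor can be illustrated with a minimal sketch. It assumes, hypothetically, that each convolutional layer is described by a small `ConvLayer` record, that convolution is lowered to a matrix multiplication via im2col, that values are float32, and that compute time grows roughly linearly with floating-point operations. The names `ConvLayer`, `fit_linear`, and `estimate_time` are illustrative only; this is not necessarily the model Augur itself uses, and the time coefficients would have to be fit from measurements taken on the target device.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT = 4  # assumption: float32 weights and activations


@dataclass
class ConvLayer:
    """Hypothetical descriptor for one 2-D convolutional layer."""
    in_channels: int
    out_channels: int
    in_h: int
    in_w: int
    kernel: int
    stride: int = 1
    padding: int = 0

    def out_hw(self):
        # Standard output-size formula for a 2-D convolution
        out_h = (self.in_h + 2 * self.padding - self.kernel) // self.stride + 1
        out_w = (self.in_w + 2 * self.padding - self.kernel) // self.stride + 1
        return out_h, out_w

    def gemm_dims(self):
        # im2col lowering: (M x K) weight matrix times (K x N) patch matrix
        out_h, out_w = self.out_hw()
        m = self.out_channels
        k = self.in_channels * self.kernel * self.kernel
        n = out_h * out_w
        return m, k, n

    def flops(self):
        # One multiply and one add per multiply-accumulate (bias ignored)
        m, k, n = self.gemm_dims()
        return 2 * m * k * n

    def memory_bytes(self):
        # Weights + input + output + the im2col patch buffer
        m, k, n = self.gemm_dims()
        weights = m * k
        inputs = self.in_channels * self.in_h * self.in_w
        outputs = m * n
        im2col = k * n
        return (weights + inputs + outputs + im2col) * BYTES_PER_FLOAT


def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys ~= a * xs + b (e.g. time vs. FLOPs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def estimate_time(layers, a, b):
    """Predict total time by summing the per-layer linear model over FLOPs."""
    return sum(a * layer.flops() + b for layer in layers)


if __name__ == "__main__":
    # A layer shaped like AlexNet's first convolution: 227x227x3 input,
    # 96 filters of size 11x11, stride 4
    conv1 = ConvLayer(in_channels=3, out_channels=96, in_h=227, in_w=227,
                      kernel=11, stride=4)
    print(conv1.out_hw(), conv1.flops(), conv1.memory_bytes())
    # prints (55, 55) 210830400 6311640
```

In practice, `fit_linear` would be fed (FLOPs, measured time) pairs profiled on a specific phone's CPU or GPU, so the fitted slope and intercept absorb platform differences such as clock speed and cache size; a linear model is only a first approximation and may miss effects like cache misses for large matrices.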