Abstract: Combining deep learning with edge computing can make artificial intelligence ubiquitous. Because edge devices have constrained computational resources, research on on-device deep learning focuses not only on model accuracy but also on model efficiency, for example inference latency. Many attempts have been made to optimize existing deep learning models so that they can be deployed on edge devices and meet specific application requirements while maintaining high accuracy. Such work requires both professional knowledge and extensive experimentation, which limits the customization of neural networks for varied devices and application scenarios. To reduce human intervention in designing and optimizing neural network structures, multi-objective neural architecture search methods have been proposed that automatically search for networks with high accuracy that also satisfy given hardware performance requirements. However, current methods commonly treat accuracy and inference latency as performance indicators during the search and sample numerous network structures to obtain the required network. Without the search objectives regulating the search direction, a large number of useless networks are generated during the search, which greatly reduces search efficiency. Therefore, this paper proposes an efficient resource-aware search method. First, a network inference consumption profiling model is established for any specific device; it directly provides the resource consumption of each operation in the network structure and the inference latency of the entire sampled network. Next, building on Bayesian search, a resource-aware Pareto Bayesian search is proposed, in which accuracy and inference latency are set as constraints to regulate the search direction. With a clearer search direction, the overall search efficiency is improved. Furthermore, a cell-based structure and lightweight operations are applied to optimize the search space and further enhance search efficiency. The experimental results demonstrate that, with our method, the inference latency of the searched network structure is reduced by 94.71% without sacrificing accuracy. At the same time, search efficiency is increased by 18.18%.
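
The two ingredients named in the abstract, a per-device latency profile and a Pareto view of accuracy versus latency, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The operation names, the latency values in `LATENCY_TABLE_MS`, the additive latency assumption in `estimate_latency`, and the candidate scores are all hypothetical placeholders. The Bayesian surrogate that would propose new candidates is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical per-operation latencies (ms) for one target device.
# Real values would have to be measured on the device itself.
LATENCY_TABLE_MS: Dict[str, float] = {
    "conv3x3": 1.8,
    "sep_conv3x3": 0.6,
    "max_pool3x3": 0.2,
    "skip": 0.0,
}


def estimate_latency(ops: List[str], table: Dict[str, float] = LATENCY_TABLE_MS) -> float:
    """Estimate network latency as the sum of per-operation latencies.

    Assumption: latency is additive over operations, which ignores
    operator fusion and memory effects.
    """
    return sum(table[op] for op in ops)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Return True if a = (accuracy, latency) Pareto-dominates b.

    a dominates b when it is no worse on both objectives (higher accuracy,
    lower latency) and strictly better on at least one.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the sorted names of candidates not dominated by any other."""
    return sorted(
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )


if __name__ == "__main__":
    # Hypothetical sampled networks: name -> (accuracy, latency in ms).
    sampled = {
        "A": (0.94, 5.0),
        "B": (0.93, 2.0),
        "C": (0.92, 3.0),
        "D": (0.95, 9.0),
    }
    print(estimate_latency(["conv3x3", "sep_conv3x3", "skip"]))
    print(pareto_front(sampled))
```

In this toy example, network C is dropped because B is both more accurate and faster. A search guided by such a front would spend fewer samples on dominated networks, which is the intuition behind regulating the search direction with both objectives.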

Implication of Optimizing NPU Dataflows on Neural Architecture Search for Mobile Devices

MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference

DPP-Net: Device-Aware Progressive Search for Pareto-Optimal Neural Architectures

Profiling Neural Blocks and Design Spaces for Mobile Neural Architecture Search

One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

S3NAS: Fast NPU-aware Neural Architecture Search Methodology

Latency-aware Neural Architecture Performance Predictor with Query-to-Tier Technique

MnasFPN: Learning Latency-Aware Pyramid Architecture for Object Detection on Mobile Devices

U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search

FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices

DLW-NAS: Differentiable Light-Weight Neural Architecture Search

NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization

NASH: Neural Architecture Search for Hardware-Optimized Machine Learning Models

Search-time Efficient Device Constraints-Aware Neural Architecture Search

Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search

Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces

Generalized Latency Performance Estimation for Once-For-All Neural Architecture Search

Block Proposal Neural Architecture Search

Toward Fast Platform-Aware Neural Architecture Search for FPGA-Accelerated Edge AI Applications

Efficient Resource-Aware Convolutional Neural Architecture Search for Edge Computing with Pareto-Bayesian Optimization

Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning