Abstract: In recent years, deep neural networks (DNNs) have attracted increasing attention because of their excellent performance in computer vision and natural language processing. The success of deep learning owes much to models with more layers and more parameters, which give them stronger nonlinear fitting ability. Furthermore, continual improvements in hardware make it possible to train deep learning models quickly. The development of deep learning is also driven by the growing amount of available annotated and unannotated data; specifically, large-scale data give models greater learning space and stronger generalization ability. Although deep neural networks perform well, they are difficult to deploy on embedded or mobile devices with limited hardware because of their large number of parameters and high storage and computing costs. Recent studies have found that deep models based on convolutional neural networks exhibit parameter redundancy and contain parameters that are irrelevant to the final model results, which provides theoretical support for compressing deep network models. Therefore, reducing model size while retaining model accuracy has become an active research topic. Model compression refers to reducing a trained model through some operation to obtain a lightweight network with equivalent performance. After compression, the network has fewer parameters and usually requires less computation, which greatly reduces computational and storage costs and enables deployment under restricted hardware conditions.
In this paper, the achievements and progress made in recent years by domestic and foreign scholars in model compression are classified and summarized, covering network pruning, parameter sharing, quantization, network decomposition, and network distillation, and the advantages and disadvantages of each are evaluated. Existing problems and the future development of model compression are then discussed.
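
As a rough illustration of two of the technique families named above, the sketch below applies magnitude-based pruning and uniform 8-bit quantization to a toy weight vector. It is a minimal example under stated assumptions, not a method taken from the surveyed work: the function names (`magnitude_prune`, `quantize_uniform`, `dequantize`), the 50% sparsity level, and the sample weights are all hypothetical and chosen only for demonstration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Rank weight indices by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_uniform(weights, num_bits=8):
    """Map weights to integer codes on a uniform grid between min and max."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floating-point weights from integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weights standing in for one layer of a trained model.
weights = [0.82, -0.05, 0.01, -0.67, 0.33, 0.002, -0.91, 0.12]

pruned = magnitude_prune(weights, sparsity=0.5)          # half the weights become zero
codes, scale, lo = quantize_uniform(pruned, num_bits=8)  # each weight stored as one byte
restored = dequantize(codes, scale, lo)

print("pruned:  ", pruned)
print("codes:   ", codes)
print("restored:", [round(w, 3) for w in restored])
```

In this sketch, pruning reduces the number of nonzero parameters, while quantization reduces the bits stored per parameter; the reconstruction error of each restored weight is bounded by half of `scale`. In practice, both steps are usually followed by fine-tuning to recover accuracy, which is omitted here.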
