Abstract: In recent years, deep neural networks (DNNs) have attracted increasing attention because of their excellent performance in computer vision and natural language processing. The success of deep learning is largely attributed to models with more layers and more parameters, which gives them stronger nonlinear fitting ability. Furthermore, continual improvements in hardware make it possible to train deep learning models quickly. The development of deep learning is also driven by the growing amount of available annotated and unannotated data. Specifically, large-scale data provide models with greater learning space and stronger generalization ability. Although deep neural networks perform very well, they are difficult to deploy on embedded or mobile devices with limited hardware because of their large number of parameters and high storage and computing costs. Recent studies have found that deep models based on convolutional neural networks exhibit parameter redundancy and contain parameters that are irrelevant to the final model results, which provides theoretical support for the compression of deep network models. Therefore, finding ways to reduce model size while retaining model precision has become an active research topic. Model compression refers to reducing the size of a trained model to obtain a lightweight network with equivalent performance. After model compression, the network has fewer parameters and usually requires less computation, which greatly reduces computational and storage costs and enables the model to be deployed under restricted hardware conditions.
In this paper, the achievements and progress made in recent years by domestic and international researchers with respect to model compression are classified and summarized, and their advantages and disadvantages are evaluated. The methods covered include network pruning, parameter sharing, quantization, network decomposition, and network distillation. Existing problems and the future development of model compression are then discussed.
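
To make two of the listed method families concrete, the following is a minimal sketch, not taken from the surveyed works, of magnitude-based pruning and uniform 8-bit quantization applied to a single weight matrix. The function names (`magnitude_prune`, `quantize_uint8`, `dequantize`), the use of NumPy, and the choice of an affine min/max quantization scheme are all assumptions made for illustration; practical methods typically add fine-tuning, structured sparsity, or per-channel scales.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy with the smallest-magnitude `sparsity` fraction set to zero."""
    k = int(round(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return weights.copy()
    flat = weights.copy().ravel()
    # Indices of the k entries with the smallest absolute value.
    smallest = np.argpartition(np.abs(flat), k - 1)[:k]
    flat[smallest] = 0.0
    return flat.reshape(weights.shape)

def quantize_uint8(weights):
    """Affine 8-bit quantization (illustrative): returns codes, scale, offset."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 or 1.0  # fall back to 1.0 for constant tensors
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize(codes, scale, w_min):
    """Map 8-bit codes back to approximate float32 weights."""
    return codes.astype(np.float32) * scale + w_min

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 32)).astype(np.float32)  # stand-in for a trained layer
    pruned = magnitude_prune(w, sparsity=0.5)
    codes, scale, w_min = quantize_uint8(w)
    print("nonzero weights after pruning:", np.count_nonzero(pruned), "of", w.size)
    print("bytes before/after quantization:", w.nbytes, "/", codes.nbytes)
```

In this sketch, pruning reduces the number of nonzero parameters, while quantization reduces the storage per parameter from 32 bits to 8 bits at the cost of a reconstruction error bounded by roughly half the quantization step.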
