Abstract: Convolutional neural networks (CNNs) have demonstrated encouraging results in image classification tasks. However, their prohibitive computational cost hinders deployment on resource-constrained embedded devices. To address this issue, we propose EdgeCompress, a comprehensive compression framework that reduces the computational overhead of CNNs. In EdgeCompress, we first introduce dynamic image cropping (DIC), in which a lightweight foreground predictor crops the most informative foreground object from each input image before inference, avoiding redundant computation on background regions. We then present compound shrinking (CS), which jointly compresses the three dimensions of a CNN (depth, width, and resolution) according to their contribution to accuracy and to model computation. Together, DIC and CS constitute a multidimensional CNN compression framework that reduces computational redundancy in both the input images and the network architecture, thereby improving inference efficiency. Further, we present a dynamic inference framework that efficiently processes input images of different recognition difficulty: we cascade multiple models of different complexity produced by our compression framework and dynamically select a model for each input image. This further reduces computational redundancy and facilitates the deployment of advanced CNNs on embedded hardware. Experiments on ImageNet-1K demonstrate that EdgeCompress reduces the computation of ResNet-50 by 48.8% while improving top-1 accuracy by 0.8%. Moreover, with similar computation, we improve accuracy by 4.1% compared to HRank, the state-of-the-art compression framework. The source code and models are available at https://github.com/ntuliuteam/edge-compress.
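The cascaded dynamic inference described in the abstract can be illustrated with a minimal sketch. The abstract does not state how a model is chosen for each image, so the rule used here (stop at the first model whose top softmax probability reaches a threshold) is an assumption, as are the names `cascade_predict`, `small_model`, `large_model`, the threshold of 0.8, and the relative cost units. The toy models stand in for compressed CNNs of increasing complexity and are not taken from the EdgeCompress code.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" maps an input to a list of class logits (hypothetical interface).
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade_predict(
    image: Sequence[float],
    models: Sequence[Tuple[Model, float]],
    threshold: float = 0.8,  # assumed exit criterion, not from the paper
) -> Tuple[int, float, float]:
    """Run models from cheapest to most expensive and stop early.

    `models` is a sequence of (model, cost) pairs ordered by increasing cost.
    Returns (predicted label, confidence, total cost spent).
    """
    if not models:
        raise ValueError("at least one model is required")
    spent = 0.0
    for index, (model, cost) in enumerate(models):
        probs = softmax(model(image))
        spent += cost
        confidence = max(probs)
        label = probs.index(confidence)
        is_last = index == len(models) - 1
        # Accept the prediction if it is confident enough, or if no
        # larger model remains to fall back on.
        if confidence >= threshold or is_last:
            return label, confidence, spent
    raise AssertionError("unreachable")


# Toy stand-ins for a small and a large compressed CNN (two classes).
def small_model(image: Sequence[float]) -> List[float]:
    return [4.0 * image[0], 0.0]


def large_model(image: Sequence[float]) -> List[float]:
    return [0.0, 5.0]


CASCADE = [(small_model, 1.0), (large_model, 4.0)]  # costs in arbitrary units

if __name__ == "__main__":
    print(cascade_predict([1.0], CASCADE))  # small model is confident, exits early
    print(cascade_predict([0.1], CASCADE))  # small model is unsure, falls through
```

In this sketch an easy input pays only for the small model, while a hard input pays for both, so the average cost depends on how often the cheap model is confident. A real deployment would have to tune the threshold against accuracy on held-out data.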

EdgeCompress: Coupling Multidimensional Model Compression and Dynamic Inference for EdgeAI
