Abstract: Convolutional neural networks (CNNs) have demonstrated encouraging results in image classification tasks. However, their prohibitive computational cost hinders deployment onto resource-constrained embedded devices. To address this issue, we propose EdgeCompress, a comprehensive compression framework to reduce the computational overhead of CNNs. In EdgeCompress, we first introduce dynamic image cropping (DIC), where we design a lightweight foreground predictor to accurately crop the most informative foreground object of input images for inference, which avoids redundant computation on background regions. Subsequently, we present compound shrinking (CS) to collaboratively compress the three dimensions (depth, width, and resolution) of CNNs according to their contribution to accuracy and model computation. DIC and CS together constitute a multidimensional CNN compression framework that reduces the computational redundancy in both input images and neural network architectures, thereby improving the inference efficiency of CNNs. Further, we present a dynamic inference framework to efficiently process input images with different recognition difficulties: we cascade multiple models of different complexities from our compression framework and dynamically select a model for each input image. This further reduces computational redundancy and improves inference efficiency, facilitating the deployment of advanced CNNs onto embedded hardware. Experiments on ImageNet-1K demonstrate that EdgeCompress reduces the computation of ResNet-50 by 48.8% while improving the top-1 accuracy by 0.8%. Meanwhile, we improve the accuracy by 4.1% with similar computation compared to HRank, the state-of-the-art compression framework. The source code and models are available at https://github.com/ntuliuteam/edge-compress.
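
The cascaded dynamic inference idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how a model is chosen for each image, so the confidence-threshold rule below is an assumption, and `cascade_predict`, `threshold`, `small_model`, and `large_model` are hypothetical names that do not come from the EdgeCompress code base. The sketch tries models from cheapest to most expensive and stops at the first one whose top-class probability is high enough.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" is any callable that maps an input to class probabilities.
Model = Callable[[object], Sequence[float]]


def top_class(probs: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=lambda k: probs[k])


def cascade_predict(models: List[Model], x: object,
                    threshold: float = 0.8) -> Tuple[int, int]:
    """Return (predicted_class, index_of_model_used).

    Assumption: `models` is ordered from cheapest to most expensive, and an
    input counts as "easy" for a model when its top probability reaches
    `threshold`. The last model always answers, whatever its confidence.
    """
    if not models:
        raise ValueError("at least one model is required")
    for i, model in enumerate(models[:-1]):
        probs = model(x)
        label = top_class(probs)
        if probs[label] >= threshold:
            return label, i  # confident enough: skip the costlier models
    probs = models[-1](x)
    return top_class(probs), len(models) - 1


# Toy stand-ins for a compressed model and a larger model (hypothetical).
def small_model(x: object) -> Sequence[float]:
    return [0.9, 0.1] if x == "easy" else [0.55, 0.45]


def large_model(x: object) -> Sequence[float]:
    return [0.2, 0.8]


easy_result = cascade_predict([small_model, large_model], "easy")
hard_result = cascade_predict([small_model, large_model], "hard")
```

In this toy setup the "easy" input is answered by the small model alone, while the "hard" input falls through to the large model, so the expensive model only runs when the cheap one is unsure.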

Adaptive Compression Offloading and Resource Allocation for Edge Vision Computing

Attention-based Feature Compression for CNN Inference Offloading in Edge Computing

Understanding Sensor Data Using Deep Learning Methods on Resource-Constrained Edge Devices

Multi-stream Adaptive Offloading of Joint Compressed Video Streams, Feature Streams, and Semantic Streams in Edge Computing Systems

Energy-Aware Inference Offloading for DNN-Driven Applications in Mobile Edge Clouds

EdgeCompress: Coupling Multidimensional Model Compression and Dynamic Inference for EdgeAI

Task-Oriented Communication for Edge Video Analytics

Energy-Efficient Joint Partitioning and Offloading for Delay-Sensitive CNN Inference in Edge Computing

Supervised Compression for Resource-Constrained Edge Computing Systems

Multi-Compression Scale DNN Inference Acceleration based on Cloud-Edge-End Collaboration

Efficient Deep Learning Approach for Computational Offloading in Mobile Edge Computing Networks

Hastening Stream Offloading of Inference Via Multi-Exit DNNs in Mobile Edge Computing

Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

Computation Offloading Toward Edge Computing

HRCache: Edge-End Collaboration for Mobile Deep Vision Based on H.264 and Approximated Reuse

Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN

ADDA: Adaptive Distributed DNN Inference Acceleration in Edge Computing Environment

Reliable adaptive edge-cloud collaborative DNN inference acceleration scheme combining computing and communication resources in optical networks

EdgeEye: A Data-Driven Approach for Optimal Deployment of Edge Video Analytics

Communication-Computation Trade-Off in Resource-Constrained Edge Inference

Joint Optimization of Task Offloading and Resource Allocation for Edge Video Analytics