Edge AI: Evaluation of Model Compression Techniques for Convolutional Neural Networks

Samer Francy, Raghubir Singh

2024-09-02

Abstract: This work evaluates compression techniques on ConvNeXt models in image classification tasks using the CIFAR-10 dataset. Structured pruning, unstructured pruning, and dynamic quantization are evaluated for their ability to reduce model size and computational complexity while maintaining accuracy. The experiments, conducted on cloud-based platforms and an edge device, assess the performance of these techniques. Results show significant reductions in model size, with up to a 75% reduction achieved using structured pruning. Additionally, dynamic quantization achieves a reduction of up to 95% in the number of parameters. Fine-tuned models exhibit improved compression performance, indicating the benefits of pre-training in conjunction with compression techniques. Unstructured pruning methods reveal trends in accuracy and compression, with limited reductions in computational complexity. Combining OTOV3 pruning with dynamic quantization further enhances compression performance, resulting in an 89.7% reduction in size, a 95% reduction in the number of parameters and MACs, and a 3.8% increase in accuracy. Deploying the final compressed model on an edge device yields high accuracy (92.5%) and low inference time (20 ms), validating the effectiveness of compression techniques for real-world edge computing applications.
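To make two of the techniques named in the abstract concrete, here is a minimal, framework-free sketch of unstructured magnitude pruning followed by symmetric int8 weight quantization. It is a toy illustration under stated assumptions, not the authors' pipeline: the function names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the example weights are hypothetical, and dynamic quantization in practice also quantizes activations at run time, which is not shown here.

```python
# Toy sketch only: names and values are hypothetical, not taken from the paper.

def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the `sparsity` fraction of smallest-|w| weights."""
    k = int(len(weights) * sparsity)  # number of weights to zero out
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


weights = [0.02, -0.75, 0.40, -0.01, 1.27, 0.10, -0.33, 0.05]
pruned = magnitude_prune(weights, sparsity=0.5)   # half the weights become 0.0
q, scale = quantize_int8(pruned)                  # 8-bit integer codes + one scale
restored = dequantize(q, scale)                   # approximate reconstruction
```

Unstructured pruning of this kind leaves the tensor shape unchanged, which is consistent with the abstract's observation that it yields limited reductions in computational complexity unless the runtime exploits sparsity. Structured pruning, by contrast, removes whole channels or filters and shrinks the tensors themselves.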

Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The main problem this paper attempts to address is reducing the model size and computational complexity of convolutional neural networks (ConvNeXt) in image classification tasks by evaluating different model compression techniques (such as structured pruning, unstructured pruning, and dynamic quantization) while maintaining model accuracy. Specifically, the paper focuses on the following aspects:

1. **Evaluation of model compression techniques**: The paper evaluates the effectiveness of methods such as structured pruning, unstructured pruning, and dynamic quantization in reducing model size and computational complexity.
2. **Maintaining model performance**: Ensuring that the model's accuracy in image classification tasks does not significantly decrease while performing model compression.
3. **Feasibility of practical application**: Verifying the deployment effectiveness of the compressed model on edge devices, including high accuracy and low inference time.

Through this research, the paper aims to provide effective solutions for efficient model deployment in edge computing environments, enabling complex deep learning models to run on resource-constrained devices, thereby achieving advantages such as low latency, data privacy protection, and bandwidth optimization.
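The evaluation criteria above (percentage reductions in size, parameters, and MACs, plus inference time) can be sketched with a couple of small helpers. This is a minimal illustration, assuming a generic `predict` callable; the helper names `reduction_pct` and `mean_latency_ms` are hypothetical, and the numbers below are placeholders rather than results from the paper.

```python
import time


def reduction_pct(baseline, compressed):
    """Percentage reduction of a cost metric (size, parameters, or MACs)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - compressed) / baseline


def mean_latency_ms(predict, sample, runs=50, warmup=5):
    """Average wall-clock time of predict(sample), in milliseconds."""
    for _ in range(warmup):          # warm-up calls are not timed
        predict(sample)
    start = time.perf_counter()
    for _ in range(runs):
        predict(sample)
    return 1000.0 * (time.perf_counter() - start) / runs


# Placeholder values for illustration only; not results from the paper.
baseline_params = 1_000_000
compressed_params = 250_000
param_reduction = reduction_pct(baseline_params, compressed_params)
print(f"parameter reduction: {param_reduction:.1f}%")

# Stand-in for a model's forward pass on one input.
latency = mean_latency_ms(lambda x: sum(v * v for v in x), [0.5] * 1024)
print(f"mean latency: {latency:.3f} ms")
```

Latency measured this way depends heavily on the device, so a figure obtained on a cloud platform does not transfer to an edge device; each target needs its own measurement.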

Single-shot Pruning and Quantization for Hardware-Friendly Neural Network Acceleration

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning.

A Compression Pipeline for One-Stage Object Detection Model

Pruning by Training: A Novel Deep Neural Network Compression Framework for Image Processing.

Pruning at a Glance: Global Neural Pruning for Model Compression

Learning Low Resource Consumption CNN through Pruning and Quantization

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Towards Optimal Compression: Joint Pruning and Quantization

Quantisation and Pruning for Neural Network Compression and Regularisation

Downscaling and Overflow-aware Model Compression for Efficient Vision Processors

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Pruning and quantization for deep neural network acceleration: A survey

Towards Hardware-Specific Automatic Compression of Neural Networks

Deep learning model compression using network sensitivity and gradients

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning

Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models

CLIP-Q: Deep Network Compression Learning by In-parallel Pruning-Quantization

Model Compression for Deep Neural Networks: A Survey