Abstract: Deep neural networks have consistently represented the state of the art in most computer vision problems. In these scenarios, larger and more complex models have demonstrated superior performance to smaller architectures, especially when trained with plenty of representative data. With the recent adoption of Vision Transformer (ViT) based architectures and advanced Convolutional Neural Networks (CNNs), the total number of parameters of leading backbone architectures increased from 62M parameters in 2012 with AlexNet to 7B parameters in 2024 with AIM-7B. Consequently, deploying such deep architectures faces challenges in environments with processing and runtime constraints, particularly in embedded systems. This paper covers the main model compression techniques applied for computer vision tasks, enabling modern models to be used in embedded systems. We present the characteristics of compression subareas, compare different approaches, and discuss how to choose the best technique and expected variations when analyzing it on various embedded devices. We also share codes to assist researchers and new practitioners in overcoming initial implementation challenges for each subarea and present trends for Model Compression. Case studies for compression models are available at https://github.com/venturusbr/cv-model-compression.

What problem does this paper attempt to address?

The problem this paper attempts to solve is: **how to compress deep neural networks and deploy them in resource-constrained embedded systems**. Specifically, as the deep neural networks used in computer vision tasks grow larger and more complex (for example, from AlexNet with 62 million parameters in 2012 to AIM-7B with 7 billion parameters in 2024), deploying these large-scale models on embedded devices with limited computing power, memory, and power budgets becomes challenging. To address this issue, the paper reviews the main model compression techniques for computer vision tasks, including:

1. **Knowledge Distillation**:
   - By transferring the knowledge of a large teacher model to a small student model, the student model can maintain high performance with fewer parameters.
   - Formula representation: \[ L_{KD}=-\sum_{i} \sigma_{i}\left(\frac{z_{t}}{T}\right) \times \log \sigma_{i}\left(\frac{z_{s}}{T}\right) \] where \( z_{t} \) and \( z_{s} \) are the outputs of the teacher model and the student model respectively, \( \sigma_{i} \) is the \( i \)-th softmax output, and \( T \) is the temperature parameter that controls the smoothness of the probability distribution.
2. **Network Pruning**:
   - By removing unimportant weights or structures (such as filters, channels, etc.) in the neural network, the size and inference time of the model are reduced.
   - Pruning can be divided into unstructured pruning (pruning individual weights) and structured pruning (pruning entire filters or channels).
3. **Network Quantization**:
   - Convert the network parameters represented by floating-point numbers into low-precision representations (such as 8-bit integers or binary values), thereby reducing memory usage and increasing inference speed.
   - Example of the quantization process: \[ w_{quantized}=round\left(\frac{w_{float}}{\Delta}\right) \] where \( w_{float} \) is the original floating-point weight and \( \Delta \) is the quantization step size.
4. **Low-Rank Matrix Factorization**:
   - By performing matrix factorization on network parameters, the number of parameters is reduced, but this method is less commonly applied in computer vision.

The paper also discusses how to select the most suitable technique and analyzes the performance differences of different compression techniques on various embedded devices. In addition, the authors provide code examples to help researchers and newcomers overcome the initial challenges in implementing these techniques.
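To make the formulas in the list concrete, here is a minimal pure-Python sketch of the distillation loss, unstructured magnitude pruning, and the rounding-based quantization step. It is illustrative only and is not taken from the paper's repository. The function names (`softmax`, `kd_loss`, `magnitude_prune`, `quantize`, `dequantize`) are hypothetical, weights are plain lists rather than tensors, and choosing "smallest magnitude" as the pruning criterion is an assumption, since the summary does not say how importance is measured.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """L_KD = -sum_i sigma_i(z_t / T) * log sigma_i(z_s / T)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero out the smallest-magnitude weights.

    `sparsity` is the fraction of weights to remove (assumed criterion:
    absolute value; other importance scores are possible).
    """
    n_pruned = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_pruned])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize(weights, step):
    """w_quantized = round(w_float / step), stored as integers."""
    return [round(w / step) for w in weights]


def dequantize(quantized, step):
    """Map integer codes back to approximate floating-point values."""
    return [q * step for q in quantized]


# Hypothetical example values, chosen only for illustration.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
loss = kd_loss(teacher, student, temperature=2.0)

weights = [0.5, -0.26, 0.1, 0.02]
sparse_weights = magnitude_prune(weights, sparsity=0.5)
codes = quantize(weights, step=0.1)
restored = dequantize(codes, step=0.1)
```

A higher `temperature` flattens both distributions, so the student also learns from the teacher's relative scores for the non-top classes. The quantization sketch omits clamping to a fixed integer range (for example, 8-bit), which a practical implementation would likely need.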

Computer Vision Model Compression Techniques for Embedded Systems: A Survey

Multi-Dimension Compression of Feed-Forward Network in Vision Transformers

Deep Model Compression and Architecture Optimization for Embedded Systems: A Survey

Survey on Energy-Efficient Deep Neural Networks for Computer Vision

Comprehensive Survey of Model Compression and Speed up for Vision Transformers

Model Compression for Deep Neural Networks: A Survey

Computation-efficient Deep Learning for Computer Vision: A Survey

COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models

A Survey of Model Compression and Acceleration for Deep Neural Networks.

End-to-end Compression Towards Machine Vision: Network Architecture Design and Optimization

Deep Image Compression Towards Machine Vision: A Unified Optimization Framework

Deep Image Compression Toward Machine Vision: A Unified Optimization Framework

Model Compression for Resource-Constrained Mobile Robots

Model compression techniques in biometrics applications: A survey

Deep Learning Model Compression Techniques: Advances, Opportunities, and Perspective

VeriCompress: A Tool to Streamline the Synthesis of Verified Robust Compressed Neural Networks from Scratch

UCC: A Unified Cascade Compression Framework for Vision Transformer Models

A survey of model compression strategies for object detection

Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics

Vision transformer models for mobile/edge devices: a survey