Abstract: Object detection algorithms based on deep convolutional neural networks suffer from poor real-time performance, high computational cost, and excessive memory usage on embedded devices. To address these issues, a method for improving deep convolutional neural networks based on model compression and knowledge distillation is proposed. First, data augmentation is employed in the preprocessing stage to increase the diversity of training samples, improving the model's robustness and generalization capability. The K-means++ clustering algorithm generates candidate bounding boxes that adapt to defects of different sizes and allow finer features to be selected earlier. Second, the cross stage partial (CSP) Darknet53 network and a spatial pyramid pooling (SPP) module extract features from the raw input images, enhancing the accuracy of defect localization and recognition in YOLO. Finally, model compression is integrated: the scaling factors of the batch normalization (BN) layers are used together with a sparsity factor to perform sparse training on the network. Channel pruning and layer pruning are then applied to the sparse model, and knowledge distillation is used as a post-processing step to reduce the model size and forward inference time while maintaining model accuracy. The size of the improved model decreases from 244 M to 4.19 M, the detection speed increases from 32.8 f/s to 68 f/s, and mAP reaches 97.41. Experimental results demonstrate that this method is conducive to deploying network models on embedded devices with limited GPU computing and storage resources. It can be applied in distributed service architectures for edge computing, providing a new technical reference for deploying deep learning models in the industrial sector.
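
The sparse-training and channel-pruning step described in the abstract can be illustrated with a minimal sketch. It assumes an L1 penalty on the BN scaling factors and a single global magnitude threshold for pruning; the abstract does not state either detail, and the function names, `lr`, `sparsity`, and `prune_ratio` are hypothetical choices made only for illustration.

```python
# Minimal sketch (assumed details): L1 sparsity on BN scaling factors,
# followed by channel selection with one global magnitude threshold.

def l1_sparsity_step(gammas, task_grads, lr=0.1, sparsity=0.01):
    """One SGD step on BN scaling factors with penalty sparsity * |gamma|.

    task_grads holds the gradient of the detection loss w.r.t. each gamma;
    the L1 term contributes the subgradient sparsity * sign(gamma).
    """
    def sign(g):
        return (g > 0) - (g < 0)
    return [g - lr * (dg + sparsity * sign(g))
            for g, dg in zip(gammas, task_grads)]


def channel_masks(bn_layers, prune_ratio):
    """Return one keep-mask per BN layer.

    bn_layers is a list of lists of scaling factors (one list per layer).
    Channels whose |gamma| is at or below the global threshold are pruned.
    """
    all_g = sorted(abs(g) for layer in bn_layers for g in layer)
    k = int(len(all_g) * prune_ratio)
    threshold = all_g[k - 1] if k > 0 else float("-inf")
    masks = []
    for layer in bn_layers:
        mask = [abs(g) > threshold for g in layer]
        if not any(mask):
            # Channel pruning alone should not delete a whole layer:
            # keep the channel with the largest scaling factor.
            best = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[best] = True
        masks.append(mask)
    return masks


if __name__ == "__main__":
    layers = [[0.9, 0.01, 0.5], [0.02, 0.03, 0.8]]
    print(channel_masks(layers, prune_ratio=0.5))
```

In this sketch the sparsity penalty pushes unimportant scaling factors toward zero during training, so that the later threshold separates channels cleanly. Layer pruning and the knowledge-distillation fine-tuning that follow in the described method are not covered here.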

Design of a Novel Neural Network Compression Method for Tiny Machine Learning

Edge Segmentation: Empowering Mobile Telemedicine with Compressed Cellular Neural Networks

Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review

Deploy Large-Scale Deep Neural Networks in Resource Constrained IoT Devices with Local Quantization Region

Tiny Machine Learning: Progress and Futures

DNN Model Compression for IoT Domain-Specific Hardware Accelerators

TinyM$^2$Net-V3: Memory-Aware Compressed Multimodal Deep Neural Networks for Sustainable Edge Deployment

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

T-RECX: Tiny-Resource Efficient Convolutional neural networks with early-eXit

UDC: Unified DNAS for Compressible TinyML Models

Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT

On-Device Training Under 256KB Memory

An Ultra-low Power TinyML System for Real-time Visual Processing at Edge

T3DNet: Compressing Point Cloud Models for Lightweight 3D Recognition

MCUNet: Tiny Deep Learning on IoT Devices

Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware

Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression

Research on the Construction of an Efficient and Lightweight Online Detection Method for Tiny Surface Defects through Model Compression and Knowledge Distillation

Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices

Optimizing Federated Learning on TinyML Devices for Privacy Protection and Energy Efficiency in IoT Networks

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding