Lightweight Object Detection: A Study Based on YOLOv7 Integrated with ShuffleNetv2 and Vision Transformer

Wenkai Gong
2024-03-04
Abstract: As mobile computing technology rapidly evolves, deploying efficient object detection algorithms on mobile devices has become a pivotal research area in computer vision. This study focuses on optimizing the YOLOv7 algorithm to improve its operational efficiency and speed on mobile platforms while maintaining high accuracy. By combining Group Convolution, ShuffleNetV2, and Vision Transformer, this research reduces the model's parameter count and memory usage, streamlines the network architecture, and strengthens real-time object detection on resource-constrained devices. The experimental results show that the refined YOLO model markedly increases processing speed while sustaining high detection accuracy.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main goal of this paper is to optimize the YOLOv7 algorithm so that it runs faster and more efficiently on mobile devices while preserving high-precision detection. Specifically, the research addresses the problem through the following points:

1. **Network Structure Optimization**: By combining ShuffleNetv2, Group Convolution, and Vision Transformer, the network architecture is simplified, and the number of model parameters and memory usage are reduced.
2. **Model Compression and Acceleration**: A lightweight model is designed around efficient algorithms and optimized for specific hardware, improving its performance and efficiency on resource-constrained devices.
3. **Robustness Enhancement**: By introducing techniques such as skip connections and depthwise separable convolutions, the robustness and accuracy of the model are further enhanced.
4. **Performance Evaluation in Different Application Scenarios**: The improved model is validated on standard datasets and tested in actual mobile device environments, to confirm that the gains hold in practical applications and not only in theory.

In summary, this paper aims to address the insufficient computing power, memory limitations, and energy consumption that traditional object detection methods face on mobile devices. By integrating these techniques, the YOLOv7 model is optimized to be more suitable for mobile deployment.
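To make the parameter savings behind points 1 and 3 concrete, the following is a minimal sketch in plain Python. It counts the weights of a standard convolution, a group convolution, and a depthwise separable convolution, and shows the channel shuffle operation used in the ShuffleNet family. The layer sizes (128 channels, 3×3 kernel, 4 groups) and the function names are illustrative assumptions, not values or code taken from the paper.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution, bias omitted.

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, which divides the weight count by `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per channel) plus pointwise 1 x 1 conv."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise


def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to reshaping to (groups, n // groups), transposing,
    and flattening, so that the next grouped layer mixes information
    from every group.
    """
    n = len(channels)
    if n % groups:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Hypothetical layer: 128 -> 128 channels, 3 x 3 kernel.
standard = conv_params(128, 128, 3)
grouped = conv_params(128, 128, 3, groups=4)
separable = depthwise_separable_params(128, 128, 3)
print(standard, grouped, separable)

# Six channel indices split into two groups: [0, 1, 2] and [3, 4, 5].
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

Under these assumed sizes, grouping by 4 cuts the weight count to a quarter of the standard layer, and the depthwise separable variant is smaller still. Channel shuffle adds no parameters; it only reorders channels so that grouped layers do not stay isolated from one another.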