YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5

Chun-Lin Ji, Tao Yu, Peng Gao, Fei Wang, Ru-Yue Yuan
DOI: https://doi.org/10.1007/s11554-024-01519-4
2024-07-29
Abstract: Object detection, a crucial aspect of computer vision, has seen significant advancements in accuracy and robustness. Despite these advancements, practical applications still face notable challenges, primarily the inaccurate detection or missed detection of small objects. In this paper, we propose YOLO-TLA, an advanced object detection model building on YOLOv5. We first introduce an additional detection layer for small objects in the neck network pyramid architecture, thereby producing a feature map of a larger scale to discern finer features of small objects. Further, we integrate the C3CrossCovn module into the backbone network. This module uses sliding window feature extraction, which effectively minimizes both computational demand and the number of parameters, rendering the model more compact. Additionally, we have incorporated a global attention mechanism into the backbone network. This mechanism combines the channel information with global information to create a weighted feature map. This feature map is tailored to highlight the attributes of the object of interest, while effectively ignoring irrelevant details. In comparison to the baseline YOLOv5s model, our newly developed YOLO-TLA model has shown considerable improvements on the MS COCO validation dataset, with increases of 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95, all while keeping the model size compact at 9.49M parameters. Further extending these improvements to the YOLOv5m model, the enhanced version exhibited a 1.7% and 1.9% increase in mAP@0.5 and mAP@0.5:0.95, respectively, with a total of 27.53M parameters. These results validate the YOLO-TLA model's efficient and effective performance in small object detection, achieving high accuracy with fewer parameters and computational demands.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two problems: inaccurate and missed detections of small objects, and the high parameter count and computational demand that make existing detection models hard to run on resource-constrained devices. Specifically:

1. **Challenges of Small Object Detection**: Although existing object detection methods have made significant progress in accuracy and robustness, detecting small objects (such as pedestrians or animals) remains a significant challenge in practical applications. Models often fail to detect or recognize such objects, which lowers overall detection performance.
2. **Model Complexity and Computational Demand**: Existing object detection models typically have a large number of parameters and high computational demands, making them difficult to deploy and run on resource-limited devices. Reducing the parameter count and computational demand while maintaining detection accuracy has therefore become an important research direction.

To address these issues, the paper proposes YOLO-TLA, an improved model based on YOLOv5. It adds a small object detection layer, a lightweight convolution module (C3CrossCovn), and a global attention mechanism (GAM), improving small object detection performance while reducing model complexity and computational demand. Specific improvements include:

- **Introduction of Small Object Detection Layer**: An additional detection layer is added in the neck network to generate a larger-scale feature map that better captures the fine details of small objects.
- **Lightweight Convolution Module**: The C3CrossCovn module is integrated into the backbone network. It uses sliding window feature extraction to reduce computational demand and parameter count, making the model more compact.
- **Global Attention Mechanism**: A global attention mechanism is introduced in the backbone network, combining channel information and global information to create weighted feature maps that highlight object features and ignore irrelevant details.

With these improvements, YOLO-TLA gains 4.6% in mAP@0.5 and 4% in mAP@0.5:0.95 over the baseline YOLOv5s on the MS COCO validation dataset while remaining lightweight at 9.49M parameters.
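
The arithmetic behind the first two improvements can be illustrated with a minimal sketch. It rests on assumptions not stated in the text above: a 640x640 input, baseline YOLOv5 heads at strides 8, 16, and 32, an added small-object head at stride 4, and a C3CrossCovn module that factorizes a k x k convolution into a 1 x k convolution followed by a k x 1 convolution (as YOLOv5's CrossConv block does). The function names `detection_grids` and `conv_weights` are hypothetical helpers, not part of the paper or of YOLOv5.

```python
def detection_grids(img_size=640, strides=(8, 16, 32)):
    """Return the side length of each detection head's feature map."""
    return [img_size // s for s in strides]


def conv_weights(c_in, c_out, k_h, k_w):
    """Weight count of a bias-free 2D convolution with a k_h x k_w kernel."""
    return c_in * c_out * k_h * k_w


# Baseline heads (assumed strides 8/16/32) on a 640x640 input.
baseline = detection_grids()

# Assumption: the extra small-object head operates at stride 4,
# which yields a larger, finer-grained feature map.
extended = detection_grids(strides=(4, 8, 16, 32))

# Assumption: the lightweight module replaces one k x k convolution
# with a 1 x k convolution followed by a k x 1 convolution.
channels, k = 128, 3
standard = conv_weights(channels, channels, k, k)
cross = (conv_weights(channels, channels, 1, k)
         + conv_weights(channels, channels, k, 1))

print("baseline grids:", baseline)
print("extended grids:", extended)
print("standard conv weights:", standard)
print("cross conv weights:", cross)
```

Under these assumptions, the added head has four times as many grid cells as the finest baseline head, so each cell covers a smaller image region, and the factorized convolution needs 2k weights per channel pair instead of k squared. The real model's strides and module layout should be checked against the paper before relying on these numbers.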