A small object detection architecture with concatenated detection heads and multi-head mixed self-attention mechanism

Mu, Jianhong
DOI: https://doi.org/10.1007/s11554-024-01562-1
IF: 2.293
2024-10-11
Journal of Real-Time Image Processing
Abstract:A novel detection method is proposed to address the challenge of detecting small objects in object detection. This method augments the YOLOv8n architecture with a small object detection layer and innovatively designs a Concat-detection head to effectively extract features. Simultaneously, a new attention mechanism—Multi-Head Mixed Self-Attention (MMSA) mechanism—is introduced to enhance the feature-extraction capability of the backbone. To improve the detection sensitivity for small objects, a combination of Normalized Wasserstein Distance (NWD) and Intersection over Union (IoU) is used to calculate the localization loss, optimizing the bounding-box regression. Experimental results on the TT100K dataset show that the mean average precision (mAP@0.5) reaches 88.1%, which is a 13.5% improvement over YOLOv8n. The method's versatility is also validated through experiments on the BDD100K dataset, where it is compared with various object-detection algorithms. The results demonstrate that this method yields significant improvements and practical value in the field of small-object detection. Detailed code can be found at https://github.com/CodeSworder/MMSA.
computer science, artificial intelligence,engineering, electrical & electronic,imaging science & photographic technology
What problem does this paper attempt to address?