MS-YOLO: integration-based multi-subnets neural network for object detection in aerial images

Xinyu Cao, Minglei Duan, Hongwei Ding, Zhijun Yang
DOI: https://doi.org/10.1007/s12145-024-01265-y
2024-03-14
Earth Science Informatics
Abstract: Aerial imagery is one of the most important application areas for object detection. Object detection in aerial images can be applied in fields such as agriculture, environmental protection, and security monitoring. However, small object scales, dense biological distribution, and occlusion make detection in aerial images difficult. To address these issues, we introduce a more accurate and lightweight method called MS-YOLO. Our method restructures the network backbone into Multiple Subnetworks (Multi-Subnets), augmenting it with an additional dimension to facilitate the extraction of more subtle low-level information. Furthermore, the Multiple Feature Dynamic Path Aggregation Network (MFDPANet) incorporates more detailed information, and a novel Dynamic Cross Stage Partial (DCSP) module is proposed to enhance sensitivity to the positions of tiny objects. Additionally, our specially crafted Multi-Scale Decoupled Head (MSD Head) enhances the model's classification and localization capabilities without increasing the parameter count. Lastly, the integration of Wise-IoU-v2 (WIoU-v2) mitigates the model's overemphasis on extreme samples, improving overall performance. The proposed method is evaluated on three public datasets: VisDrone2019, AI-TOD, and DIOR. Our results demonstrate that MS-YOLO significantly surpasses baseline methods of equivalent parameter size in object detection accuracy. Compared with YOLOv8n, MS-YOLO-n improves detection accuracy while maintaining superior parameter efficiency, with a relative increase of 23.9% (38.3 vs. 30.9) on the VisDrone2019-val dataset and 22.3% (28.5 vs. 23.3) on the VisDrone2019-test dataset. As depicted in Figure 1, we also designed additional model variants that achieve higher accuracy to meet different needs.
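The percentage gains quoted in the abstract appear to be relative improvements over the YOLOv8n baseline rather than absolute differences in score. A minimal sketch of that arithmetic follows, using only the four scores given in the abstract; the function name `relative_gain` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Scores quoted in the abstract (MS-YOLO-n vs. YOLOv8n).
val_gain = relative_gain(38.3, 30.9)   # VisDrone2019-val
test_gain = relative_gain(28.5, 23.3)  # VisDrone2019-test

print(f"val:  {val_gain:.1f}%")   # prints "val:  23.9%"
print(f"test: {test_gain:.1f}%")  # prints "test: 22.3%"
```

Read this way, the absolute gains are 7.4 points on the validation split and 5.2 points on the test split.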
geosciences, multidisciplinary; computer science, interdisciplinary applications
What problem does this paper attempt to address?