FE-YOLOv5: Improved YOLOv5 Network for Multi-scale Drone-Captured Scene Detection

Chen Zhao, Zhe Yan, Zhiyan Dong, Dingkang Yang, Lihua Zhang
DOI: https://doi.org/10.1007/978-981-99-8082-6_23
2024-01-01
Abstract: Because UAVs capture images from varying angles and altitudes, the shooting environment is complex and most targets are small, so object detection in drone-captured scenes remains challenging. In this study, we present a high-precision method for detecting objects in drone-captured scenes, which we call FE-YOLOv5. First, to optimize cross-scale feature fusion and make fuller use of shallow feature information, we propose a novel feature pyramid network, MSF-BiFPN. Second, to improve the fusion of features at different scales and strengthen their representational power, we propose an adaptive attention module. Third, we propose a novel feature enhancement module that strengthens high-level features before feature fusion; this module minimizes feature loss during fusion and thereby improves detection accuracy. Finally, we adopt the normalized Wasserstein distance as a new metric to increase the model's sensitivity and accuracy in detecting small targets. On the VisDrone dataset, FE-YOLOv5 improves mAP@0.5 by 7.8% and mAP@0.5:0.95 by 5.7%. In addition, when trained at an image resolution of 960×960, the model outperforms current YOLO-series models, reaching an mAP@0.5 of 56.3%. These experiments demonstrate that FE-YOLOv5 effectively improves object detection accuracy in UAV-captured scenes.
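The normalized Wasserstein distance (NWD) mentioned in the abstract can be illustrated with a short sketch. It follows the common formulation from tiny-object detection work: each box (cx, cy, w, h) is modeled as a 2-D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), the squared 2-Wasserstein distance between two such Gaussians reduces to a squared Euclidean distance over (cx, cy, w/2, h/2), and NWD = exp(-W₂/C). The abstract does not say how FE-YOLOv5 combines NWD with IoU or which normalizing constant C it uses, so the default C = 12.8 and the helper names wasserstein_sq, nwd, and iou are assumptions for illustration only, not the authors' implementation.

```python
import math
from typing import Tuple

# A box is (cx, cy, w, h): centre coordinates, width, height in pixels.
Box = Tuple[float, float, float, float]


def wasserstein_sq(a: Box, b: Box) -> float:
    """Squared 2-Wasserstein distance between the Gaussians
    N((cx, cy), diag(w^2/4, h^2/4)) that model boxes a and b."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
            + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)


def nwd(a: Box, b: Box, c: float = 12.8) -> float:
    """Normalized Wasserstein distance in (0, 1]. c is a dataset-dependent
    constant; 12.8 is an assumed default, not a value taken from FE-YOLOv5."""
    return math.exp(-math.sqrt(wasserstein_sq(a, b)) / c)


def iou(a: Box, b: Box) -> float:
    """Plain IoU of two centre-format boxes, used here only for comparison."""
    iw = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2)
             - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    ih = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2)
             - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    gt = (10.0, 10.0, 8.0, 8.0)  # a small 8x8 ground-truth box
    for shift in (4.0, 8.0):
        # Shift the prediction horizontally, keeping its size unchanged.
        pred = (gt[0] + shift, gt[1], gt[2], gt[3])
        print(f"shift={shift:g}: IoU={iou(gt, pred):.3f}, NWD={nwd(gt, pred):.3f}")
    # shift=4: IoU=0.333, NWD=0.732
    # shift=8: IoU=0.000, NWD=0.535
```

For a small 8×8 box, an 8-pixel offset already drives IoU to zero, so IoU can no longer tell a near miss from a far one, while NWD still decreases smoothly with the offset. This behavior is what makes a Wasserstein-based metric attractive for the small targets common in drone imagery.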