OB-YOLO: A UAV Image Detection Model for Reducing Computational Resource Consumption

Rui Liu, Yuan Zhu, Yanqiang Wang, Zhecong Xing
DOI: https://doi.org/10.1109/ijcnn60899.2024.10649954
2024-01-01
Abstract: Unmanned Aerial Vehicles (UAVs) are now widely used in everyday activities, and object detection is a pivotal task in UAV operation. However, expansive backgrounds in UAV images, insufficient target pixel resolution, and frequent image interference reduce the accuracy of existing object detection models for UAV aerial imagery. Conventional strategies for improving accuracy often incur high computational costs and fail to balance precision gains against computational resource use. To address these challenges, this paper introduces OB-YOLO, an optimized variant of YOLOv8 tailored for UAV aerial photography scenarios. The proposed model improves accuracy while reducing both parameter count and floating-point operations. In particular, the BiFPN concept is incorporated to strengthen feature fusion, enabling comprehensive consideration and reuse of multi-scale features within the model. Additionally, omni-dimensional dynamic convolution (ODConv) replaces the ordinary convolution in the residual blocks of the C2f module in the backbone network. This change enhances the model’s feature extraction capability and, through the parallel implementation of ODConv together with its multidimensional attention mechanism, significantly reduces both the number of model parameters and the computational workload. Furthermore, InnerIoU is employed to compute the Intersection over Union (IoU) loss using auxiliary bounding boxes, and MPDIoU, which is based on the minimum point distance, is integrated to speed up convergence and improve accuracy. Together, these methods yield superior detection performance. The proposed algorithm is compared and evaluated on the widely used VisDrone2019 dataset. The results show that OB-YOLO surpasses the YOLOv8 baseline model by 4.8% on the VisDrone2019-DET dataset while reducing both network parameters and floating-point operations.
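To make the loss terms named in the abstract concrete, the following is a minimal Python sketch of the commonly published definitions of Inner-IoU (IoU computed on auxiliary boxes scaled about each box centre) and MPDIoU (IoU penalized by the squared distances between matching top-left and bottom-right corners, normalized by the image size). It is not taken from the OB-YOLO paper: the function names, the `ratio` default, and in particular the way the two terms are combined in `inner_mpdiou_loss` are assumptions for illustration only.

```python
# Illustrative sketch only; names, defaults, and the combined loss are assumptions,
# not the OB-YOLO authors' implementation. Boxes are (x1, y1, x2, y2) tuples.

def iou(a, b):
    """Plain Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scale_box(box, ratio):
    """Auxiliary box: same centre as `box`, width and height multiplied by `ratio`."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * ratio / 2.0
    half_h = (box[3] - box[1]) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def inner_iou(pred, gt, ratio=0.75):
    """Inner-IoU: IoU of the two auxiliary boxes (ratio value is an assumed default)."""
    return iou(scale_box(pred, ratio), scale_box(gt, ratio))


def mpdiou(pred, gt, img_w, img_h):
    """MPDIoU: IoU minus normalized squared corner distances."""
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou(pred, gt) - d1_sq / norm - d2_sq / norm


def inner_mpdiou_loss(pred, gt, img_w, img_h, ratio=0.75):
    """Assumed combination: MPDIoU loss with the IoU term swapped for Inner-IoU."""
    return (1.0 - mpdiou(pred, gt, img_w, img_h)) + iou(pred, gt) - inner_iou(pred, gt, ratio)
```

For example, `inner_mpdiou_loss((0, 0, 2, 2), (1, 1, 3, 3), 10, 10)` scores a prediction offset by one pixel in each direction inside a hypothetical 10 x 10 image; a perfectly aligned prediction gives a loss of zero under this sketch.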