Deformable Convolution-Guided Multiscale Feature Learning and Fusion for UAV Object Detection

Ya Shi, Chenyi Wang, Shengjun Xu, Ming-Dong Yuan, Feixiang Liu, Lele Zhang
DOI: https://doi.org/10.1109/lgrs.2024.3362890
IF: 5.343
2024-02-16
IEEE Geoscience and Remote Sensing Letters
Abstract: Object detection (OD) in unmanned aerial vehicle (UAV) images faces many challenges, with diverse-scale objects and small objects being particularly prominent issues. To alleviate these challenges, we propose a novel multiscale feature learning and feature fusion network under the guidance of deformable convolution. First, a deformable convolution-guided feature learning (DCGFL) block is designed in the backbone to extract more effective multiscale features. The DCGFL block leverages the adaptability of deformable convolution to the shapes and scales of objects, akin to spatial attention. Moreover, it also employs channel attention to identify important feature maps. Hence, the proposed backbone possesses the functionality of spatial attention and channel attention. Second, in the neck, we devise a simple generalized feature pyramid network (SimpleGFPN) with several deformable convolution-guided feature fusion (DCGFF) blocks to fuse multiscale features. The proposed neck has cross-layer and cross-scale pathways, facilitating effective information exchange and fusion between shallow spatial and deep semantic features. Third, the Scylla-IoU (SIoU) loss is used to better model the bounding box regression loss. Finally, the experimental results on the VisDrone2021 and UAVDT datasets show that the proposed method outperforms the compared OD methods. In terms of mean average precision, we obtain 37.8% on VisDrone2021 and 18.5% on UAVDT.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
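
The abstract names the Scylla-IoU (SIoU) loss for bounding box regression without giving its form. The following is a minimal sketch of SIoU as it is commonly formulated (IoU plus angle-aware distance and shape costs), not the authors' implementation. The function names `iou` and `siou_loss`, the `(x1, y1, x2, y2)` box layout, and the shape exponent `theta=4.0` are assumptions made for illustration. Boxes are assumed to have positive width and height.

```python
import math


def iou(a, b):
    """Plain intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def siou_loss(pred, gt, theta=4.0):
    """Sketch of the SIoU loss: 1 - IoU + (distance cost + shape cost) / 2.

    theta is the shape-cost exponent; 4.0 is an assumed, commonly used value.
    """
    # Widths, heights, and center offsets of the two boxes.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    dx = (gt[0] + gt[2]) / 2.0 - (pred[0] + pred[2]) / 2.0
    dy = (gt[1] + gt[3]) / 2.0 - (pred[1] + pred[3]) / 2.0

    # Width and height of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # Angle cost: 0 when the centers are axis-aligned, 1 at 45 degrees.
    sigma = math.hypot(dx, dy)
    sin_alpha = min(1.0, abs(dy) / sigma) if sigma > 0 else 0.0
    angle = 1.0 - 2.0 * math.sin(math.asin(sin_alpha) - math.pi / 4.0) ** 2

    # Distance cost, with its sharpness modulated by the angle cost.
    gamma = 2.0 - angle
    dist = (1.0 - math.exp(-gamma * (dx / cw) ** 2)) + (
        1.0 - math.exp(-gamma * (dy / ch) ** 2)
    )

    # Shape cost: penalizes relative mismatch in width and height.
    shape = (1.0 - math.exp(-abs(pw - gw) / max(pw, gw))) ** theta + (
        1.0 - math.exp(-abs(ph - gh) / max(ph, gh))
    ) ** theta

    return 1.0 - iou(pred, gt) + (dist + shape) / 2.0
```

Unlike a plain `1 - IoU` loss, this form still varies with the boxes' relative position and shape when they do not overlap, which is one reason such losses are used for small objects.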