CDNet: Object Detection Based on Cross-Level Aggregation and Deformable Attention for UAV Aerial Images

Tianxiang Huo, Zhenqi Liu, Shichao Zhang, Jiening Wu, Rui Yuan, Shukai Duan, Lidan Wang
DOI: https://doi.org/10.1007/s00371-024-03680-0
IF: 2.835
2024-01-01
The Visual Computer
Abstract: Object detection in unmanned aerial vehicle (UAV) imagery is a crucial task. However, the presence of densely populated small objects and significant scale and shape variations among objects in aerial images poses challenges for the detection task. To address these issues, this paper proposes a cross-level deformable feature aggregation network (CDNet). First, a high-resolution characterization enhancement with deep reduction (HCEDR) structure is designed to extract small object location details in high resolution while reducing redundant deep interference. Furthermore, a cross-level fusiform feature aggregation (CFFA) structure is proposed to fuse multi-scale cross-level feature information and dense small object spatial detail information. Moreover, to address the challenge of object shape variations caused by varying aerial viewpoints, a deformable attention bottleneck (DAB) module is designed to enhance the model’s boundary sensitivity for irregularly shaped objects in aerial scenes. Finally, a new bounding box loss function (inner-WIoU) is proposed, which not only mitigates the detrimental gradient contributions from extreme samples, but also adjusts the auxiliary bounding box dimensions to better fit the ground-truth object bounding boxes, consequently enhancing the model’s performance. To validate the model’s superiority, extensive experiments were conducted on the VisDrone2021 and TinyPerson datasets, achieving mAP_50 improvements of 10.8 https://github.com/htxhuo/CDNet.
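The abstract does not state the formula for inner-WIoU, so the following is only a minimal sketch of the auxiliary-box idea it mentions, not the authors' implementation. It assumes that the auxiliary boxes are obtained by scaling the predicted and ground-truth boxes about their own centers by a common factor, and that IoU is then computed on the scaled boxes. The names `scale_box`, `inner_iou` and the `ratio` parameter are hypothetical, and the WIoU weighting that damps gradients from extreme samples is deliberately left out.

```python
def scale_box(box, ratio):
    """Scale an (x1, y1, x2, y2) box about its center by `ratio` (assumed scheme)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * ratio / 2.0
    half_h = (y2 - y1) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def iou(a, b):
    """Plain intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def inner_iou(pred, gt, ratio=0.75):
    """IoU of the auxiliary boxes; `ratio` is a hypothetical hyperparameter."""
    return iou(scale_box(pred, ratio), scale_box(gt, ratio))


if __name__ == "__main__":
    pred, gt = (0, 0, 4, 4), (2, 0, 6, 4)
    print(inner_iou(pred, gt, ratio=1.0))  # ratio 1.0 reduces to ordinary IoU
    print(inner_iou(pred, gt, ratio=0.5))  # smaller auxiliary boxes overlap less
```

Under this assumption, a ratio below 1 shrinks both boxes, so partially overlapping pairs score lower than with ordinary IoU, while a ratio above 1 enlarges them. A loss built on this quantity would typically be `1 - inner_iou(pred, gt, ratio)`; how the paper combines it with WIoU is not specified in the abstract.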