AF-DETR: efficient UAV small object detector via Assemble-and-Fusion mechanism

Lingfei Ren, Huan Lei, Zhongxu Li, Wenyuan Yang
DOI: https://doi.org/10.1007/s10044-024-01349-x
IF: 2.307
2024-10-13
Pattern Analysis and Applications
Abstract: With the rise of deep learning networks, object detection technologies for unmanned aerial vehicles (UAVs) have demonstrated outstanding performance in many application scenarios. However, current small object detection approaches largely disregard sparse feature interactions and global context modeling, resulting in incomplete utilization and even loss of the semantic information of small objects. Therefore, this study proposes an Assemble-and-Fusion mechanism for the DEtection TRansformer (AF-DETR), in which aggregated global semantics are allocated across layers to augment fine-grained feature learning for small instances. Meanwhile, an adaptive context broadcasting module is designed to effectively integrate contextual information in the decoder, thus ensuring accurate detection of small objects. First, the features from the last four backbone stages are sent into the intra-scale feature interaction module, which performs self-attention on the feature map of the last scale. Second, a fixed fusion module aligns and aggregates the multi-scale representations before they are disseminated across layers. Features of adjoining levels are then transformed and consolidated within a convolutional module. Finally, an enhanced adaptive context broadcasting module is introduced within the decoding MLP to incorporate aggregated semantics into individual tokens, thereby broadcasting contextual information. AF-DETR achieves 49.5 mAP50 and 29.5 mAP50-95 on the VisDrone2021 dataset, and mAP50 of 67.7% and 70.7% under the RGB and infrared modalities, respectively, on the DroneVehicle dataset. Extensive evaluations demonstrate consistent performance gains over state-of-the-art methods under various metrics, validated across multiple UAV perception benchmarks containing small objects under complex practical conditions.
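The abstract describes the context broadcasting step only at a high level: aggregated semantics are incorporated into individual tokens. The sketch below illustrates that general idea and is not the authors' implementation. It assumes the aggregated semantics are the per-channel mean over all tokens and that "adaptive" can be approximated by a single scalar weight; the names `broadcast_context` and `gate` are hypothetical and do not come from the paper.

```python
from typing import List

Token = List[float]


def broadcast_context(tokens: List[Token], gate: float = 1.0) -> List[Token]:
    """Add a gated mean token to every token.

    Assumption: the shared context is the mean over tokens and the
    adaptivity is one scalar gate. The paper may define both differently.
    """
    if not tokens:
        return []
    n = len(tokens)
    dim = len(tokens[0])
    # Aggregate: average each feature channel across all tokens.
    context = [sum(tok[d] for tok in tokens) / n for d in range(dim)]
    # Broadcast: add the same gated context vector back to every token.
    return [[tok[d] + gate * context[d] for d in range(dim)] for tok in tokens]


# Two tokens with two channels each; the mean token is [2.0, 1.0].
tokens = [[1.0, 0.0], [3.0, 2.0]]
mixed = broadcast_context(tokens, gate=0.5)
print(mixed)
```

With `gate=0.5`, each token is shifted by half of the mean token, so every token carries some information about the whole set. In a trained model the gate would presumably be learned rather than fixed, which is again an assumption rather than something stated in the abstract.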
computer science, artificial intelligence