Abstract: With the rise of deep learning, object detection for unmanned aerial vehicles (UAVs) has demonstrated outstanding performance in many application scenarios. However, current small object detection approaches largely overlook sparse feature interactions and global context modeling, which leads to incomplete use, and even loss, of the semantic information of small objects. This study therefore proposes AF-DETR, a DEtection TRansformer with an Assemble-and-Fusion mechanism in which aggregated global semantics are distributed across layers to enhance fine-grained feature learning for small instances. In addition, an adaptive context broadcasting module is designed to integrate contextual information in the decoder, improving the detection accuracy of small objects. First, features from the last four stages of the backbone are fed into the intra-scale feature interaction module, which applies self-attention to the feature map of the last scale. Second, a fixed fusion module aligns and aggregates the multi-scale representations before distributing them across layers, and features of adjacent levels are then transformed and consolidated by a convolutional module. Finally, an enhanced adaptive context broadcasting module is introduced into the decoder MLP to inject aggregated semantics into individual tokens, thereby broadcasting contextual information. AF-DETR achieves 49.5% mAP50 and 29.5% mAP50-95 on the VisDrone2021 dataset, and 67.7% and 70.7% mAP50 on the RGB and infrared modalities of the DroneVehicle dataset, respectively. Extensive evaluations on multiple UAV perception benchmarks containing small objects in complex real-world conditions show that our approach consistently outperforms state-of-the-art methods across various metrics.
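The adaptive context broadcasting step can be illustrated with a minimal NumPy sketch. It assumes that the "aggregated semantics" are the mean over all decoder tokens, that broadcasting is a per-channel convex mix controlled by learnable gates, and that the mix is applied to the hidden features of the MLP. These choices, the function names `adaptive_context_broadcast` and `mlp_with_acb`, and the tensor sizes are hypothetical and are not taken from the AF-DETR implementation.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_context_broadcast(tokens: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Blend the mean of all tokens (global context) into each token.

    tokens:      (N, D) token features.
    gate_logits: (D,) per-channel logits; sigmoid maps them to mixing weights in (0, 1).
                 In a trained model these would be learnable parameters (assumption).
    """
    alpha = sigmoid(gate_logits)                     # (D,) per-channel mixing weight
    context = tokens.mean(axis=0, keepdims=True)     # (1, D) aggregated semantics
    return (1.0 - alpha) * tokens + alpha * context  # broadcast context to all N tokens


def mlp_with_acb(tokens, w1, b1, w2, b2, gate_logits):
    """Hypothetical decoder MLP with context broadcasting on the hidden features."""
    hidden = gelu(tokens @ w1 + b1)                  # (N, H)
    hidden = adaptive_context_broadcast(hidden, gate_logits)
    return hidden @ w2 + b2                          # (N, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim, hidden_dim = 300, 256, 1024       # hypothetical sizes
    x = rng.standard_normal((n_tokens, dim))
    w1 = rng.standard_normal((dim, hidden_dim)) * 0.02
    b1 = np.zeros(hidden_dim)
    w2 = rng.standard_normal((hidden_dim, dim)) * 0.02
    b2 = np.zeros(dim)
    gates = np.zeros(hidden_dim)                     # sigmoid(0) = 0.5: equal mix at init
    y = mlp_with_acb(x, w1, b1, w2, b2, gates)
    print(y.shape)  # (300, 256)
```

Because each output token is a convex combination of itself and the token mean, the mean over tokens is unchanged; the gates only control how strongly each channel is pulled toward the global context.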

AF-DETR: efficient UAV small object detector via Assemble-and-Fusion mechanism

Object Detection of Visdrone based on Attention Mechanism and FasterNet

DFS-DETR: Detailed-Feature-Sensitive Detector for Small Object Detection in Aerial Images Using Transformer

Towards a High-Performance Object Detector: Insights from Drone Detection Using ViT and CNN-based Deep Learning Models

Object detection of VisDrone by stronger feature extraction FasterRCNN

Object Detection for UAV Aerial Scenarios Based on Vectorized IOU

A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos

Focus-Attention Approach in Optimizing DETR for Object Detection from High-Resolution Images

AMFEF-DETR: An End-to-End Adaptive Multi-Scale Feature Extraction and Fusion Object Detection Network Based on UAV Aerial Images

Van-DETR: enhanced real-time object detection with vanillanet and advanced feature fusion

VisDrone-VID2019: The Vision Meets Drone Object Detection in Video Challenge Results

A Small Object Detection Method for Drone-Captured Images Based on Improved YOLOv7

VisDrone-DET2021: The Vision Meets Drone Object detection Challenge Results

VisDrone-DET2020: The Vision Meets Drone Object Detection in Image Challenge Results