Abstract: Unmanned aerial vehicle object detection (UAV-OD) has been widely used in various scenarios. However, most existing UAV-OD algorithms rely on manually designed components, which require extensive tuning. End-to-end models that do not depend on such manually designed components are mainly designed for natural images and are less effective on UAV imagery. To address these challenges, this paper proposes an efficient detection transformer (DETR) framework tailored for UAV imagery, i.e., UAV-DETR. The framework includes a multi-scale feature fusion with frequency enhancement module, which captures both spatial and frequency information at different scales. In addition, a frequency-focused down-sampling module is presented to retain critical spatial details during down-sampling. A semantic alignment and calibration module is developed to align and fuse features from different fusion paths. Experimental results demonstrate the effectiveness and generalization of our approach across various UAV imagery datasets. On the VisDrone dataset, our method improves AP by 3.1\% and $\text{AP}_{50}$ by 4.2\% over the baseline. Similar enhancements are observed on the UAVVaste dataset. The project page: https://github.com/ValiantDiligent/UAV-DETR

### What problems does this paper attempt to solve?

This paper addresses object detection in unmanned aerial vehicle (UAV) images (UAV-OD). Specifically, it targets the following problems:

1. **Existing UAV-OD algorithms rely on manually designed components**:
   - Most existing UAV-OD algorithms rely on manually designed components, such as Non-Maximum Suppression (NMS) and anchor boxes chosen from human experience. These components require extensive parameter tuning, which adds complexity and inefficiency.
2. **End-to-end models perform poorly on UAV images**:
   - Existing end-to-end object detection models (such as DETR) are mainly designed for natural images and perform poorly on UAV images, especially when detecting small or occluded objects.
3. **Challenges specific to UAV images**:
   - Objects in UAV images are harder to characterize because they are often small or occluded, so traditional object detection methods struggle. A method that extracts multi-scale features more effectively is therefore needed.

To solve these problems, the paper proposes an efficient detection Transformer framework designed specifically for UAV images, **UAV-DETR**. The framework includes the following modules:

- **Multi-Scale Feature Fusion with Frequency Enhancement (MSFF-FE)**: Combines spatial-domain and frequency-domain information to improve detection of small and occluded objects.
- **Frequency-Focused Down-Sampling (FD)**: Preserves key spatial details during down-sampling.
- **Semantic Alignment and Calibration (SAC)**: Aligns and fuses features from different fusion paths to improve detection performance.
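As a rough illustration of what "frequency-domain information" can mean for a feature map, the sketch below splits a 2-D array into low- and high-frequency parts with an FFT and re-weights the high-frequency part, where fine detail such as small-object edges tends to live. This is not the paper's MSFF-FE or FD module: the function names, the radial `cutoff`, and the `gain` parameter are assumptions chosen only for illustration.

```python
import numpy as np


def split_frequency_bands(feature, cutoff=0.25):
    """Split a 2-D map into low- and high-frequency parts (illustrative only).

    `cutoff` is a radius in normalized frequency (cycles per sample, max 0.5
    along each axis). It is a hypothetical parameter, not taken from the paper.
    """
    h, w = feature.shape
    spectrum = np.fft.fft2(feature)
    # Normalized frequency of every FFT bin along each axis.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_mask = np.sqrt(fy ** 2 + fx ** 2) <= cutoff
    # The two masks are complementary, so low + high reconstructs the input.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


def enhance_high_frequency(feature, cutoff=0.25, gain=1.5):
    """Re-weight the high-frequency part; gain=1.0 returns the input unchanged."""
    low, high = split_frequency_bands(feature, cutoff)
    return low + gain * high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.normal(size=(8, 8))
    low, high = split_frequency_bands(feature)
    print(np.allclose(low + high, feature))
```

A learned module would replace the fixed mask and scalar gain with trainable weights; the fixed version only shows that spatial detail and frequency content are two views of the same feature map.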
With these modules, UAV-DETR improves on its baseline on the VisDrone and UAVVaste datasets, as measured by Average Precision (AP) and AP50. On VisDrone, AP rises by 3.1% and AP50 by 4.2%, with similar gains on UAVVaste. The model also runs in real time, which makes it suitable for practical UAV object detection tasks.
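Both the NMS post-processing that end-to-end detectors remove and the AP50 metric rest on box intersection-over-union (IoU). The minimal sketch below, assuming boxes in `(x1, y1, x2, y2)` format, shows greedy NMS with its hand-tuned `iou_threshold` and the conventional IoU >= 0.5 matching rule behind AP50. It is a generic illustration with hypothetical function names, not code from the UAV-DETR repository.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    `iou_threshold` is the manually tuned parameter: a box is dropped when it
    overlaps an already kept box by more than this value.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def matches_at_iou50(pred_box, gt_box):
    """Matching rule conventionally used for AP50: IoU of at least 0.5."""
    return iou(pred_box, gt_box) >= 0.5


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
    print(greedy_nms(boxes, scores, iou_threshold=0.7))  # [0, 1, 2]
```

The two calls differ only in the threshold, yet one suppresses the second box and the other keeps it. In dense scenes with small, overlapping objects this sensitivity is what makes NMS hard to tune.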

UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery
