Abstract: While deep learning-based general object detection has made significant strides in recent years, the effectiveness and efficiency of small object detection remain unsatisfactory. This is primarily attributed not only to the limited characteristics of such small targets but also to the high density and mutual overlap among these targets. Existing transformer-based small object detectors still leave a gap between accuracy and inference speed. To address these challenges, we propose methods that enhance sampling within an end-to-end framework. Sample Points Refinement (SPR) constrains localization and attention, preserving meaningful interactions in the region of interest and filtering out misleading information. Scale-aligned Target (ST) integrates scale information into target confidence, improving classification for small object detection. A task-decoupled Sample Reweighting (SR) mechanism guides attention toward challenging positive examples, utilizing a weight generator module to assess the difficulty and adjust classification loss based on decoder layer outcomes. Comprehensive experiments across various benchmarks reveal that our proposed detector excels in detecting small objects. Our model demonstrates a significant enhancement, achieving a 2.9% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset and a 1.7% improvement on the SODA-D dataset.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve the problems of effectiveness and efficiency in small-object detection. Although general object detection based on deep learning has made remarkable progress in recent years, the effectiveness and efficiency of small-object detection are still not satisfactory. This is mainly due to the limited features of small objects and the high density and mutual overlap among these objects. The existing Transformer-based small-object detectors have a gap between accuracy and inference speed. To meet these challenges, the authors propose a strategy to enhance the sampling method in an end-to-end framework.

### Main contributions

1. **Sample Points Refinement (SPR)**:
   - By constraining the sample point distribution and attention in the deformable attention module, ensure that meaningful interactions within the region of interest are retained while filtering out possible misleading information.
   - Formula representation:

     \[ L_{\text{offset}}=\sum_{i = 1}^{N^{+}}\left[\sum_{m, l, k\in O_{ilm}}\left\|\Delta p_{milk}-\eta\begin{bmatrix}w_{i}\\h_{i}\end{bmatrix}\right\|_{1}^{2}\right] \]

     \[ L_{\text{atten}}=\sum_{i = 1}^{N^{+}}\sum_{m, l, k\in O_{ilm}}\max(A_{milk}-\max_{k\in I_{mil}}A_{milk}, 0) \]

2. **Scale-aligned Target (ST)**:
   - Integrate the scale information of samples into the target confidence to establish a more appropriate learning target, especially suitable for small-object detection.
   - Formula representation:

     \[ r = \sqrt{\rho} \]

     \[ v=e^{-\theta(r - 1)^{2}} \]

     \[ c = u^{\beta}\cdot v^{1-\beta} \]

     \[ t = c\cdot s \]

3. **Task-decoupled Sample Reweighting (SR)**:
   - By guiding the model to refocus on learning from difficult positive samples, use the weight generation module to evaluate the difficulty and adjust the classification loss according to the decoding-layer results.
   - Formula representation:

     \[ A_{i}=\sigma(B(\text{cov}(\text{cat}(H_{\text{reg}}^{i}, H_{\text{cls}}^{i})))) \]

     \[ w_{\text{cls}}^{i}=\sigma(B(\text{cov}(H_{\text{cls}}^{i}\otimes\text{cov}(A_{i})))) \]

     \[ w_{\text{reg}}^{i}=\sigma(B(\text{cov}(H_{\text{reg}}^{i}\otimes\text{cov}(A_{i})))) \]

     \[ r = w^{1-\vert t - s\vert} \]

     \[ L_{\text{cls}}=\sum_{i = 1}^{N^{+}}r_{\text{cls}}^{i}\text{BCE}(s_{i}, t_{i})+\alpha\sum_{i = 1}^{N^{-}}p_{i}^{\gamma}\,\ldots \]
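
To make the ST and SR formulas concrete, here is a minimal Python sketch that evaluates them on plain scalars. It is an illustration under stated assumptions, not the authors' implementation: the summary does not define the symbols, so `u`, `rho`, and `s` are simply treated as numeric inputs (with `u` and `s` assumed to lie in [0, 1] and `rho` assumed non-negative), the defaults for `theta` and `beta` are placeholders, and the weight `w` is taken as given rather than produced by the weight generator. All function names are hypothetical. Only the positive-sample term of the classification loss is covered, because the negative-sample term is cut off in the source.

```python
import math


def scale_aligned_target(u, rho, s, theta=1.0, beta=0.5):
    """Evaluate t = c * s, following the ST formulas as written.

    theta and beta defaults are placeholders, not values from the paper.
    """
    r = math.sqrt(rho)                      # r = sqrt(rho)
    v = math.exp(-theta * (r - 1.0) ** 2)   # v = exp(-theta * (r - 1)^2)
    c = (u ** beta) * (v ** (1.0 - beta))   # c = u^beta * v^(1 - beta)
    return c * s                            # t = c * s


def reweight_factor(w, t, s):
    """Evaluate r = w^(1 - |t - s|) for a given weight w."""
    return w ** (1.0 - abs(t - s))


def bce(s, t, eps=1e-12):
    """Binary cross-entropy between a score s and a soft target t."""
    s = min(max(s, eps), 1.0 - eps)         # clamp to keep log() finite
    return -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))


def positive_cls_loss(scores, targets, weights):
    """Positive-sample term only: sum_i r_i * BCE(s_i, t_i)."""
    return sum(
        reweight_factor(w, t, s) * bce(s, t)
        for s, t, w in zip(scores, targets, weights)
    )
```

With `rho = 1` the scale term `v` equals 1, so the target reduces to `u ** beta * s`. In the reweighting factor, a weight `w` below 1 is pulled toward 1 as the gap `|t - s|` grows, which is how the exponent ties the applied weight to the mismatch between score and target.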

Better Sampling, towards Better End-to-end Small Object Detection

A Recursive Prediction-Based Feature Enhancement for Small Object Detection

Small object detection based on attention mechanism and enhanced network

ESOD: Efficient Small Object Detection on High-Resolution Images

Small object detection leveraging density‐aware scale adaptation

A Small Object Detection Method for Drone-Captured Images Based on Improved YOLOv7

Improving small object detection via context-aware and feature-enhanced plug-and-play modules

Boundary-aware Small Object Detection with Attention and Interaction

Scale-Adaptive Salience Supervision and Dynamic Token Filtering for Small Object Detection in Remote Sensing Images

An Effective Method for Small Object Detection in Low-Resolution Images

SODCNN: A Convolutional Neural Network Model for Small Object Detection in Drone-Captured Images

Tiny object detection with context enhancement and feature purification

Multi-scale detector optimized for small target

Small Target-YOLOv5: Enhancing the Algorithm for Small Object Detection in Drone Aerial Imagery Based on YOLOv5

Small Object Detection using Multi-scale Feature Fusion and Attention

TranSDet: Toward Effective Transfer Learning for Small-Object Detection

A novel feature enhancement module for small object detection

Spatial-Transformer and Cross-Scale Fusion Network (STCS-Net) for Small Object Detection in Remote Sensing Images

Feature Rescaling and Fusion for Tiny Object Detection

Small Object Detection by DETR via Information Augmentation and Adaptive Feature Fusion

MDSSD: Multi-scale Deconvolutional Single Shot Detector for Small Objects