Better Sampling, towards Better End-to-end Small Object Detection

Zile Huang, Chong Zhang, Mingyu Jin, Fangyu Wu, Chengzhi Liu, Xiaobo Jin
2024-05-17
Abstract: While deep learning-based general object detection has made significant strides in recent years, the effectiveness and efficiency of small object detection remain unsatisfactory. This is primarily attributed not only to the limited characteristics of such small targets but also to the high density and mutual overlap among these targets. The existing transformer-based small object detectors do not leverage the gap between accuracy and inference speed. To address challenges, we propose methods enhancing sampling within an end-to-end framework. Sample Points Refinement (SPR) constrains localization and attention, preserving meaningful interactions in the region of interest and filtering out misleading information. Scale-aligned Target (ST) integrates scale information into target confidence, improving classification for small object detection. A task-decoupled Sample Reweighting (SR) mechanism guides attention toward challenging positive examples, utilizing a weight generator module to assess the difficulty and adjust classification loss based on decoder layer outcomes. Comprehensive experiments across various benchmarks reveal that our proposed detector excels in detecting small objects. Our model demonstrates a significant enhancement, achieving a 2.9% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset and a 1.7% improvement on the SODA-D dataset.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper targets the effectiveness and efficiency of small-object detection. Although deep-learning-based general object detection has made remarkable progress in recent years, small-object detection remains unsatisfactory on both counts. The main causes are the limited features of small objects and the high density and mutual overlap among them. Existing Transformer-based small-object detectors still show a gap between accuracy and inference speed. To meet these challenges, the authors propose strategies that improve sampling within an end-to-end framework.

### Main contributions

1. **Sample Points Refinement (SPR)**:
   - Constrains the sample-point distribution and the attention weights in the deformable attention module, so that meaningful interactions within the region of interest are retained while potentially misleading information is filtered out.
   - Formula representation:
     \[ L_{\text{offset}}=\sum_{i = 1}^{N^{+}}\left[\sum_{m, l, k\in O_{mil}}\left\|\Delta p_{milk}-\eta\begin{bmatrix}w_{i}\\h_{i}\end{bmatrix}\right\|_{1}^{2}\right] \]
     \[ L_{\text{atten}}=\sum_{i = 1}^{N^{+}}\sum_{m, l, k\in O_{mil}}\max\left(A_{milk}-\max_{k'\in I_{mil}}A_{milk'},\ 0\right) \]
2. **Scale-aligned Target (ST)**:
   - Integrates the scale information of samples into the target confidence to establish a more appropriate learning target, one especially suited to small-object detection.
   - Formula representation:
     \[ r = \sqrt{\rho} \]
     \[ v=e^{-\theta(r - 1)^{2}} \]
     \[ c = u^{\beta}\cdot v^{1-\beta} \]
     \[ t = c\cdot s \]
3. **Task-decoupled Sample Reweighting (SR)**:
   - Guides the model to refocus on learning from difficult positive samples: a weight generation module evaluates sample difficulty and adjusts the classification loss according to the decoder-layer results.
   - Formula representation:
     \[ A_{i}=\sigma(B(\text{cov}(\text{cat}(H_{\text{reg}}^{i}, H_{\text{cls}}^{i})))) \]
     \[ w_{\text{cls}}^{i}=\sigma(B(\text{cov}(H_{\text{cls}}^{i}\otimes\text{cov}(A_{i})))) \]
     \[ w_{\text{reg}}^{i}=\sigma(B(\text{cov}(H_{\text{reg}}^{i}\otimes\text{cov}(A_{i})))) \]
     \[ r = w^{1-\vert t - s\vert} \]
     \[ L_{\text{cls}}=\sum_{i = 1}^{N^{+}}r_{\text{cls}}^{i}\text{BCE}(s_{i}, t_{i})+\alpha\sum_{i = 1}^{N^{-}}p_{i}^{\gamma}\text{tex
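
The scale-aligned target from contribution 2 can be traced numerically with a minimal sketch. The summary does not define the symbols, so the reading used here is an assumption: `u` is taken to be the IoU between a predicted and a ground-truth box, `rho` a non-negative scale ratio between the two, and `s` the predicted classification score. The function name and the default values of `theta` and `beta` are placeholders for illustration, not values from the paper.

```python
import math


def scale_aligned_target(u, rho, s, theta=1.0, beta=0.5):
    """Compute t = c * s following the four ST formulas in the summary.

    Assumed meanings (not stated in the summary):
      u     -- IoU of prediction and ground truth, in [0, 1]
      rho   -- non-negative scale ratio between prediction and ground truth
      s     -- predicted classification score, in [0, 1]
      theta -- decay rate of the scale term (placeholder default)
      beta  -- exponent balancing IoU against scale (placeholder default)
    """
    r = math.sqrt(rho)                        # r = sqrt(rho)
    v = math.exp(-theta * (r - 1.0) ** 2)     # v = exp(-theta * (r - 1)^2)
    c = (u ** beta) * (v ** (1.0 - beta))     # c = u^beta * v^(1 - beta)
    return c * s                              # t = c * s
```

Under these assumptions the scale term `v` equals 1 when `rho` is 1 and decays as the ratio moves away from 1, so a prediction with the same IoU but a mismatched scale receives a lower target.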
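
The attention constraint L_atten from contribution 1 can be sketched in the same spirit for a single positive query, attention head, and feature level. The sketch assumes that the index set I holds the sample points that fall inside the ground-truth box and O those that fall outside; the summary does not state this explicitly. The names `attention_penalty`, `weights`, and `inside` are hypothetical, and returning zero when no point lies inside the box is an added convention, not part of the formula.

```python
def attention_penalty(weights, inside):
    """Hinge penalty on outside sample points for one (query, head, level).

    weights -- attention weights A_k, one per sample point
    inside  -- booleans, True where the sample point is assumed to lie
               inside the ground-truth box (set I), False otherwise (set O)

    An outside point is penalized only by the amount its weight exceeds
    the largest weight among the inside points.
    """
    inside_weights = [a for a, is_in in zip(weights, inside) if is_in]
    if not inside_weights:
        # Assumption: the max over an empty inside set is left undefined
        # by the formula, so this sketch returns no penalty.
        return 0.0
    ceiling = max(inside_weights)
    return sum(
        max(a - ceiling, 0.0)
        for a, is_in in zip(weights, inside)
        if not is_in
    )
```

Summing this value over all heads, levels, and positive queries would give the double sum in the formula. The hinge leaves outside points alone as long as some inside point attracts at least as much attention.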