Abstract: Despite notable advances in computer vision, the precise detection of tiny objects remains a significant challenge, largely because these objects occupy very few pixels in an image. The challenge is especially relevant in geoscience and remote sensing, where high-fidelity detection of tiny objects supports applications ranging from urban planning to environmental monitoring. In this paper, we propose a new framework, DeNoising FPN with Trans R-CNN (DNTR), to improve the performance of tiny object detection. DNTR consists of an easy plug-in design, DeNoising FPN (DN-FPN), and an effective Transformer-based detector, Trans R-CNN. First, feature fusion in the feature pyramid network (FPN) is important for detecting multiscale objects. However, noisy features may be produced during the fusion process since there is no regularization between the features of different scales. Therefore, we introduce a DN-FPN module that uses contrastive learning to suppress noise in each level's features in the top-down path of FPN. Second, based on the two-stage framework, we replace the obsolete R-CNN detector with a novel Trans R-CNN detector to focus on the representation of tiny objects with self-attention. Experimental results show that DNTR outperforms the baselines by at least 17.4% in terms of APvt on the AI-TOD dataset and 9.6% in terms of AP on the VisDrone dataset. Our code will be available at https://github.com/hoiliu-0801/DNTR.

What problem does this paper attempt to address?

This paper aims to solve the **noise problem in tiny object detection**, especially in remote sensing and geoscience. Although computer vision has made significant progress, accurately detecting tiny objects remains a major challenge, mainly because these objects occupy very few pixels in an image, which leaves them with insufficient feature representation. Specifically, the paper points out the following problems:

1. **Noise generation in the Feature Pyramid Network (FPN)**:
   - In a traditional FPN, the top-down path upsamples low-resolution feature maps and fuses them with the higher-resolution feature maps from the bottom-up path, and this fusion may introduce noise. For example, the channel-reduction operation (1x1 convolution) can destroy geometric information, and the upsampling operations (such as bilinear or nearest-neighbor interpolation) can introduce noise into the semantic information.
2. **Insensitivity of two-stage detectors to tiny objects**:
   - Although traditional two-stage detectors (such as R-CNN) perform well in localization and recognition, they are not effective on tiny objects, because those objects are represented by too few pixels to be detected accurately.
3. **Existing methods fail to suppress noise at the source**:
   - Most existing improvements focus on enhancing feature fusion through additional modules or on adjusting label assignment strategies, but they do not fundamentally solve the noise problem generated in the FPN fusion process.
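The two operations named in the first problem can be illustrated with a toy top-down merge. This is a minimal sketch in plain Python, assuming single-step fusion by element-wise addition, nearest-neighbor 2x upsampling, and hand-picked 1x1 convolution weights; the function names are hypothetical and this is not the DNTR implementation.

```python
# Toy top-down FPN merge on nested lists shaped [channel][row][col].
# All names and values are illustrative assumptions, not the DNTR code.

def conv1x1(feat, weights):
    """Per-pixel channel mixing: out[o] = sum_i weights[o][i] * feat[i]."""
    height, width = len(feat[0]), len(feat[0][0])
    return [
        [[sum(w * feat[i][y][x] for i, w in enumerate(row_w))
          for x in range(width)]
         for y in range(height)]
        for row_w in weights
    ]

def upsample_nearest_2x(feat):
    """Nearest-neighbor upsampling: each pixel becomes a 2x2 block."""
    out = []
    for channel in feat:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out

def fuse_top_down(coarse, lateral):
    """Element-wise sum of the upsampled coarse map and the lateral map."""
    up = upsample_nearest_2x(coarse)
    return [
        [[u + l for u, l in zip(up_row, lat_row)]
         for up_row, lat_row in zip(up_ch, lat_ch)]
        for up_ch, lat_ch in zip(up, lateral)
    ]

# Bottom-up feature: 2 channels, 2x2. The 1x1 conv reduces it to 1 channel.
c_fine = [[[1.0, 2.0], [3.0, 4.0]],
          [[0.0, 1.0], [1.0, 0.0]]]
lateral = conv1x1(c_fine, weights=[[0.5, 0.5]])  # 1 channel, 2x2
p_coarse = [[[10.0]]]                            # 1 channel, 1x1
p_fine = fuse_top_down(p_coarse, lateral)
print(p_fine)  # [[[10.5, 11.5], [12.0, 12.0]]]
```

In this toy case the single coarse value is copied to all four fine positions, and the channel reduction maps the distinct inputs at the two bottom positions to the same value (2.0), which hints at how spatial and channel detail can be lost before any learning-based correction is applied.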
To solve these problems, the authors propose a new framework, **DeNoising FPN with Trans R-CNN (DNTR)**, which includes two main modules:

- **DeNoising FPN (DN-FPN)**: Uses contrastive learning to reduce the noise in the FPN fusion process and preserve geometric and semantic information.
- **Transformer R-CNN (Trans R-CNN)**: Based on the two-stage framework, replaces the traditional R-CNN detector with a Transformer-based detector that better captures the local and global information of tiny objects through self-attention.

Through these two modules, DNTR can significantly improve the performance of tiny object detection while maintaining high efficiency. The experimental results show that DNTR outperforms the baseline models by at least 17.4% in APvt on the AI-TOD dataset and by 9.6% in AP (Average Precision) on the VisDrone dataset.
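The text above says only that DN-FPN relies on contrastive learning; it does not give the loss. As a rough illustration of the general idea, the sketch below uses the widely used InfoNCE loss, which is swapped in here as an assumption. The choice of positive and negative features, the temperature, and all vector values are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: -log(exp(s_pos/t) / (exp(s_pos/t) + sum_k exp(s_neg_k/t)))."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_denominator = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_denominator - logits[0]

# Hypothetical features: a reference vector the fused feature should match,
# plus unrelated vectors that act as negatives.
reference = [1.0, 0.0, 0.5]
negatives = [[0.0, 1.0, 0.0], [-1.0, 0.2, 0.0]]
clean_fused = [0.9, 0.1, 0.5]   # close to the reference
noisy_fused = [0.2, 0.9, 0.1]   # drifted toward a negative

loss_clean = info_nce(clean_fused, reference, negatives)
loss_noisy = info_nce(noisy_fused, reference, negatives)
print(loss_clean < loss_noisy)  # True
```

Minimizing a loss of this shape pulls a fused feature toward its reference and pushes it away from unrelated features, which is one plausible way a contrastive objective could act as a regularizer between pyramid levels.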

A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

A Transformer-Based Object Detector with Coarse-Fine Crossing Representations

DroneNet: Rescue Drone-View Object Detection

High-Resolution Network with Transformer Embedding Parallel Detection for Small Object Detection in Optical Remote Sensing Images

ℱ3-Net: Feature Fusion and Filtration Network for Object Detection in Optical Remote Sensing Images

DFS-DETR: Detailed-Feature-Sensitive Detector for Small Object Detection in Aerial Images Using Transformer

Robust Tiny Object Detection in Aerial Images amidst Label Noise

Deconv R-CNN for Small Object Detection on Remote Sensing Images

Feature Rescaling and Fusion for Tiny Object Detection

Remote Sensing Object Detection Based on Receptive Field Expansion Block

Tiny Object Detection in Remote Sensing Images Based on Object Reconstruction and Multiple Receptive Field Adaptive Feature Enhancement

Spatial-Transformer and Cross-Scale Fusion Network (STCS-Net) for Small Object Detection in Remote Sensing Images

Transformed Dynamic Feature Pyramid for Small Object Detection

Regression-Guided Refocusing Learning With Feature Alignment for Remote Sensing Tiny Object Detection

AeroDetectNet: A Lightweight, High-Precision Network for Enhanced Detection of Small Objects in Aerial Remote Sensing Imagery

MMPW-Net: Detection of Tiny Objects in Aerial Imagery Using Mixed Minimum Point-Wasserstein Distance

Effective Fusion Factor in FPN for Tiny Object Detection

Remote Sensing Object Detection Based on Strong Feature Extraction and Prescreening Network

SODCNN: A Convolutional Neural Network Model for Small Object Detection in Drone-Captured Images

R$^2$-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing Images