A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng
2024-06-15
Abstract: Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of applications ranging from urban planning to environmental monitoring. In this paper, we propose a new framework, namely, DeNoising FPN with Trans R-CNN (DNTR), to improve the performance of tiny object detection. DNTR consists of an easy plug-in design, DeNoising FPN (DN-FPN), and an effective Transformer-based detector, Trans R-CNN. Specifically, feature fusion in the feature pyramid network is important for detecting multiscale objects. However, noisy features may be produced during the fusion process since there is no regularization between the features of different scales. Therefore, we introduce a DN-FPN module that utilizes contrastive learning to suppress noise in each level's features in the top-down path of FPN. Second, based on the two-stage framework, we replace the obsolete R-CNN detector with a novel Trans R-CNN detector to focus on the representation of tiny objects with self-attention. Experimental results show that our DNTR outperforms the baselines by at least 17.4% in terms of APvt on the AI-TOD dataset and 9.6% in terms of AP on the VisDrone dataset. Our code will be available at https://github.com/hoiliu-0801/DNTR.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the **noise problem in tiny object detection**, especially in remote sensing and geoscience. Although computer vision has made significant progress, accurately detecting tiny objects remains a major challenge, mainly because these objects occupy very few pixels in the image, which leaves too little signal for a good feature representation. Specifically, the paper points out the following problems:

1. **Noise generation in the Feature Pyramid Network (FPN)**: In a conventional FPN, the top-down path upsamples coarse, low-resolution feature maps and fuses them with higher-resolution lateral features from the bottom-up path, and this fusion can introduce noise. For example, the channel-reduction operation (1x1 convolution) can degrade geometric information, and the upsampling operations (such as bilinear or nearest-neighbor interpolation) can introduce noise into the semantic information.
2. **Insensitivity of two-stage detectors to tiny objects**: Although conventional two-stage detectors (such as R-CNN) perform well in localization and recognition, they handle tiny objects poorly, because the few pixels a tiny object covers do not provide enough information for accurate detection.
3. **Existing methods do not suppress noise at its source**: Most existing improvements either enhance feature fusion through additional modules or adjust the label assignment strategy, but neither approach directly addresses the noise generated during FPN fusion.
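The FPN fusion step described above can be sketched in a few lines of plain Python. This is a minimal illustration of a generic top-down merge (1x1 channel reduction on the lateral map, nearest-neighbor upsampling of the coarse map, element-wise addition), not the authors' implementation. The function names, the tiny feature maps, and the `weight` values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
"""Toy FPN top-down merge on plain Python lists shaped [C][H][W]."""


def conv1x1(fmap, weight):
    """Channel reduction: weight is [C_out][C_in] and mixes channels per pixel."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(w * fmap[k][i][j] for k, w in enumerate(row)) for j in range(width)]
            for i in range(height)
        ]
        for row in weight
    ]


def upsample_nearest_2x(fmap):
    """Nearest-neighbor upsampling: each pixel is copied into a 2x2 block."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [value for value in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))  # duplicate the widened row
        out.append(rows)
    return out


def top_down_merge(coarse, lateral, weight):
    """Fuse an upsampled coarse map with a channel-reduced lateral map."""
    reduced = conv1x1(lateral, weight)
    upsampled = upsample_nearest_2x(coarse)
    return [
        [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(upsampled, reduced)
    ]


# Coarse top-down map: 1 channel, 1x2. Lateral bottom-up map: 2 channels, 2x4.
coarse = [[[1.0, 2.0]]]
lateral = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0]],
]
weight = [[0.5, 0.5]]  # hypothetical 1x1 conv weights: 2 channels -> 1 channel
merged = top_down_merge(coarse, lateral, weight)
print(merged)
```

In the merged map, the two single-pixel responses from the lateral input sit on top of blocky 2x2 patches copied from the coarse map, and the two lateral channels have been averaged into one. Nothing in this step constrains how the two sources relate to each other, which is the lack of regularization that the summary identifies as the origin of the noise.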
To address these problems, the authors propose a new framework, **DeNoising FPN with Trans R-CNN (DNTR)**, which consists of two main modules:

- **DeNoising FPN (DN-FPN)**: uses contrastive learning to suppress the noise in each level's features in the FPN top-down path, preserving geometric and semantic information.
- **Transformer R-CNN (Trans R-CNN)**: based on the two-stage framework, replaces the conventional R-CNN detector with a Transformer-based detector that uses self-attention to better capture the local and global information of tiny objects.

With these two modules, DNTR can significantly improve the performance of tiny object detection while maintaining high efficiency. The experimental results show that DNTR outperforms the baselines by at least 17.4% in APvt on the AI-TOD dataset and by 9.6% in AP on the VisDrone dataset.
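To make the idea of contrastive denoising more concrete, the sketch below computes a generic InfoNCE-style loss in plain Python. The source only states that DN-FPN uses contrastive learning; the specific loss form, the choice of cosine similarity, the `temperature` value, and the way positives and negatives are picked here are all assumptions made for illustration and may differ from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive pair's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Hypothetical setup: `fused` is a feature after FPN fusion, `reference` is the
# feature it should agree with, and `others` are unrelated features.
fused = [1.0, 0.0]
reference = [1.0, 0.0]
others = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = info_nce(fused, reference, others)
# A fused feature that has drifted toward an unrelated one is penalized more.
drifted_loss = info_nce([0.0, 1.0], reference, others)
print(aligned_loss, drifted_loss)
```

Minimizing a loss of this shape pulls a fused feature toward its reference and pushes it away from unrelated features, which is one plausible way to regularize features across pyramid levels.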