PETDet: Proposal Enhancement for Two-Stage Fine-Grained Object Detection

Wentao Li, Danpei Zhao, Bo Yuan, Yue Gao, Zhenwei Shi
DOI: https://doi.org/10.1109/TGRS.2023.3343453
2023-12-17
Abstract: Fine-grained object detection (FGOD) extends object detection with the capability of fine-grained recognition. In recent two-stage FGOD methods, the region proposal serves as a crucial link between detection and fine-grained recognition. However, current methods overlook that some proposal-related procedures inherited from general detection are not equally suitable for FGOD, limiting the multi-task learning from generation, representation, to utilization. In this paper, we present PETDet (Proposal Enhancement for Two-stage fine-grained object detection) to better handle the sub-tasks in two-stage FGOD methods. Firstly, an anchor-free Quality Oriented Proposal Network (QOPN) is proposed with dynamic label assignment and attention-based decomposition to generate high-quality oriented proposals. Additionally, we present a Bilinear Channel Fusion Network (BCFN) to extract independent and discriminative features of the proposals. Furthermore, we design a novel Adaptive Recognition Loss (ARL) which offers guidance for the R-CNN head to focus on high-quality proposals. Extensive experiments validate the effectiveness of PETDet. Quantitative analysis reveals that PETDet with ResNet50 reaches state-of-the-art performance on various FGOD datasets, including FAIR1M-v1.0 (42.96 AP), FAIR1M-v2.0 (48.81 AP), MAR20 (85.91 AP) and ShipRSImageNet (74.90 AP). The proposed method also achieves superior compatibility between accuracy and inference speed. Our code and models will be released at https://github.com/canoe-Z/PETDet.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses limitations of existing fine-grained object detection (FGOD) methods and proposes a new method, PETDet. Specifically, PETDet targets the following issues:

1. **Proposal generation**: In two-stage FGOD methods, high-quality region proposals reduce false positives and improve localization accuracy, which is crucial for the subsequent decomposition of tasks between the two stages. However, existing methods often overlook proposal quality.
2. **Proposal representation**: In two-stage FGOD detectors, the first stage performs foreground/background classification and proposal localization, while the second stage handles fine-grained recognition and bounding-box refinement. However, the features for both stages are taken from the Feature Pyramid Network (FPN) without decoupling, which leads to task confusion. In addition, a proposal representation built from single-level features is insufficient to support accurate fine-grained recognition in the second stage.
3. **Proposal utilization**: In previous two-stage methods, the R-CNN head receives proposals from a standard RPN, which contain a large number of false positives, so positive and negative samples must be sampled by hand-crafted rules to reduce imbalance. Even when proposal quality is improved, this sampling prevents the high-quality positive samples from being fully utilized, which harms the learning of fine-grained recognition.

To address these challenges, the paper proposes PETDet, a two-stage FGOD method built around a proposal enhancement strategy. PETDet consists of three main components:

- **Quality Oriented Proposal Network (QOPN)**: an anchor-free proposal network that generates high-quality oriented proposals through dynamic label assignment and attention-based decomposition.
- **Bilinear Channel Fusion Network (BCFN)**: a network that produces independent and discriminative proposal features through cross-layer fusion, strengthening the proposal representation.
- **Adaptive Recognition Loss (ARL)**: a novel loss function for the R-CNN head that guides it to focus on high-quality proposals while avoiding conventional operations such as random sampling and non-maximum suppression, thereby maximizing sample utilization.

PETDet achieves state-of-the-art performance on multiple FGOD datasets with a ResNet50 backbone, reaching 42.96 AP on FAIR1M-v1.0, 48.81 AP on FAIR1M-v2.0, 85.91 AP on MAR20, and 74.90 AP on ShipRSImageNet, which demonstrates both the effectiveness and the efficiency of the method.
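The QOPN description mentions dynamic label assignment without giving the rule. Purely as an illustration of what "dynamic" means here, the Python sketch below implements one widely used form, a SimOTA-style dynamic top-k (from YOLOX), for a single ground-truth object: the number of positives k is estimated from the candidates' IoUs, and the k lowest-cost candidates become positives. This is a swapped-in technique, not necessarily what QOPN does; the function name, the cost weighting, and the assumption that oriented-box IoUs are precomputed elsewhere are all hypothetical.

```python
import math
from typing import List, Sequence


def dynamic_topk_assign(scores: Sequence[float], ious: Sequence[float],
                        q: int = 10, iou_weight: float = 3.0) -> List[int]:
    """Pick positive candidates for ONE ground-truth object (hypothetical sketch).

    SimOTA-style dynamic top-k, used here only as an example of dynamic
    label assignment; it is not claimed to be QOPN's published rule.

    scores: predicted foreground probabilities of candidates inside the object.
    ious:   IoU of each candidate's predicted (oriented) box with the object,
            assumed to be computed elsewhere.
    """
    if len(scores) != len(ious):
        raise ValueError("scores and ious must have the same length")
    if not scores:
        return []
    # Dynamic k: sum of the q largest IoUs, truncated to an int, never below 1.
    k = max(1, int(sum(sorted(ious, reverse=True)[:q])))
    # Cost mirrors SimOTA's structure (classification cost + 3 x IoU cost);
    # lower cost means a more confident and better-localized candidate.
    eps = 1e-8
    costs = [-math.log(s + eps) - iou_weight * math.log(u + eps)
             for s, u in zip(scores, ious)]
    # Keep the k cheapest candidates and return their indices in ascending order.
    chosen = sorted(range(len(costs)), key=costs.__getitem__)[:k]
    return sorted(chosen)


print(dynamic_topk_assign([0.9, 0.2, 0.8, 0.1], [0.9, 0.3, 0.85, 0.05]))  # [0, 2]
```

In the example the top IoUs sum to 2.1, so k = 2 and the two confident, well-localized candidates are kept; a poorly matched object would receive fewer positives than a well-matched one, which is the point of making k dynamic rather than fixed.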
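The BCFN description refers to cross-layer fusion of proposal features but does not spell out its layers. One standard, inexpensive way to form a bilinear interaction between two feature vectors is a factorized (low-rank) bilinear product: project each input, multiply the projections element-wise, then project back. The Python sketch below applies this to two features of the same proposal pooled from different pyramid levels. It is an assumption-laden illustration, not BCFN itself: the function names, the projection matrices `u`, `v` and `p`, and the signed-square-root plus L2 normalization (a common post-processing step for bilinear features) are all hypothetical choices.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def bilinear_fuse(x: Sequence[float], y: Sequence[float],
                  u: Matrix, v: Matrix, p: Matrix) -> List[float]:
    """Hypothetical low-rank bilinear fusion of two proposal features.

    x, y: C-dim features of the same proposal from two pyramid levels.
    u, v: R x C input projections; p: C_out x R output projection.
    """
    # Each output r equals (u_r . x) * (v_r . y) = x^T (u_r v_r^T) y,
    # so the element-wise product is a rank-R bilinear interaction.
    joint = [a * b for a, b in zip(matvec(u, x), matvec(v, y))]
    # Signed square root, then L2 normalization (guarding against a zero vector).
    joint = [math.copysign(math.sqrt(abs(j)), j) for j in joint]
    norm = math.sqrt(sum(j * j for j in joint)) or 1.0
    return matvec(p, [j / norm for j in joint])


identity = [[1.0, 0.0], [0.0, 1.0]]
print(bilinear_fuse([1.0, 2.0], [3.0, 4.0], identity, identity, identity))
# approximately [0.522, 0.853]
```

Unlike adding or concatenating the two levels, the multiplicative term lets every channel of one level modulate the corresponding projected channel of the other, which is the usual motivation for bilinear features in fine-grained recognition.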
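The ARL description says the R-CNN head is steered toward high-quality proposals instead of relying on random positive/negative sampling. The exact ARL formula is not given in this summary, so the Python sketch below is only a hypothetical stand-in for that idea: a cross-entropy averaged over every proposal, where a foreground proposal's weight grows with its IoU to the matched ground truth (IoU**gamma) and background proposals keep weight 1. The function name, the background index 0, and `gamma` are assumptions, not details from the paper.

```python
import math
from typing import Sequence


def quality_weighted_ce(logits: Sequence[Sequence[float]], labels: Sequence[int],
                        ious: Sequence[float], background: int = 0,
                        gamma: float = 1.0) -> float:
    """Hypothetical quality-weighted cross-entropy over ALL proposals (no sampling).

    logits: per-proposal class scores; labels: target class per proposal;
    ious:   IoU of each proposal with its matched ground truth.
    """
    if not (len(logits) == len(labels) == len(ious)):
        raise ValueError("logits, labels and ious must have the same length")
    total, weight_sum = 0.0, 0.0
    for row, label, iou in zip(logits, labels, ious):
        m = max(row)
        log_z = m + math.log(sum(math.exp(z - m) for z in row))  # stable log-sum-exp
        ce = log_z - row[label]
        # Background keeps weight 1; foreground is weighted by localization quality.
        w = 1.0 if label == background else iou ** gamma
        total += w * ce
        weight_sum += w
    # Weighted mean; 0.0 when there is nothing to learn from.
    return total / weight_sum if weight_sum > 0 else 0.0
```

Because every proposal contributes to the loss, no positive/negative sampler is needed, and raising `gamma` shifts more of the foreground loss toward well-localized proposals.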