Abstract: Fine-grained object detection, which identifies object subcategories or types, is thriving in remote sensing scenes. In practice, most existing fine-grained detectors derive from the two-stage R-CNN paradigm with intricate anchor boxes and focus on refining region-of-interest (RoI) features to boost performance, which often makes the pipeline redundant. In contrast, the one-stage, anchor-free paradigm offers a simple yet effective pipeline, but it remains underexplored for fine-grained detection. In this article, we propose a one-stage, anchor-free fine-grained detector for remote sensing aircraft recognition. We first examine the predominant issues that arise when extending the one-stage framework to fine-grained detection, typified by severe interclass confusion and inferior performance on rare categories. We then design a fine-grained classification branch, comprising a region-to-region context distributor (R2CD), a class-aware decoupled focal loss (CDFL), and a cross-shaped sample space (CS3), to address these issues. Specifically, the R2CD flexibly integrates a sparse attention mechanism with mask prediction to conduct region-level content interactions separately within the foreground and background of feature maps, significantly alleviating interclass confusion by enhancing subtle features; the CDFL employs dynamic modulation factors driven by optimization gradients to regulate loss contributions across categories while optimizing category-specific heatmaps, thus prioritizing rare categories with hard samples; the CS3 achieves a better assignment of positive and negative samples by incorporating a structural prior, facilitating the capture of foreground features. Extensive experiments on the MAR20 and FAIRPlane11 datasets demonstrate that our model excels at distinguishing fine-grained categories and is well suited to fine-grained detection tasks.
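To make the idea of a cross-shaped sample space more concrete, the following is a minimal sketch of one way positive samples could be assigned along the two central axes of a ground-truth box, loosely reflecting the fuselage-and-wings layout of an aircraft. The abstract does not specify the exact geometry, so the function name `cross_shaped_positive_mask`, the `arm_ratio` parameter, and the axis-aligned cross are all illustrative assumptions rather than the authors' definition of CS3.

```python
def cross_shaped_positive_mask(height, width, box, arm_ratio=0.25):
    """Label feature-map cells as positive (1) or negative (0).

    Illustrative assumption, not the paper's CS3 definition:
    positives form an axis-aligned cross through the box center.

    box is (x_min, y_min, x_max, y_max) in feature-map cells, inclusive.
    arm_ratio (hypothetical) sets arm thickness as a fraction of the box side.
    """
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min <= x_max < width and 0 <= y_min <= y_max < height):
        raise ValueError("box must lie inside the feature map")

    # Center cell of the box.
    cx = (x_min + x_max) // 2
    cy = (y_min + y_max) // 2

    # Half-thickness of each arm, in cells (0 means a one-cell-wide arm).
    half_w = int(arm_ratio * (x_max - x_min + 1)) // 2
    half_h = int(arm_ratio * (y_max - y_min + 1)) // 2

    mask = [[0] * width for _ in range(height)]
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            on_vertical_arm = abs(x - cx) <= half_w
            on_horizontal_arm = abs(y - cy) <= half_h
            if on_vertical_arm or on_horizontal_arm:
                mask[y][x] = 1  # positive sample inside the cross
    return mask


if __name__ == "__main__":
    # Print the mask for a 5x5 box on a 7x7 feature map.
    for row in cross_shaped_positive_mask(7, 7, (1, 1, 5, 5)):
        print("".join(str(v) for v in row))
```

Compared with labeling every cell in the box as positive, a cross of this kind leaves the box corners, which for an aircraft mostly contain background, as negatives. Whether the actual CS3 uses fixed arms, oriented arms, or soft weights is not stated in the abstract.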


Context-Aware Content Interaction: Grasp Subtle Clues for Fine-Grained Aircraft Detection
