DIMA: Digging into Multigranular Archetype for Fine-Grained Object Detection

Jiacheng Cheng, Xiwen Yao, Xuguang Yang, Xiang Yuan, Xiaoxu Feng, Gong Cheng, Xiankai Huang, Junwei Han
DOI: https://doi.org/10.1109/tgrs.2024.3415809
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Fine-grained remote sensing object detection aims to precisely locate objects and determine their fine-level categories. The task is exceptionally challenging because substantial interclass similarity makes discriminative features difficult to capture. We attribute this to the absence of essential information that can serve as supervision for learning, namely comprehensive visual patterns of objects and the intrinsic relationships among multigranular features. In this article, we propose a novel scheme dubbed digging into the multigranular archetype (DIMA) for fine-grained remote sensing object detection. In detail, we first design a simple yet effective frequency-aware representation supplement (FARS) mechanism that learns from original images and their auxiliary frequency counterparts simultaneously. FARS introduces high- and low-frequency representations to reinforce a range of visual cues, such as particular regions associated with the former and contours of objects related to the latter. We then devise a module named hierarchical classification paradigm (HCP), which constructs the interhierarchy relationships between coarse- and fine-level representations and exploits them to guide fine-grained feature enhancement. HCP ultimately selects and boosts samples that are hard to discriminate by enforcing consistency across levels. Our method can be easily integrated into prevailing oriented object detectors and brings consistent performance improvements across them. Notably, our method combined with oriented RCNN (ORCNN) achieves 44.44% (+3.62%) on FAIR1M and 91.0% (+6.9%) on MAR20. Moreover, discussions of qualitative results and rich visualizations are provided to illustrate the advantages of our approach. The source code is available at https://github.com/chengjc2019/DIMA.
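The abstract does not say how the auxiliary frequency counterparts used by FARS are produced. As a rough illustration only, the sketch below assumes an FFT-based split of a grayscale image with a circular cutoff; the function name `split_frequencies` and the `radius` parameter are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def split_frequencies(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Assumption: a hard circular low-pass mask in the Fourier domain.
    The paper's actual decomposition may differ.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every frequency bin from the centered DC component.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius

    # Keep only the bins inside the circle, then go back to image space.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    # The high-frequency part is the residual, so low + high == image.
    high = image - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((16, 16))
    low, high = split_frequencies(img, radius=4)
    print(low.shape, high.shape)
```

Defining the high-frequency part as the residual keeps the two components complementary, so a detector fed the original image together with both counterparts sees no information lost by the split.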