Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping

Haoran Zhu, Yifan Zhou, Chang Xu, Ruixiang Zhang, Wen Yang
2024-07-25
Abstract: Fine-Grained Object Detection (FGOD) is a critical task in high-resolution aerial image analysis. This letter introduces Orthogonal Mapping (OM), a simple yet effective method aimed at addressing the challenge of semantic confusion inherent in FGOD. OM introduces orthogonal constraints in the feature space by decoupling features from the last layer of the classification branch with a class-wise orthogonal vector basis. This effectively mitigates semantic confusion and enhances classification accuracy. Moreover, OM can be seamlessly integrated into mainstream object detectors. Extensive experiments conducted on three FGOD datasets (FAIR1M, ShipRSImageNet, and MAR20) demonstrate the effectiveness and superiority of the proposed approach. Notably, with just one line of code, OM achieves a 4.08% improvement in mean Average Precision (mAP) over FCOS on the ShipRSImageNet dataset. Code is released at https://github.com/ZhuHaoranEIS/Orthogonal-FGOD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses Fine-Grained Object Detection (FGOD) in high-resolution aerial images, and in particular the semantic confusion between visually similar categories that is common in this task. The proposed method, Orthogonal Mapping (OM), maps the features of different categories into mutually orthogonal directions of the feature space. Concretely, OM imposes an orthogonal constraint at the last layer of the classification branch by decoupling the features with a class-wise orthogonal vector basis. This improves classification accuracy and can be integrated into mainstream object detectors without further changes. The method is validated on three fine-grained object detection datasets (FAIR1M, ShipRSImageNet, and MAR20). On ShipRSImageNet, OM improves the mean Average Precision (mAP) of the FCOS detector by 4.08% with a one-line code change. The paper also compares OM with several other orthogonal loss methods and reports that OM performs better in accuracy, number of network parameters, and computational complexity. Finally, visualization experiments indicate that OM reduces semantic confusion between categories.
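The idea of scoring features against a class-wise orthogonal basis can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`orthonormal_basis`, `orthogonal_logits`, `softmax`), the Gram-Schmidt construction, and the assumption that the basis is fixed rather than learned are all illustrative choices that the summary above does not specify. The sketch only shows that, when each class owns one direction of an orthonormal basis, a feature aligned with one class contributes nothing to the scores of the others.

```python
import math
import random


def orthonormal_basis(num_classes, dim, seed=0):
    """Build num_classes mutually orthogonal unit vectors in R^dim (Gram-Schmidt)."""
    if num_classes > dim:
        raise ValueError("need dim >= num_classes for an orthogonal basis")
    rng = random.Random(seed)
    basis = []
    while len(basis) < num_classes:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the vectors already chosen.
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def orthogonal_logits(feature, basis):
    """Class scores as projections of one feature vector onto the fixed basis."""
    return [sum(f * b for f, b in zip(feature, vec)) for vec in basis]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical usage: a feature aligned with the basis vector of class 2.
basis = orthonormal_basis(num_classes=4, dim=16)
feature = [3.0 * x for x in basis[2]]
probs = softmax(orthogonal_logits(feature, basis))
predicted = max(range(len(probs)), key=lambda k: probs[k])  # class 2 scores highest
```

In a real detector, the projection would presumably stand in for the weights of the final classification layer, which would be consistent with the abstract's "one line of code" remark; how the released code builds and applies the basis should be checked in the linked repository.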