Multi-information Supervision in Optical Remote Sensing Images
WANG Jiabao, CHENG Gong, XIE Xingxing, YAO Yanqing, HAN Junwei
DOI: https://doi.org/10.11834/jrs.20211564
2023-01-01
Abstract: Oriented object detection is a basic task in the interpretation of high-resolution remote sensing images. Compared with general detectors, oriented detectors locate instances with oriented bounding boxes, which are consistent with the arbitrarily oriented ground truths in remote sensing images. Oriented object detection has progressed greatly with the development of convolutional neural networks. However, the task remains challenging because of the extreme variation in object scales and the arbitrary orientations of objects. Most oriented detectors evolved from horizontal detectors. They first generate horizontal proposals using the Region Proposal Network (RPN). Then, they classify these proposals into different categories and transform them into oriented bounding boxes. Despite their success, these detectors exploit the annotations only at the end of the network and do not fully utilize the angle and semantic information.

This work proposes an Angle-based Region Proposal Network (ARPN), which learns the angle of objects and generates oriented proposals. The structure of ARPN is the same as that of RPN. However, for each proposal, instead of outputting four regression parameters, ARPN outputs five: the center (x, y), the shape (w, h), and the angle (t). In training, we first assign anchors to ground truths by Intersection over Union (IoU). Then, we directly supervise the ARPN with the shape and angle information of the ground truths. We also propose a semantic branch that outputs image semantic results to exploit the semantic information. The semantic branch consists of two convolutional layers and runs in parallel with the detection head. We first assign objects to different scale levels according to their areas. Then, we create semantic labels at each scale and use them to supervise the semantic branch. With semantic supervision, the model learns translation-variant features, which improves accuracy. Moreover, the outputs of the semantic branch indicate the objectness at each location, which can be used to filter out false positives among the final predictions.

We conduct comprehensive experiments on the DOTA dataset to validate the effectiveness of the proposed methods. In data preparation, we first crop the original images into 1024 × 1024 patches with a stride of 824. Compared with the baseline, the ARPN achieves a 2.2% increase in mAP, and the semantic branch contributes an additional 0.8% improvement in mAP. Finally, we combine both methods and achieve a 74.64% mAP, which is competitive with the results of other oriented object detectors. We also visualize some results on the DOTA dataset. The results show that our method is highly effective for small objects and densely packed objects.

We propose the ARPN and the semantic branch to utilize the multi-information in remote sensing images. The ARPN directly generates oriented proposals, which can lead to better recall of oriented objects. The semantic branch increases the translation-variant property of the features. Experiments demonstrate the effectiveness of our method, which achieves a 74.64% mAP on the DOTA dataset. In future work, we will focus on model efficiency and inference speed.
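
The data preparation step can be made concrete with a little arithmetic: a patch size of 1024 with a stride of 824 means neighboring patches overlap by 200 pixels. The following is a minimal sketch of how such crop windows might be enumerated, not the authors' code. The function names `patch_origins` and `crop_boxes` are hypothetical, and the border handling (adding one last window flush with the image edge, and leaving images smaller than one patch unpadded) is an assumption, since the abstract does not say how borders are treated.

```python
def patch_origins(length, patch=1024, stride=824):
    """Return the start offsets of sliding windows along one image axis."""
    # An axis no longer than one patch yields a single window at offset 0.
    if length <= patch:
        return [0]
    # Regular grid of windows that fit entirely inside the axis.
    origins = list(range(0, length - patch + 1, stride))
    # Assumption: if the grid leaves uncovered pixels at the far border,
    # add one final window flush with that border.
    if origins[-1] + patch < length:
        origins.append(length - patch)
    return origins


def crop_boxes(width, height, patch=1024, stride=824):
    """Return (x_min, y_min, x_max, y_max) crop windows in row-major order."""
    return [
        (x, y, min(x + patch, width), min(y + patch, height))
        for y in patch_origins(height, patch, stride)
        for x in patch_origins(width, patch, stride)
    ]


if __name__ == "__main__":
    # Example: a 2048 x 1024 image is covered by three windows.
    for box in crop_boxes(2048, 1024):
        print(box)
```

Under these assumptions, an axis of 1848 pixels is covered exactly by two windows (offsets 0 and 824), while an axis of 2048 pixels needs a third window at offset 1024 to reach the border.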