Channel Self-Attention Based Multiscale Spatial-Frequency Domain Network for Oriented Object Detection in Remote Sensing Imagery

Yang Xu, Yushan Pan, Zebin Wu, Zhihui Wei, Tianming Zhan
DOI: https://doi.org/10.1109/tgrs.2024.3500013
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: The detection of oriented objects in remote sensing images remains a daunting challenge due to their complex backgrounds, various sizes, and especially arbitrary orientations. However, most existing methods model only the structural features of the images in the spatial domain, while horizontal convolution kernels limit the model’s ability to perceive object direction information. Furthermore, frequency features contain rich information about scale, texture, and angle, which can be a good complement to the spatial features. Inspired by this, we propose a multiscale spatial-frequency domain network (MSFN) to utilize spatial-frequency information for oriented object detection, which can be integrated seamlessly into any CNN architecture and trained end to end. First, multiscale Haar wavelet transforms are leveraged to extract multiscale frequency domain features from the image. Subsequently, a channel alignment feature fusion module (CA-FFM) is proposed to fuse the high-level semantic features extracted by the CNN with the low-level texture features extracted by the wavelet transform at multiple scales. Finally, a channel self-attention based spatial-frequency feature perception module (SFPM) is designed to perform self-attention weighted aggregation on the fused features along the channel dimension, thereby constructing a novel spatial-frequency feature extraction backbone network for oriented object detectors in remote sensing images. Experimental results on the DOTA and HRSC2016 datasets validate the effectiveness and universality of the proposed method.
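
The first step named in the abstract, a multiscale Haar wavelet transform, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the number of levels, the normalization, or how the sub-bands are arranged before fusion, so the function names (`haar_dwt2`, `multiscale_haar`), the orthonormal 1/2 scaling, and the sub-band keys below are all assumptions chosen for illustration.

```python
def haar_dwt2(img):
    """One level of a 2-D orthonormal Haar transform.

    `img` is a list of rows with even height and width. Returns four
    half-resolution sub-bands: the approximation plus three detail bands.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, row_diff, col_diff, diag = [], [], [], []
    for i in range(0, h, 2):
        r_a, r_r, r_c, r_d = [], [], [], []
        for j in range(0, w, 2):
            # The four pixels of one 2x2 block.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2)  # local average (low-pass)
            r_r.append((a + b - c - d) / 2)  # top row minus bottom row
            r_c.append((a - b + c - d) / 2)  # left column minus right column
            r_d.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(r_a)
        row_diff.append(r_r)
        col_diff.append(r_c)
        diag.append(r_d)
    return approx, row_diff, col_diff, diag


def multiscale_haar(img, levels):
    """Apply haar_dwt2 repeatedly to the approximation band.

    Returns one dict of sub-bands per scale, finest scale first.
    """
    pyramid = []
    current = img
    for _ in range(levels):
        approx, row_diff, col_diff, diag = haar_dwt2(current)
        pyramid.append({
            "approx": approx,
            "row_diff": row_diff,
            "col_diff": col_diff,
            "diag": diag,
        })
        current = approx  # the next scale decomposes the low-pass band
    return pyramid
```

Each level halves the spatial resolution, so the resulting pyramid lines up naturally with the stride-2 stages of a typical CNN backbone. How the paper's CA-FFM aligns and fuses these bands with CNN features is not described in the abstract and is not modeled here.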