Boundary-aware Spatial and Frequency Dual-domain Transformer for Remote Sensing Urban Images Segmentation

Jie Zhang, Mingwen Shao, Yecong Wan, Lingzhuang Meng, Xiangyong Cao, Shuigen Wang
DOI: https://doi.org/10.1109/tgrs.2024.3430081
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Semantic segmentation of remote sensing (RS) images refers to labeling each pixel with a class to identify objects or land cover types. Existing mainstream spatial-domain semantic segmentation methods are mainly categorized into convolutional neural network (CNN)-based and vision transformer (ViT)-based approaches. The former excels at capturing local features, while the latter is adept at extracting global features. Several recent approaches combine CNN and ViT to efficiently capture both local and global features. However, these approaches still struggle to capture the complete features of RS images, resulting in inaccurate segmentation. To address this issue, we introduce the fast Fourier transform (FFT), which transforms images into the frequency domain for feature extraction, yielding an image-size receptive field that complements spatial-domain methods. Based on this, we propose a boundary-aware spatial and frequency dual-domain transformer, termed dual-domain transformer. Specifically, our dual-domain transformer incorporates a dual-domain mixer (DualM), where the spatial-domain branch combines depthwise convolution and the attention mechanism to extract local and global features effectively, while the frequency-domain branch uses FFT to extract image-size features. The two branches complement each other, enabling more comprehensive feature extraction from RS images. Meanwhile, a boundary-guided training strategy utilizing a boundary-aware module (BAM) is devised as an auxiliary task that constrains the model to extract and predict boundary detail texture. In addition, the decoder incorporates a scale-feature fusion module (SFM) for adaptive information fusion between the encoder and decoder. Comprehensive experiments on the Zeebrugge and ISPRS datasets, including Vaihingen and Potsdam, show that the dual-domain transformer significantly outperforms state-of-the-art (SOTA) methods.
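The claim that frequency-domain processing yields an image-size receptive field can be illustrated with a small sketch. It is not the paper's DualM implementation: the abstract does not specify how the frequency branch weights the spectrum, so the sketch assumes a simple per-frequency multiplicative weight, and the function names (`dft2`, `idft2`, `frequency_branch`, `circular_conv2`) are hypothetical. To stay dependency-free it uses a naive discrete Fourier transform, which computes the same transform as an FFT, only more slowly. By the convolution theorem, multiplying the spectrum pointwise is equivalent to a circular convolution whose kernel is as large as the feature map, so every output position depends on every input position.

```python
import cmath


def dft2(x):
    """Naive 2D DFT of an H x W grid (list of lists); same result as an FFT."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(
                x[m][n] * cmath.exp(-2j * cmath.pi * (u * m / h + v * n / w))
                for m in range(h)
                for n in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def idft2(spec):
    """Naive inverse 2D DFT, normalised by 1 / (H * W)."""
    h, w = len(spec), len(spec[0])
    return [
        [
            sum(
                spec[u][v] * cmath.exp(2j * cmath.pi * (u * m / h + v * n / w))
                for u in range(h)
                for v in range(w)
            )
            / (h * w)
            for n in range(w)
        ]
        for m in range(h)
    ]


def frequency_branch(feature, weight):
    """Assumed frequency branch: transform, weight each frequency, invert.

    `weight` is a hypothetical per-frequency filter with the same H x W
    shape as `feature`; in a trained model it would be learned.
    """
    h, w = len(feature), len(feature[0])
    spectrum = dft2(feature)
    filtered = [[spectrum[u][v] * weight[u][v] for v in range(w)] for u in range(h)]
    # Keep the real part; it is exact when `weight` is the DFT of a real kernel.
    return [[value.real for value in row] for row in idft2(filtered)]


def circular_conv2(feature, kernel):
    """Reference: direct circular convolution with an image-size kernel."""
    h, w = len(feature), len(feature[0])
    return [
        [
            sum(
                feature[p][q] * kernel[(m - p) % h][(n - q) % w]
                for p in range(h)
                for q in range(w)
            )
            for n in range(w)
        ]
        for m in range(h)
    ]


# Toy 4 x 4 feature map and a dense kernel covering the whole map.
feature = [[float(4 * i + j) for j in range(4)] for i in range(4)]
kernel = [[1.0 / (1 + i + j) for j in range(4)] for i in range(4)]

out_freq = frequency_branch(feature, dft2(kernel))
out_spatial = circular_conv2(feature, kernel)
```

Here `out_freq` and `out_spatial` agree up to floating-point error, which is the sense in which a single frequency-domain multiplication covers the full image, whereas a small spatial convolution kernel only sees a local neighbourhood. A practical implementation would use an FFT routine from a numerical library rather than the quadratic-time transform shown here.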