CSSwin-unet: a Swin-unet network for semantic segmentation of remote sensing images by aggregating contextual information and extracting spatial information

Dong Xiao, Zhihao Kang, Yanhua Fu, Zhenni Li, Mengying Ran
DOI: https://doi.org/10.1080/01431161.2023.2285738
IF: 3.531
2023-01-01
International Journal of Remote Sensing
Abstract: Image interpretation algorithms based on deep learning are becoming increasingly important for acquiring land cover information. We propose CSSwin-unet, a network designed for the semantic segmentation of remote sensing images. CSSwin-unet is based on Swin-unet: it follows the U-shaped encoder-decoder structure of U-Net but builds both the encoder and the decoder from Swin transformer blocks, which have superior global modelling capabilities. In addition, we design a parallel branch in the encoder with a context aggregation module (CAM) to enhance contextual information extraction and alleviate the semantic ambiguity caused by occlusion. To address the semantic mismatch between encoder and decoder features and improve the model's ability to extract spatial information, we construct a space extraction module (SEM) in the skip connections, which replaces the direct copying of encoder features used in Swin-unet. To reduce information loss during downsampling and strengthen the segmentation capacity of the network, we design a feature shrinkage module (FSM) for the downsampling stage. We conducted comprehensive ablation experiments on a self-built dataset and compared the results with other advanced methods. The test results showed significant improvement: mIoU, mF1, and OA improved by 2.83%, 2.47%, and 2.05%, respectively, over the second-best model, Swin-unet. These results demonstrate the strong performance of CSSwin-unet.
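The abstract reports mIoU, mF1, and OA without defining them. The sketch below shows how these three metrics are commonly computed from a class confusion matrix. It is an illustration under assumptions, not the authors' evaluation code: the function name `segmentation_metrics` is hypothetical, rows are assumed to be ground-truth classes and columns predicted classes, and every class is assumed to occur at least once so that no denominator is zero.

```python
def segmentation_metrics(confusion):
    """Return (mIoU, mF1, OA) from a square confusion matrix.

    Assumption: confusion[i][j] counts pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    ious, f1s, correct = [], [], 0
    for c in range(n):
        tp = confusion[c][c]                                  # true positives
        fn = sum(confusion[c]) - tp                           # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp      # pixels wrongly labelled c
        ious.append(tp / (tp + fp + fn))                      # per-class IoU
        f1s.append(2 * tp / (2 * tp + fp + fn))               # per-class F1
        correct += tp
    # Mean over classes for IoU and F1; overall accuracy over all pixels.
    return sum(ious) / n, sum(f1s) / n, correct / total


# Toy two-class example with made-up counts.
miou, mf1, oa = segmentation_metrics([[3, 1], [1, 5]])
print(miou, mf1, oa)
```

Averaging per class before reporting means that rare land cover classes weigh as much as common ones in mIoU and mF1, whereas OA is dominated by the most frequent classes.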