MST-UNet: a modified Swin Transformer for water bodies' mapping using Sentinel-2 images

Jiakai Li, Guogang Li, Tong Xie, Zebin Wu
DOI: https://doi.org/10.1117/1.jrs.17.026507
IF: 1.568
2023-01-01
Journal of Applied Remote Sensing
Abstract: Deep learning is widely used for feature recognition in remote sensing. Symmetric encoder-decoder networks such as UNet are among the most commonly used image segmentation networks, but their accuracy is often limited by their simple structure. We combine a convolutional neural network (CNN) with a Swin Transformer in a UNet structure, yielding a modified Swin Transformer UNet (MST-UNet), to segment water bodies accurately from remote sensing data, with Xiamen City as the study area. MST-UNet is based on a symmetric encoder-decoder network. CNN blocks extract features from the input images, and Swin Transformer blocks capture the interdependence among pixels, so that more attention is paid to the global information of the images. Predictions are obtained by four times upsampling, and the results show that MST-UNet is more accurate than UNet and its improved variants. The Intersection over Union (IoU), mean IoU, and Dice score on the test set reach 87.80%, 92.93%, and 93.08%, respectively, which verifies the feasibility of MST-UNet. This experiment offers a reference for related studies.
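The three reported metrics can be illustrated with a minimal sketch for binary water masks. The abstract does not say how the scores are aggregated, so this sketch assumes pixel counts pooled over one flat mask and a mean IoU averaged over two classes (water and background). The function names `confusion_counts` and `water_metrics` and the toy masks are hypothetical and not taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN for flat binary masks (1 = water, 0 = background)."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def water_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Return IoU (water class), two-class mean IoU, and Dice score."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    # Assumption: an empty union counts as a perfect score for that class.
    iou_water = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return {
        "iou": iou_water,
        "miou": (iou_water + iou_background) / 2,
        "dice": dice,
    }


# Toy example (made-up masks, not data from the paper).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 1, 1, 0, 0, 0, 0, 0]
scores = water_metrics(pred, truth)
print(scores)
```

For a single mask, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU on the same prediction.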