Abstract: Real-time cloud segmentation is urgently needed by many remote sensing applications, such as weather forecasting. Existing deep learning based cloud segmentation methods have two shortcomings: (a) they tend to produce discontinuous boundaries and fail to capture less salient features, which correspond to thin-cloud pixels; (b) they are not robust across different scenarios. We address these issues by integrating U-Net with the Swin transformer, using an efficiently designed skip connection based on a dual attention mechanism. Specifically, we propose DASUNet, a Swin transformer based encoder-decoder network that incorporates a dual attentional skip connection into Swin-UNet. DASUNet captures the global relationships among image patches through its window attention mechanism, which suits the real-time requirement. Moreover, DASUNet characterizes less salient features by placing token dual attention modules in the skip connections. Each module consists of token similarity attention and token importance attention, and it compensates for the neglect of less salient features that traditional attention mechanisms incur as transformer layers are stacked. Experiments on ground-based images (SWINySeg) and remote sensing images (HRC-WHU, 38-Cloud) show that DASUNet achieves state-of-the-art or competitive results for cloud segmentation (top-1 on six of six metrics among 11 methods on SWINySeg, on two of five metrics among 10 methods on HRC-WHU, and on two of four metrics among 12 methods with ParaNum ≤ 30M on 38-Cloud), at an average speed of 100 FPS on 224×224 images.
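
The abstract names the two parts of the token dual attention module but does not say how they are computed or combined. The sketch below is therefore only an illustration under stated assumptions: similarity attention is taken to be scaled dot-product self-attention over the skip-connection tokens, importance attention is taken to be a sigmoid gate on each token's mean activation, and the two are added back to the input as a residual. All function names are hypothetical and none of this is claimed to be the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_similarity_attention(tokens):
    """ASSUMPTION: scaled dot-product self-attention without learned weights.

    Each output token is a similarity-weighted mix of all input tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                    for c in range(d)])
    return out


def token_importance_gates(tokens):
    """ASSUMPTION: one gate in (0, 1) per token, from its mean activation."""
    return [1.0 / (1.0 + math.exp(-sum(t) / len(t))) for t in tokens]


def dual_attention_skip(tokens):
    """Hypothetical combination: input + similarity branch + gated branch.

    The residual sum keeps weakly activated tokens in the skip path
    instead of letting a single attention map suppress them.
    """
    sim = token_similarity_attention(tokens)
    gates = token_importance_gates(tokens)
    return [
        [x + s + g * x for x, s in zip(tok, sim_tok)]
        for tok, sim_tok, g in zip(tokens, sim, gates)
    ]


# Three toy tokens with two channels each; the last is a "less salient" token.
skip_tokens = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
refined = dual_attention_skip(skip_tokens)
```

In a real network the similarity and importance branches would carry learned projections and operate on tensors. The plain-list version only shows how the two attention signals could be applied to the same skip-connection tokens.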

Csswin-unet: a Swin-unet network for semantic segmentation of remote sensing images by aggregating contextual information and extracting spatial information

Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

P-Swin: Parallel Swin transformer multi-scale semantic segmentation network for land cover classification

A Semantic Segmentation Method for Remote Sensing Images Based on the Swin Transformer Fusion Gabor Filter

A dual attentional skip connection based Swin‐UNet for real‐time cloud segmentation

SSNet: A Novel Transformer and CNN Hybrid Network for Remote Sensing Semantic Segmentation

ER-Swin: Feature Enhancement and Refinement Network Based on Swin Transformer for Semantic Segmentation of Remote Sensing Images

Semantic Segmentation of Remote Sensing Images With Transformer-Based U-Net and Guided Focal-Axial Attention

Local-enhanced multi-scale aggregation swin transformer for semantic segmentation of high-resolution remote sensing images

CSC-Unet: A Novel Convolutional Sparse Coding Strategy Based Neural Network for Semantic Segmentation

Swin-CFNet: An Attempt at Fine-Grained Urban Green Space Classification Using Swin Transformer and Convolutional Neural Network

TMNet: A Two-Branch Multi-Scale Semantic Segmentation Network for Remote Sensing Images

SSMM-DS: A semantic segmentation model for mangroves based on Deeplabv3+ with swin transformer

Enhancing Efficient Global Understanding Network with CSWin Transformer for Urban Scene Images Segmentation

UNet-like network fused swin transformer and CNN for semantic image synthesis

DESENet: a bilateral network with detail-enhanced semantic encoder for real-time semantic segmentation

A Spectral–Spatial Context-Boosted Network for Semantic Segmentation of Remote Sensing Images

A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images

Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation

An Object-Aware Network Embedding Deep Superpixel for Semantic Segmentation of Remote Sensing Images

Class-Guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery