CSDFormer: A cloud and shadow detection method for landsat images based on transformer

Jiayi Li, Qunming Wang
DOI: https://doi.org/10.1016/j.jag.2024.103799
IF: 7.5
2024-04-03
International Journal of Applied Earth Observation and Geoinformation
Abstract: Cloud and shadow (CS) detection is a crucial prerequisite for the application of remote sensing images. Current deep learning-based detection algorithms mainly employ Convolutional Neural Networks (CNNs). However, the local receptive field in CNNs cannot effectively capture global contextual information, which hinders accurate characterization of the dependency between clouds and shadows. In vision Transformers, self-attention mechanisms can effectively capture the long-distance dependencies between different regions in an image. Inspired by this, this paper proposes a new CS detection algorithm based on a Transformer, called CSDFormer. Specifically, we exclusively employed a hierarchical Transformer structure in the encoder stage to extract features of CS. Each Transformer layer contains several multi-head self-attention mechanisms for calculating pixel-wise long-distance connectivity. The designed structure enables the Transformer to better extract global context information, which helps to strengthen the comprehension of the semantic relationships between clouds and shadows. Benefiting from the global feature extraction capability of the encoder stage, we employed several simple multilayer perceptron layers for multi-scale feature map fusion and pixel classification in the decoder stage. The proposed CSDFormer was validated using 898 Landsat 8 Biome images with 512 × 512 pixels, producing an overall accuracy of 95.28% and a mean intersection over union of 84.08%, outperforming three state-of-the-art CNN-based algorithms. CSDFormer is consistently more accurate in the detection of both clouds and shadows. Owing to the parallel computing capability of the self-attention mechanism, CSDFormer is computationally more efficient than the three CNN-based benchmark methods. Regarding the input spectral bands, the performance of CSDFormer can be further enhanced by including additional thermal infrared bands.
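The two metrics reported in the abstract, overall accuracy and mean intersection over union (mIoU), can be illustrated with a minimal sketch. The code below is not taken from the paper: the three-class labeling (0 = clear, 1 = cloud, 2 = shadow), the toy pixel lists, and the function names are all assumptions made for illustration, and the paper may define its classes or average IoU differently.

```python
# Minimal sketch of overall accuracy and mean IoU for pixel classification.
# Class ids (0 = clear, 1 = cloud, 2 = shadow) are a hypothetical labeling,
# not necessarily the scheme used by CSDFormer.

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are reference classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of all pixels whose predicted class matches the reference."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # reference c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, reference other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example: ten flattened pixels (made-up values, not data from the paper).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
print(f"overall accuracy = {oa:.4f}, mean IoU = {miou:.4f}")
```

Mean IoU penalizes both missed and falsely detected pixels of each class, so it is typically lower than overall accuracy when one class (such as clear sky) dominates the image, which is consistent with the gap between the two figures quoted in the abstract.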
remote sensing