C-LiSA: A Hierarchical Hybrid Transformer Model Using Orthogonal Cross Attention for Satellite Image Cloud Segmentation

Subhajit Paul, Ashutosh Gupta, S. Manthira Moorthi, Debajyoti Dhar
DOI: https://doi.org/10.1109/tgrs.2024.3394929
IF: 8.2
2024-05-22
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Clouds in optical satellite images are a major concern because their presence hinders accurate analysis and processing. Deriving accurate pixel-wise cloud masks is therefore a key task in optical remote sensing. Several traditional and deep-learning algorithms have emerged to address this problem. However, deriving accurate cloud masks from a variety of satellite images remains elusive because of confusing spectral signatures and changes in the properties of imaging sensors. In this article, we introduce C-LiSA (cloud segmentation via the Lipschitz stable attention network), a deep-learning model based on a hybrid transformer architecture for effective cloud mask generation. We propose two key attention mechanisms: dual orthogonal self-attention (DOSA) for handling confusing spectral signatures, and the hierarchical cross-channel attention (HC2A) model for effectively highlighting cloud-specific features during the cloud segmentation process. To validate the effectiveness of these mechanisms, we carry out theoretical and empirical Lipschitz stability analysis. We design the whole setup under an adversarial setting in the presence of Lovász-Softmax loss. We demonstrate both qualitative and quantitative outcomes for multiple satellite image datasets, including Landsat-8, Sentinel-2, and Cartosat-2S. Our comparative study shows that the proposed model outperforms other state-of-the-art methods while generalizing better across different datasets. We also present ablation studies to justify our choices of architectural elements and objective functions.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics