CRformer: Multi-modal data fusion to reconstruct cloud-free optical imagery

Yu Xia, Wei He, Qi Huang, Guoying Yin, Wenbin Liu, Hongyan Zhang
DOI: https://doi.org/10.1016/j.jag.2024.103793
IF: 7.5
2024-03-29
International Journal of Applied Earth Observation and Geoinformation
Abstract: Cloud contamination is a common problem in Earth observation that hinders various remote sensing applications. To address this problem, recent studies have employed deep neural networks and multi-modal data fusion to reconstruct cloud-free optical imagery. However, this task faces several challenges: (1) the scarcity of suitable multi-modal datasets; (2) the ineffective use of feature correlations; and (3) the limited applicability of existing models. To overcome these challenges, this study proposes a novel solution that fuses high-spatial-resolution SAR and low-spatial-resolution optical data to reconstruct high-quality cloud-free multi-spectral optical products. First, a curated benchmark dataset, named SMILE-CR, is created with a realistic cloud simulation strategy. SMILE-CR serves as a global, multi-modal cloud removal dataset for the Landsat-8 sensor, with Sentinel-1 and MODIS data as supplementary inputs. Second, a Transformer-based cloud removal network, abbreviated as CRformer, is developed with two novel modules: multi-head dense and sparse attention, and a multi-scale gated-dconv feed-forward network. CRformer achieves global attention while suppressing weak correlations, and it enhances multi-scale cloud features by filtering out invalid features. The performance of the proposed method is evaluated through extensive experiments. The results show that CRformer surpasses state-of-the-art cloud removal methods, with significant improvements in both quantitative metrics and visual quality. The fusion of MODIS and Sentinel-1 data is shown to be effective and necessary in reconstructing Landsat-8 observations. Moreover, the CRformer model can be readily applied to reconstruct time-series cloud-free Landsat-8 products in Wuhan city, which can improve the average accuracy of land cover mapping by over 3%.
remote sensing
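
The abstract describes attention that stays global while suppressing weak correlations. The sketch below illustrates one way such a dense-plus-sparse combination could look; it is not the authors' implementation. It assumes a softmax branch for the dense part, a squared-ReLU branch for the sparse part, and fixed mixing weights. The function name `dense_sparse_attention` and the parameters `w_dense` and `w_sparse` are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_sparse_attention(q, k, v, w_dense=0.5, w_sparse=0.5):
    """Single-head illustration; w_dense and w_sparse are assumed mixing weights."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)           # pairwise token similarities
    dense = softmax(scores)                  # every pair keeps a nonzero weight
    sparse = np.maximum(scores, 0.0) ** 2    # non-positive scores become exactly zero
    weights = w_dense * dense + w_sparse * sparse
    return weights @ v, dense, sparse

# Toy input: 4 tokens with 8 features each.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out, dense, sparse = dense_sparse_attention(q, k, v)
```

In this sketch the dense branch preserves global context, while the sparse branch contributes only where two tokens are positively correlated. A real model would presumably learn the mixing weights and use multiple heads.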