Developing a Multi-Scale Convolutional Neural Network for Spatiotemporal Fusion to Generate MODIS-like Data Using AVHRR and Landsat Images

Zhicheng Zhang, Zurui Ao, Wei Wu, Yidan Wang, Qinchuan Xin
DOI: https://doi.org/10.3390/rs16061086
IF: 5
2024-03-21
Remote Sensing
Abstract: Remote sensing data are becoming increasingly important for quantifying long-term changes in land surfaces. Optical sensors onboard satellite platforms face a tradeoff between temporal and spatial resolutions. Spatiotemporal fusion models can produce data with high spatial and temporal resolution, but existing models are not designed to produce moderate-spatial-resolution data like those of the Moderate-Resolution Imaging Spectroradiometer (MODIS), which offers moderate spatial detail and frequent temporal coverage. This limitation arises from the challenge of combining coarse- and fine-spatial-resolution data across their large spatial resolution gap. This study presents a novel model, named multi-scale convolutional neural network for spatiotemporal fusion (MSCSTF), to generate MODIS-like data by addressing the large spatial-scale gap in blending Advanced Very-High-Resolution Radiometer (AVHRR) and Landsat images. To mitigate the considerable biases of AVHRR and Landsat images relative to MODIS images, an image correction module is incorporated into the model using deep supervision. The results show that the modeled MODIS-like images are consistent with the observed ones in five tested areas, as evidenced by root mean square errors (RMSE) of 0.030, 0.022, 0.075, 0.036, and 0.045, respectively. The model makes reasonable predictions when reconstructing retrospective MODIS-like data, as evaluated against Landsat data. The proposed MSCSTF model outperforms six other comparative models in accuracy, with regional average RMSE values lower by 0.005, 0.007, 0.073, 0.062, 0.070, and 0.060, respectively, than those of the other models. The developed method does not rely on MODIS images as input, and it has the potential to reconstruct MODIS-like data prior to 2000 for retrospective studies and applications.
environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary
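The abstract reports accuracy as root mean square error (RMSE) between modeled and observed images. As a rough illustration of that metric only, the sketch below computes RMSE over two small grids in plain Python. The function name `rmse` and the 2x2 reflectance values are hypothetical and are not taken from the paper, which may additionally mask clouds or average per band.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equally sized 2-D grids."""
    if len(predicted) != len(observed):
        raise ValueError("grids must have the same number of rows")
    squared_error = 0.0
    count = 0
    for pred_row, obs_row in zip(predicted, observed):
        if len(pred_row) != len(obs_row):
            raise ValueError("rows must have the same number of columns")
        for p, o in zip(pred_row, obs_row):
            squared_error += (p - o) ** 2
            count += 1
    if count == 0:
        raise ValueError("grids must not be empty")
    return math.sqrt(squared_error / count)


# Hypothetical 2x2 reflectance patches (made-up values, not from the paper).
predicted = [[0.10, 0.20], [0.30, 0.40]]
observed = [[0.12, 0.18], [0.33, 0.40]]
score = rmse(predicted, observed)
print(f"RMSE = {score:.4f}")
```

Lower values mean the modeled image is closer to the reference, which is how the per-area figures in the abstract should be read.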
What problem does this paper attempt to address?
The paper addresses how to fuse remote sensing images from sensors with very different spatial resolutions, specifically Advanced Very-High-Resolution Radiometer (AVHRR) and Landsat images, to generate data similar to those of the Moderate-Resolution Imaging Spectroradiometer (MODIS). It targets three key challenges:

1. **Spatial scale differences**: The spatial resolution of AVHRR and Landsat images differs by a factor of approximately 192. Existing spatiotemporal fusion models struggle to handle such a gap directly, because they are usually designed to fuse datasets with smaller resolution differences, such as MODIS and Landsat images (about a 16-fold difference).
2. **Systematic biases**: Differences in sensor spectral response functions, viewing zenith angles, georegistration errors, and acquisition times produce systematic biases among AVHRR, Landsat, and MODIS images. These biases can introduce uncertainty and reduce the accuracy of the fusion results.
3. **Historical data reconstruction**: The paper also aims to use AVHRR and Landsat images to reconstruct MODIS-like data before 2000 to support retrospective studies and applications. Because MODIS data were not available before 2000, generating comparable high-quality data is valuable for scientific research.

To address these challenges, the paper proposes a multi-scale convolutional neural network for spatiotemporal fusion (MSCSTF), which is built from three modules:

- **Multi-scale feature extraction module**: Uses up-sampling and down-sampling to construct a non-linear spatial pyramid mapping that handles the large spatial scale difference between AVHRR and Landsat images.
- **Image correction module**: Corrects the images obtained by different sensors before spatiotemporal fusion to reduce systematic biases and improve the accuracy of the fusion results.
- **Spatiotemporal fusion module**: Uses deep learning to fuse the extracted multi-scale features and generate MODIS-like data.

Together, these modules aim to generate accurate, high-quality MODIS-like data, particularly for reconstructing the historical record before 2000.
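The scale arithmetic behind the multi-scale idea can be sketched in a few lines. Using only the ratios quoted above (about 192 for AVHRR to Landsat and about 16 for MODIS to Landsat), the MODIS-like target sits roughly 12 times finer than AVHRR and 16 times coarser than Landsat, so both inputs have to be resampled toward it. The code below is not the paper's network: the learned convolutional layers are replaced with fixed average pooling and nearest-neighbor repetition, and the helper names and the 4x4 toy grid are assumptions made for illustration.

```python
def downsample_mean(grid, factor):
    """Average-pool a 2-D grid by an integer factor (fine -> coarse)."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid size must be divisible by factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def upsample_nearest(grid, factor):
    """Repeat every pixel factor x factor times (coarse -> fine)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Approximate resolution ratios quoted in the summary above.
AVHRR_TO_LANDSAT = 192
MODIS_TO_LANDSAT = 16
# Derived here, not stated in the source: AVHRR is roughly 12x coarser than MODIS.
AVHRR_TO_MODIS = AVHRR_TO_LANDSAT // MODIS_TO_LANDSAT

# Toy 4x4 "fine" grid standing in for a Landsat patch (made-up values).
fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
coarse = downsample_mean(fine, 2)      # fine -> coarser pyramid level
restored = upsample_nearest(coarse, 2)  # coarse -> back to the fine grid size

print(AVHRR_TO_MODIS)  # 12
print(coarse)          # [[2.5, 4.5], [10.5, 12.5]]
```

In a learned model, each fixed resampling step would be replaced by trainable layers so that the mapping between pyramid levels can be non-linear, as the module description above indicates.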