Abstract: Remote sensing data are becoming increasingly important for quantifying long-term changes in land surfaces. Optical sensors onboard satellite platforms face a tradeoff between temporal and spatial resolution. Spatiotemporal fusion models can produce data with high spatial and temporal resolution, but existing models are not designed to produce moderate-spatial-resolution data such as those from the Moderate-Resolution Imaging Spectroradiometer (MODIS), which offers moderate spatial detail and frequent temporal coverage. This limitation arises from the difficulty of combining coarse- and fine-spatial-resolution data across a large spatial resolution gap. This study presents a novel model, named multi-scale convolutional neural network for spatiotemporal fusion (MSCSTF), to generate MODIS-like data by addressing the large spatial-scale gap in blending Advanced Very-High-Resolution Radiometer (AVHRR) and Landsat images. To mitigate the considerable biases of AVHRR and Landsat images relative to MODIS images, an image correction module is incorporated into the model using deep supervision. The results show that the modeled MODIS-like images are consistent with the observed ones in five tested areas, as evidenced by root mean square errors (RMSE) of 0.030, 0.022, 0.075, 0.036, and 0.045, respectively. The model also makes reasonable predictions when reconstructing retrospective MODIS-like data, as evaluated against Landsat data. The proposed MSCSTF model outperforms six comparative models in accuracy, with regional average RMSE values lower by 0.005, 0.007, 0.073, 0.062, 0.070, and 0.060, respectively, than those of the other models. The developed method does not rely on MODIS images as input, and it has the potential to reconstruct MODIS-like data prior to 2000 for retrospective studies and applications.
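
The accuracy figures in the abstract are root mean square errors between modeled and observed images. As a minimal sketch of that metric, assuming the pixel values of both images have been flattened into equal-length sequences, it could be computed as follows. The function name `rmse` and the sample reflectance values are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences of pixel values."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    if len(predicted) == 0:
        raise ValueError("inputs must not be empty")
    # Mean of the squared per-pixel differences, then the square root.
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical reflectance values, for illustration only (not from the paper).
predicted = [0.10, 0.20, 0.30, 0.40]
observed = [0.12, 0.18, 0.33, 0.40]
print(round(rmse(predicted, observed), 4))
```

A lower value means the modeled image is closer to the observed one, which is the sense in which the abstract compares MSCSTF with the other models.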

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to effectively fuse remote sensing images from different sensors, specifically Advanced Very-High-Resolution Radiometer (AVHRR) and Landsat images, to generate data similar to those of the Moderate-Resolution Imaging Spectroradiometer (MODIS). Specifically, the paper aims to address the following key challenges:

1. **Spatial scale differences**: There is a large spatial resolution gap between AVHRR and Landsat images, approximately 192-fold. This gap makes it difficult for existing spatiotemporal fusion models to handle the two data types directly, because those models are usually designed to fuse datasets with smaller spatial resolution differences, such as MODIS and Landsat images (about a 16-fold difference).
2. **Systematic biases**: Differences in sensor spectral response functions, viewing zenith angles, georegistration errors, and acquisition times produce systematic biases among AVHRR, Landsat, and MODIS images. These biases may introduce uncertainties and reduce the accuracy of the fusion results.
3. **Historical data reconstruction**: The paper also aims to use AVHRR and Landsat images to reconstruct MODIS-like data before 2000 to support retrospective studies and applications. Since MODIS data were not available before 2000, generating comparable high-quality data is of great value for scientific research.

To address these challenges, the paper proposes a new multi-scale convolutional neural network for spatiotemporal fusion (MSCSTF), which comprises the following components:

- **Multi-scale feature extraction module**: Up-sampling and down-sampling are used to construct a non-linear spatial pyramid mapping that handles the large spatial scale difference between AVHRR and Landsat images.
- **Image correction module**: Before spatiotemporal fusion, the images obtained by the different sensors are corrected to reduce systematic biases and improve the accuracy of the fusion results.
- **Spatiotemporal fusion module**: Deep-learning techniques are used to fuse the extracted multi-scale features and generate MODIS-like data.

Through these methods, the paper aims to provide an efficient and accurate way to generate high-quality MODIS-like data, especially for historical data reconstruction before 2000.
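
The scale arithmetic behind the up-sampling and down-sampling idea can be illustrated with a small sketch. With the ratios quoted above, a MODIS-like grid sits 16-fold above Landsat and, by division, about 12-fold below AVHRR, so fine data must be down-sampled and coarse data up-sampled to meet at the intermediate scale. The code below is not the MSCSTF network: it swaps the learned convolutional layers for fixed block averaging and nearest-neighbour repetition, and the function names, toy grids, and toy factor of 2 are all hypothetical.

```python
def downsample_mean(grid, factor):
    """Block-average a 2-D list of numbers by an integer factor."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid size must be a multiple of factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def upsample_nearest(grid, factor):
    """Repeat every cell factor x factor times (nearest-neighbour up-sampling)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Resolution ratios quoted in the summary above.
AVHRR_TO_LANDSAT = 192
MODIS_TO_LANDSAT = 16
AVHRR_TO_MODIS = AVHRR_TO_LANDSAT // MODIS_TO_LANDSAT  # remaining gap: 12

# Toy grids with a hypothetical factor of 2 so the result is easy to inspect.
fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 "fine" grid
coarse = [[7.5]]                                                 # 1x1 "coarse" grid

fine_mid = downsample_mean(fine, 2)       # fine data brought down to a 2x2 grid
coarse_mid = upsample_nearest(coarse, 2)  # coarse data brought up to a 2x2 grid
print(AVHRR_TO_MODIS, fine_mid, coarse_mid)
```

Once both inputs share the intermediate grid they can be compared or combined cell by cell. In the paper this role is played by learned multi-scale features rather than fixed resampling.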

Developing a Multi-Scale Convolutional Neural Network for Spatiotemporal Fusion to Generate MODIS-like Data Using AVHRR and Landsat Images

Deep Learning-Based Spatiotemporal Fusion Architecture of Landsat 8 and Sentinel-2 Data for 10 m Series Imagery

A Deep Learning-Based Spatio-Temporal NDVI Data Fusion Model

MSFusion: Multistage for Remote Sensing Image Spatiotemporal Fusion Based on Texture Transformer and Convolutional Neural Network

Enhanced Spatiotemporal Fusion via MODIS-Like Images

An Effective Multi-model Fusion Method for SAR and Optical Remote Sensing Images

A Robust Hybrid Deep Learning Model for Spatiotemporal Image Fusion

Fusion of optical and SAR images based on deep learning to reconstruct vegetation NDVI time series in cloud-prone regions

Fusing Landsat-7, Landsat-8 and Sentinel-2 Surface Reflectance to Generate Dense Time Series Images with 10m Spatial Resolution

A Crossmodal Multiscale Fusion Network for Semantic Segmentation of Remote Sensing Data

A Data Fusion Modeling Framework for Retrieval of Land Surface Temperature from Landsat-8 and MODIS Data

An Efficient Cross-Modality Self-Calibrated Network for Hyperspectral and Multispectral Image Fusion

Deep-Learning-Based Spatio-Temporal-Spectral Integrated Fusion of Heterogeneous Remote Sensing Images

A Robust Method for Generating High-Spatiotemporal-Resolution Surface Reflectance by Fusing MODIS and Landsat Data

A modified flexible spatiotemporal data fusion model

RES-STF: Spatio-temporal fusion of VIIRS and Landsat land surface temperature based on Restormer

MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification

Deep Learning-Based Spatiotemporal Fusion Approach for Producing High-Resolution NDVI Time-Series Datasets

A Pseudo-Siamese Deep Convolutional Neural Network for Spatiotemporal Satellite Image Fusion

Deep Learning-Based Spatiotemporal Data Fusion Using a Patch-to-Pixel Mapping Strategy and Model Comparisons