Multi-scale Restoration of Missing Data in Optical Time-series Images with Masked Spatial-Temporal Attention Network

Zaiyan Zhang, Jining Yan, Yuanqi Liang, Jiaxin Feng, Haixu He, Wei Han
2024-06-19
Abstract: Due to factors such as thick cloud cover and sensor limitations, remote sensing images often suffer from significant missing data, resulting in incomplete time-series information. Existing methods for imputing missing values in remote sensing images do not fully exploit spatio-temporal auxiliary information, leading to limited accuracy in restoration. Therefore, this paper proposes a novel deep learning-based approach called MS2TAN (Multi-scale Masked Spatial-Temporal Attention Network) for reconstructing time-series remote sensing images. Firstly, we introduce an efficient spatio-temporal feature extractor based on Masked Spatial-Temporal Attention (MSTA) to obtain high-quality representations of the spatio-temporal neighborhood features in the missing regions. Secondly, a Multi-scale Restoration Network consisting of MSTA-based Feature Extractors is employed to progressively refine the missing values by exploring spatio-temporal neighborhood features at different scales. Thirdly, we propose a "Pixel-Structure-Perception" Multi-Objective Joint Optimization method to enhance the visual effects of the reconstruction results from multiple perspectives and preserve more texture structures. Furthermore, the proposed method reconstructs missing values in all input temporal phases in parallel (i.e., Multi-In Multi-Out), achieving higher processing efficiency. Finally, experimental evaluations on two typical missing data restoration tasks across multiple research areas demonstrate that the proposed method outperforms state-of-the-art methods with an improvement of 0.40 dB/1.17 dB in mean peak signal-to-noise ratio (mPSNR) and 3.77/9.41 thousandths in mean structural similarity (mSSIM), while exhibiting stronger texture and structural consistency.
Computer Vision and Pattern Recognition, Image and Video Processing
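The abstract reports its gains in mPSNR, so a small sketch of that metric may help in reading the numbers. It assumes the standard definition PSNR = 10 log10(MAX^2 / MSE), pixel values scaled to [0, 1], and a plain average over the images in a series. The paper's exact evaluation protocol (for example, whether only the missing region is scored) is not stated here, and the function names `psnr` and `mean_psnr` are illustrative only.

```python
import math


def psnr(pred, target, data_range=1.0):
    """PSNR in dB between two equal-length flat pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def mean_psnr(pred_series, target_series, data_range=1.0):
    """Average PSNR over a series of images (assumed meaning of mPSNR)."""
    if len(pred_series) != len(target_series) or not pred_series:
        raise ValueError("series must be non-empty and the same length")
    scores = [psnr(p, t, data_range) for p, t in zip(pred_series, target_series)]
    return sum(scores) / len(scores)
```

For example, `psnr([0.5, 0.5], [0.6, 0.4])` is about 20 dB, because the mean squared error is 0.01. On this logarithmic scale, an improvement of 1.17 dB corresponds to roughly a 24% reduction in mean squared error.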
What problem does this paper attempt to address?
This paper addresses the following problem: because of factors such as thick cloud cover and sensor limitations, remote-sensing images often have large amounts of missing data, which leaves time-series information incomplete. Existing missing-value imputation methods for remote-sensing images do not fully use spatio-temporal auxiliary information, which limits restoration accuracy. Specifically, the paper targets three issues:

1. **Missing data**: Remote-sensing images contain large amounts of missing data, especially in high-resolution, long time-series settings, which seriously reduces the usability of the images and the accuracy of downstream analysis.
2. **Limitations of existing methods**: Traditional spatio-temporal restoration methods rely on linear models and struggle with complex scenes; the images they generate are often blurry and lack continuous textures.
3. **Efficient use of spatio-temporal information**: It remains unclear how to mine spatio-temporal information efficiently from high-resolution, long time-series remote-sensing data to improve restoration accuracy.

To address these issues, the authors propose MS2TAN (Multi-scale Masked Spatial-Temporal Attention Network), a deep-learning-based method for reconstructing missing data in time-series remote-sensing images. Its main innovations are:

- **Masked Spatial-Temporal Attention (MSTA)**: Missing-value masks and diagonal masks are introduced into the spatio-temporal attention mechanism, which strengthens its expressiveness, reduces spectral differences, and suppresses color-transition artifacts at the boundaries of missing regions.
- **Multi-scale restoration network**: Feature extractors operating at different scales progressively refine the reconstruction of the missing information, improving restoration accuracy from coarse to fine.
- **"Pixel-Structure-Perception" multi-objective joint optimization**: Pixel-level, structural, and perceptual losses are combined to optimize the reconstruction from multiple perspectives, improving the visual quality and retaining more texture and structural detail.

With these innovations, MS2TAN performs well in experiments on both simulated and real data across multiple research areas. Compared with existing methods, it improves mean peak signal-to-noise ratio (mPSNR) by 0.40 dB/1.17 dB and mean structural similarity (mSSIM) by 3.77/9.41 thousandths.
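The masking idea behind MSTA can be illustrated with a minimal sketch of scaled dot-product attention that applies two masks. It is an assumption-laden illustration, not the authors' implementation: the summary does not give the exact formulation, so the sketch assumes that the missing-value mask blocks unobserved tokens from serving as keys and that the diagonal mask stops a token from attending to itself, so each output must be assembled from its neighbors. The function name `masked_attention` and its parameters are hypothetical.

```python
import math


def masked_attention(queries, keys, values, missing, mask_diagonal=True):
    """Scaled dot-product attention over n tokens with two masks.

    missing[j] is True when token j is unobserved; such tokens are never
    used as keys (assumed role of the missing-value mask). When
    mask_diagonal is True, token i cannot attend to itself (assumed role
    of the diagonal mask).
    """
    n = len(queries)
    dim = len(queries[0])
    outputs, weights = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if missing[j] or (mask_diagonal and i == j):
                scores.append(-math.inf)  # blocked pair
            else:
                dot = sum(q * k for q, k in zip(queries[i], keys[j]))
                scores.append(dot / math.sqrt(dim))
        finite = [s for s in scores if s != -math.inf]
        if not finite:
            row = [0.0] * n  # nothing to attend to: output is all zeros
        else:
            peak = max(finite)  # subtract the max for numerical stability
            exps = [0.0 if s == -math.inf else math.exp(s - peak) for s in scores]
            total = sum(exps)
            row = [e / total for e in exps]
        weights.append(row)
        outputs.append(
            [sum(row[j] * values[j][d] for j in range(n)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Three time steps of a 2-D feature; the second step is missing (e.g. cloud).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = masked_attention(tokens, tokens, tokens, missing=[False, True, False])
```

In this toy call, the missing second step receives zero attention weight from every query, and its own output is built only from the two observed steps. A real model would use learned query, key, and value projections over spatial and temporal tokens rather than the raw features used here.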