Improving Satellite Imagery Masking using Multi-task and Transfer Learning

Rangel Daroya, Luisa Vieira Lucchese, Travis Simmons, Punwath Prum, Tamlin Pavelsky, John Gardner, Colin J. Gleason, Subhransu Maji

2024-12-12

Abstract: Many remote sensing applications employ masking of pixels in satellite imagery for subsequent measurements. For example, estimating water quality variables, such as Suspended Sediment Concentration (SSC), requires isolating pixels depicting water bodies unaffected by clouds, their shadows, terrain shadows, and snow and ice formation. A significant bottleneck is the reliance on a variety of data products (e.g., satellite imagery, elevation maps), and a lack of precision in individual steps affecting estimation accuracy. We propose to improve both the accuracy and computational efficiency of masking by developing a system that predicts all required masks from Harmonized Landsat and Sentinel (HLS) imagery. Our model employs multi-tasking to share computation and enable higher accuracy across tasks. We experiment with recent advances in deep network architectures and show that masking models can benefit from these, especially when combined with pre-training on large satellite imagery datasets. We present a collection of models offering different speed/accuracy trade-offs for masking. MobileNet variants are the fastest, and perform competitively with larger architectures. Transformer-based architectures are the slowest, but benefit the most from pre-training on large satellite imagery datasets. Our models provide a 9% F1 score improvement compared to previous work on water pixel identification. When integrated with an SSC estimation system, our models result in a 30x speedup while reducing estimation error by 2.64 mg/L, allowing for global-scale analysis. We also evaluate our model on a recently proposed cloud and cloud shadow estimation benchmark, where we outperform the current state-of-the-art model by at least 6% in F1 score.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper aims to improve the accuracy and computational efficiency of pixel masking in satellite imagery, particularly for estimating water quality variables such as suspended sediment concentration (SSC). Many remote sensing applications must first isolate water pixels unaffected by clouds, cloud shadows, terrain shadows, snow, and ice before taking measurements. Existing methods rely on multiple data products (such as satellite imagery and elevation maps), and imprecision in each individual step degrades the final estimate.

To address this, the authors use multi-task learning and transfer learning to build a system that predicts all required masks from Harmonized Landsat and Sentinel (HLS) imagery. The method aims to:

1. **Improve masking accuracy**: The multi-task model shares computation across tasks, which enables higher accuracy on each task.
2. **Improve computational efficiency**: Reduce the time and resources required for training and inference.
3. **Leverage recent deep learning architectures**: Experiments show that masking models benefit from recent advances in deep network architectures, especially when combined with pre-training on large satellite imagery datasets.

The authors report a 9% improvement in F1 score for water pixel identification compared to previous work. When integrated into an SSC estimation system, the models give a 30x speedup and reduce the estimation error by 2.64 mg/L, which makes global-scale analysis feasible. On a recently proposed cloud and cloud shadow estimation benchmark, they outperform the current state-of-the-art model by at least 6% in F1 score.
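To make the reported metric concrete, here is a minimal sketch of an F1 score between a predicted and a reference binary mask. It assumes F1 is computed pixel-wise over binary masks, which this summary does not state explicitly; the function name `mask_f1` and the toy arrays are illustrative, not taken from the paper.

```python
import numpy as np

def mask_f1(pred: np.ndarray, target: np.ndarray) -> float:
    """Pixel-wise F1 between two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()    # predicted 1, reference 1
    fp = np.logical_and(pred, ~target).sum()   # predicted 1, reference 0
    fn = np.logical_and(~pred, target).sum()   # predicted 0, reference 1
    denom = 2 * tp + fp + fn
    # Convention chosen here (an assumption): two empty masks match perfectly.
    return 1.0 if denom == 0 else float(2 * tp / denom)

# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(mask_f1(pred, target), 4))  # 2*2 / (2*2 + 1 + 1) = 0.6667
```

Because F1 ignores true negatives, it is less dominated by large background regions than plain pixel accuracy, which matters when the class of interest covers only a small part of the scene.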
### Formula summary

- **Binary cross-entropy loss function**:

\[ L_{\text{bce}}(\hat{y}_m^i, y_m^i) = -\frac{1}{WH}\sum_{j=1}^{W}\sum_{k=1}^{H}\left[y_m^i(j,k)\log\hat{y}_m^i(j,k) + (1 - y_m^i(j,k))\log(1 - \hat{y}_m^i(j,k))\right] \]

- **Formula for the multi-task model**:

\[ z_i = f_\theta(x_i) \]

\[ \hat{y}_m^i = g_m^{\phi_m}(z_i) \]

\[ L = \frac{1}{N}\sum_{i=1}^{N}\sum_{m} L_{\text{bce}}(\hat{y}_m^i, y_m^i) \]

These formulas describe how the model extracts features \(z_i\) from the input image \(x_i\) with a shared encoder \(f_\theta\), and how each task-specific "head" \(g_m^{\phi_m}\) turns those features into a predicted mask \(\hat{y}_m^i\) for task \(m\). Each prediction is compared against its ground-truth mask \(y_m^i\) over all \(W \times H\) pixels, and the parameters \(\theta\) and \(\phi_m\) are optimized to minimize the total loss \(L\) averaged over the \(N\) training images.
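The formulas above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the encoder and heads here are per-pixel linear maps standing in for the deep networks in the paper, and the task names, shapes, and function names are all hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bce(y_hat, y, eps=1e-7):
    """L_bce for one mask: mean over all W*H pixels."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))

def encoder(x, theta):
    """Toy stand-in for f_theta: (H, W, C) @ (C, D) -> (H, W, D), then ReLU."""
    return np.maximum(x @ theta, 0.0)

def head(z, phi_m):
    """Toy stand-in for g_m: (H, W, D) @ (D,) -> (H, W) probabilities."""
    return sigmoid(z @ phi_m)

def multitask_loss(xs, ys, theta, phis):
    """L = (1/N) * sum_i sum_m L_bce(y_hat_m^i, y_m^i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = encoder(x, theta)  # features computed once, shared by every head
        total += sum(bce(head(z, phis[m]), y[m]) for m in phis)
    return total / len(xs)

rng = np.random.default_rng(0)
tasks = ["water", "cloud", "cloud_shadow"]  # hypothetical task names
N, H, W, C, D = 4, 8, 8, 6, 16              # hypothetical sizes
xs = [rng.normal(size=(H, W, C)) for _ in range(N)]
ys = [{m: rng.integers(0, 2, size=(H, W)).astype(float) for m in tasks}
      for _ in range(N)]
theta = rng.normal(scale=0.1, size=(C, D))
phis = {m: rng.normal(scale=0.1, size=D) for m in tasks}

print(f"multi-task loss: {multitask_loss(xs, ys, theta, phis):.4f}")
```

The point of the structure is that `encoder` runs once per image regardless of how many masks are requested; adding a task only adds one lightweight head, which is where the shared-computation saving comes from.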
