Improving Satellite Imagery Masking using Multi-task and Transfer Learning

Rangel Daroya, Luisa Vieira Lucchese, Travis Simmons, Punwath Prum, Tamlin Pavelsky, John Gardner, Colin J. Gleason, Subhransu Maji
2024-12-12
Abstract: Many remote sensing applications employ masking of pixels in satellite imagery for subsequent measurements. For example, estimating water quality variables, such as Suspended Sediment Concentration (SSC), requires isolating pixels depicting water bodies unaffected by clouds, their shadows, terrain shadows, and snow and ice formation. A significant bottleneck is the reliance on a variety of data products (e.g., satellite imagery, elevation maps), and a lack of precision in individual steps affecting estimation accuracy. We propose to improve both the accuracy and computational efficiency of masking by developing a system that predicts all required masks from Harmonized Landsat and Sentinel (HLS) imagery. Our model employs multi-tasking to share computation and enable higher accuracy across tasks. We experiment with recent advances in deep network architectures and show that masking models can benefit from these, especially when combined with pre-training on large satellite imagery datasets. We present a collection of models offering different speed/accuracy trade-offs for masking. MobileNet variants are the fastest, and perform competitively with larger architectures. Transformer-based architectures are the slowest, but benefit the most from pre-training on large satellite imagery datasets. Our models provide a 9% F1 score improvement compared to previous work on water pixel identification. When integrated with an SSC estimation system, our models result in a 30x speedup while reducing estimation error by 2.64 mg/L, allowing for global-scale analysis. We also evaluate our model on a recently proposed cloud and cloud shadow estimation benchmark, where we outperform the current state-of-the-art model by at least 6% in F1 score.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to improve the accuracy and computational efficiency of pixel masking in satellite imagery, particularly for estimating water quality variables such as suspended sediment concentration (SSC). In many remote sensing applications, isolating water pixels that are unaffected by clouds, cloud shadows, terrain shadows, snow, and ice is a prerequisite for subsequent measurements. Existing methods rely on multiple data products (such as satellite imagery and elevation maps), and imprecision in the individual steps degrades the final estimate.

To address this, the authors combine multi-task learning and transfer learning in a system that predicts all required masks from Harmonized Landsat and Sentinel (HLS) imagery. The method aims to:

1. **Improve masking accuracy**: Tasks share computation through multi-tasking, which enables higher accuracy across tasks.
2. **Improve computational efficiency**: Reduce the time and resources required for training and inference.
3. **Exploit recent deep learning architectures**: Experiments show that masking models benefit from recent advances in deep network architectures, especially when combined with pre-training on large satellite imagery datasets.

The authors report a 9% F1 score improvement over previous work on water pixel identification. When integrated into an SSC estimation system, the models yield a 30x speedup and reduce the estimation error by 2.64 mg/L, which makes global-scale analysis feasible. On a recently proposed cloud and cloud shadow estimation benchmark, they outperform the current state-of-the-art model by at least 6% in F1 score.
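The F1 figures quoted above are most naturally read as a per-pixel comparison between a predicted binary mask and a reference mask. As a rough illustration only, the sketch below computes F1 for a single mask given as nested lists of 0/1 values. The function name `f1_score`, the mask layout, and the choice to return 0.0 when there are no positives at all are assumptions made for this example, not details taken from the paper's evaluation protocol.

```python
def f1_score(pred_mask, true_mask):
    """Pixel-wise F1 between two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; the paper's exact
    evaluation procedure may differ.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1          # predicted positive, truly positive
            elif p and not t:
                fp += 1          # predicted positive, truly negative
            elif t and not p:
                fn += 1          # missed positive
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); 0.0 here is an assumed convention
    # for the degenerate case with no positive pixels in either mask.
    return 2 * tp / denom if denom else 0.0


# Toy 2x2 example: one true positive, one false positive, one false negative.
predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [1, 0]]
print(f1_score(predicted, reference))
```

Because F1 ignores true negatives, it stays informative when the class of interest (for example water) covers only a small fraction of the image.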
### Formula summary

- **Binary cross-entropy loss function**, averaged over the \(W \times H\) pixels of a mask:

\[ L_{\text{bce}}(\hat{y}_m^i, y_m^i)=-\frac{1}{WH}\sum_{j = 1}^{W}\sum_{k = 1}^{H}\left[y_m^i(j,k)\log\hat{y}_m^i(j,k)+(1 - y_m^i(j,k))\log(1 - \hat{y}_m^i(j,k))\right] \]

- **Formula for the multi-task model**:

\[ z_i = f_\theta(x_i) \]

\[ \hat{y}_m^i = g_m^{\phi_m}(z_i) \]

\[ L=\frac{1}{N}\sum_{i = 1}^{N}\sum_{m}L_{\text{bce}}(\hat{y}_m^i, y_m^i) \]

These formulas describe how the model extracts shared features \(z_i\) from the input image \(x_i\) with the encoder \(f_\theta\), and then produces one predicted mask \(\hat{y}_m^i\) per task \(m\) through a task-specific "head" \(g_m^{\phi_m}\). Here \(y_m^i\) is the ground-truth mask for task \(m\) on image \(i\), and \(N\) is the number of images. Training minimizes the loss \(L\) with respect to the parameters \(\theta\) and \(\phi_m\).
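To make the formulas above concrete, here is a minimal pure-Python sketch of a shared encoder, per-task heads, and the summed binary cross-entropy loss. Everything in it is a toy stand-in chosen for illustration: the one-parameter `encoder`, the per-pixel logistic `head`, the task names `"water"` and `"cloud"`, and the clamping constant `eps` are assumptions, not the architectures or settings used in the paper.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def encoder(image, theta):
    """Toy f_theta: scales every pixel; stands in for a deep backbone."""
    return [[theta * px for px in row] for row in image]


def head(features, phi):
    """Toy g_m with parameters phi = (w, b): per-pixel logistic output."""
    w, b = phi
    return [[sigmoid(w * z + b) for z in row] for row in features]


def forward(image, theta, heads):
    """Compute shared features once, then one predicted mask per task."""
    z = encoder(image, theta)
    return {task: head(z, phi) for task, phi in heads.items()}


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all pixels of one mask."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += y * math.log(p) + (1 - y) * math.log(1 - p)
            count += 1
    return -total / count


def multitask_loss(predictions, targets):
    """Average over images of the per-task BCE losses summed over tasks."""
    n = len(predictions)
    return sum(
        bce_loss(pred[task], tgt[task])
        for pred, tgt in zip(predictions, targets)
        for task in tgt
    ) / n


# One 1x2 image, two hypothetical tasks sharing the same features.
heads = {"water": (1.0, 0.0), "cloud": (-1.0, 0.0)}
preds = [forward([[0.0, 2.0]], theta=1.0, heads=heads)]
targets = [{"water": [[0, 1]], "cloud": [[0, 0]]}]
print(multitask_loss(preds, targets))
```

The point of the structure is that `encoder` runs once per image regardless of how many masks are requested, which is where the shared computation in a multi-task model comes from; only the lightweight heads are task-specific.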