Uncertainty assessment of optically active and inactive water quality parameters predictions using satellite data, deep and ensemble learnings

Bahareh Raheli, Nasser Talabbeydokhti, Vahid Nourani
DOI: https://doi.org/10.1016/j.jhydrol.2024.132091
IF: 6.4
2024-10-04
Journal of Hydrology
Abstract: The inherent uncertainty in stochastic processes such as lake water quality makes point predictions of water quality parameters (WQPs) from data-driven techniques like Artificial Intelligence (AI) insufficient for effective decision-making and management. This study addresses this issue by assessing the uncertainty of AI-based WQP predictions through prediction intervals (PIs). The bootstrap method was combined with AI models to estimate PIs for both optically active (Electrical Conductivity, EC) and optically inactive (pH; Total Dissolved Solids, TDS; Sodium Adsorption Ratio, SAR; and Total Hardness, TH) parameters at selected monitoring stations in Lake Urmia, in northwestern Iran. Shallow AI models, including Support Vector Regression (SVR), Random Forest (RF), and Feedforward Neural Networks (FFNN), and deep learning (DL) methods, namely Long Short-Term Memory (LSTM) networks and Temporal Convolutional Networks (TCN), were employed. An innovative ensemble post-processing technique was applied specifically to the shallow AI models to enhance their performance. WQPs were treated as dependent variables, with Landsat remote sensing (RS) reflectance band data as the primary input and lake water level data as a secondary input to improve model accuracy and minimize errors. The results demonstrated the superiority of the ensemble technique over the individual shallow AI and DL models, and indicated that DL models are less suitable than the ensemble technique when observed data are limited. For the PIs, the bootstrap method gave acceptable results in terms of coverage probability and the coverage width criterion (CWC). The ensemble technique consistently outperformed the DL models, achieving on average 18.53% and 27.54% lower CWC than the LSTM and TCN models, respectively.
However, the estimated uncertainty, particularly for the DL models, was affected by the amount of data used. Therefore, TimeGAN (Time-series Generative Adversarial Network), a generative neural network model that incorporates autoencoder components, was employed to generate synthetic data, aiming to reduce input data uncertainty. The results indicated a significant reduction in average CWC values, by 30.99%, 26.36%, and 23.76% for LSTM, TCN, and the ensemble technique, respectively, with the larger reductions observed for the DL models.
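
The interval metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The linear least-squares model is a stand-in for the SVR, RF, FFNN, LSTM, and TCN models, and the residual-resampling bootstrap is one common variant. The CWC form used here, NMPIW * (1 + gamma * exp(-eta * (PICP - mu))), is a widely used definition assumed for illustration, as are the function names and the defaults `mu=0.95`, `eta=50`, and `n_boot=200`.

```python
import numpy as np

def bootstrap_intervals(X_train, y_train, X_test, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap prediction intervals for a linear stand-in model."""
    rng = np.random.default_rng(seed)
    n, m = len(y_train), len(X_test)
    A_train = np.column_stack([X_train, np.ones(n)])  # add intercept column
    A_test = np.column_stack([X_test, np.ones(m)])
    preds = np.empty((n_boot, m))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample training pairs with replacement
        coef, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        resid = y_train[idx] - A_train[idx] @ coef
        # Add a resampled residual so the spread reflects noise, not only model variance.
        preds[b] = A_test @ coef + rng.choice(resid, size=m)
    lower = np.percentile(preds, 100 * alpha / 2, axis=0)
    upper = np.percentile(preds, 100 * (1 - alpha / 2), axis=0)
    return lower, upper

def picp(y, lower, upper):
    """Prediction interval coverage probability: share of targets inside the PI."""
    return float(np.mean((y >= lower) & (y <= upper)))

def nmpiw(y, lower, upper):
    """Mean PI width normalized by the range of the observed targets."""
    return float(np.mean(upper - lower) / (np.max(y) - np.min(y)))

def cwc(y, lower, upper, mu=0.95, eta=50.0):
    """Coverage width criterion (assumed form): width, penalized if PICP < mu."""
    p = picp(y, lower, upper)
    gamma = 0.0 if p >= mu else 1.0
    return nmpiw(y, lower, upper) * (1.0 + gamma * np.exp(-eta * (p - mu)))
```

Under this definition a lower CWC is better: narrow intervals are rewarded only as long as coverage stays at or above the nominal level `mu`, which is why the abstract reports reductions in CWC as improvements.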
geosciences, multidisciplinary; water resources; engineering, civil