Comparison of Multiple Machine Learning Methods for Correcting Groundwater Levels Predicted by Physics-Based Models

Guanyin Shuai, Yan Zhou, Jingli Shao, Yali Cui, Qiulan Zhang, Chaowei Jin, Shuyuan Xu
DOI: https://doi.org/10.3390/su16020653
IF: 3.9
2024-01-12
Sustainability
Abstract: Accurate groundwater level (GWL) prediction is crucial in groundwater resource management. Currently, it relies mainly on physics-based models for prediction and quantitative analysis. However, physics-based models used for prediction often have errors in structure, parameters, and data, resulting in inaccurate GWL predictions. In this study, machine learning algorithms were used to correct the prediction errors of physics-based models. First, a MODFLOW groundwater flow model was created for the Hutuo River alluvial fan in the North China Plain. Then, with the observed GWLs from 10 monitoring wells located in the upper, middle, and lower parts of the alluvial fan serving as the reference, three algorithms, random forest (RF), extreme gradient boosting (XGBoost), and long short-term memory (LSTM), were compared for their ability to correct MODFLOW's predicted GWLs at these 10 wells under two sets of feature variables. The results show that the RF and XGBoost algorithms are not suitable for correcting predicted GWLs that exhibit continuous rising or falling trends, whereas the LSTM algorithm is able to correct them. During the prediction period, the LSTM2 model, which adds source–sink feature variables to MODFLOW's predicted GWLs, can improve the Pearson correlation coefficient (PR) for 80% of the wells, with a maximum increase of 1.26 and a minimum increase of 0.02, and can reduce the root mean square error (RMSE) for all of the wells, with a maximum decrease of 1.59 m and a minimum decrease of 0.17 m. It also outperforms the MODFLOW model in capturing both the long-term trends and the short-term seasonal fluctuations of GWLs. However, the correction effect of the LSTM1 model (which uses only MODFLOW's predicted GWLs as a feature variable) is inferior to that of the LSTM2 model, indicating that multiple feature variables are superior to a single feature variable. Temporally and spatially, the greater the prediction error of the MODFLOW model, the larger the correction magnitude of the LSTM2 model.
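The abstract distinguishes two feature sets: LSTM1 sees only MODFLOW's predicted GWL, while LSTM2 also sees source–sink variables. The sketch below shows one way such inputs could be arranged as fixed-length windows for a sequence model. It is an illustration under stated assumptions, not the authors' pipeline: the window length `lookback`, the use of the observed GWL at the window's last step as the training target, the hypothetical source–sink series `pumping` and `recharge`, and all numbers are invented for the example.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],  # features[t] = feature vector at time step t
    target: Sequence[float],              # assumed target: observed GWL at each step
    lookback: int,                        # hypothetical window length (time steps)
) -> Tuple[List[List[List[float]]], List[float]]:
    """Slice aligned series into (window, label) pairs for a sequence model.

    Each sample holds the `lookback` most recent feature vectors ending at
    step t, labelled with the target value at step t.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    X, y = [], []
    for t in range(lookback - 1, len(target)):
        X.append([list(f) for f in features[t - lookback + 1 : t + 1]])
        y.append(target[t])
    return X, y


# Hypothetical monthly series for one well (illustrative numbers, not from the paper).
modflow_gwl = [52.1, 51.8, 51.4, 51.0, 50.7, 50.2]  # MODFLOW-predicted GWL (m)
pumping = [3.2, 3.5, 4.1, 4.0, 3.6, 3.9]            # hypothetical sink term
recharge = [0.4, 0.2, 0.9, 1.5, 0.6, 0.3]           # hypothetical source term
observed = [51.6, 51.1, 50.5, 50.3, 49.8, 49.1]     # observed GWL (m)

# LSTM1-style input: MODFLOW-predicted GWL only (one feature per step).
feats_1 = [[g] for g in modflow_gwl]
# LSTM2-style input: predicted GWL plus source-sink terms (three features per step).
feats_2 = [[g, q, r] for g, q, r in zip(modflow_gwl, pumping, recharge)]

X1, y1 = make_windows(feats_1, observed, lookback=3)
X2, y2 = make_windows(feats_2, observed, lookback=3)
print(len(X1), len(X1[0]), len(X1[0][0]))  # → 4 3 1
print(len(X2), len(X2[0]), len(X2[0][0]))  # → 4 3 3
```

Both feature sets yield the same labels and window count; only the width of each time step differs. In a real setup the windows would typically be scaled and split chronologically (earlier periods for training, later ones for prediction) before being passed to an LSTM library, which the sketch deliberately leaves out.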
environmental sciences, environmental studies, green & sustainable science & technology
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses the limited accuracy of groundwater level (GWL) prediction. GWL prediction currently relies mainly on physics-based models, but these models carry errors in their structure, parameters, and input data, which lead to inaccurate predictions. The researchers therefore use machine learning algorithms to correct the prediction errors of a physics-based model. Specifically, they built a MODFLOW groundwater flow model for the Hutuo River alluvial fan in the North China Plain and used the observed GWLs from 10 monitoring wells in the upper, middle, and lower parts of the fan as the reference for evaluation. They then compared three algorithms, Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Long Short-Term Memory (LSTM) networks, on their ability to correct MODFLOW's predicted GWLs under two sets of feature variables.

The study found that RF and XGBoost are poorly suited to correcting predicted GWLs that show continuous rising or falling trends, whereas LSTM can correct such errors. During the prediction period, the LSTM2 model, which adds source–sink feature variables to MODFLOW's predicted GWLs, improved the Pearson correlation coefficient (PR) at 80% of the monitoring wells (increases from 0.02 to 1.26) and reduced the root mean square error (RMSE) at all wells (reductions from 0.17 m to 1.59 m). It also captured both long-term trends and short-term seasonal fluctuations better than MODFLOW alone. The LSTM1 model, which uses MODFLOW's predicted GWLs as its only feature variable, corrected less effectively than LSTM2, indicating that multiple feature variables are superior to a single one. Overall, LSTM2 gives the strongest correction of MODFLOW's prediction errors among the models compared.
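The per-well PR increases and RMSE decreases reported above can be computed as in the minimal sketch below, written in plain Python. The function names are our own and the series are toy values, not data from the study. The toy case also shows why a PR increase larger than 1 (such as the reported 1.26) is possible: PR ranges from -1 to 1, so a baseline prediction that is negatively correlated with the observations leaves room for an increase of up to 2.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PR) between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("PR is undefined for a constant series")
    return cov / (sx * sy)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error, in the units of the series (m for GWL)."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("need two non-empty equal-length series")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def correction_gain(
    obs: Sequence[float], baseline: Sequence[float], corrected: Sequence[float]
) -> Tuple[float, float]:
    """Return (PR increase, RMSE decrease) of the corrected series over the baseline."""
    d_pr = pearson_r(corrected, obs) - pearson_r(baseline, obs)
    d_rmse = rmse(baseline, obs) - rmse(corrected, obs)
    return d_pr, d_rmse


# Toy series for one well (m); not data from the study.
obs = [50.0, 49.0, 48.0, 47.0]        # observed GWL, falling
baseline = [47.0, 48.0, 49.0, 50.0]   # prediction with the wrong trend (PR = -1)
corrected = [50.5, 49.5, 48.5, 47.5]  # right trend, constant 0.5 m offset (PR = 1)

d_pr, d_rmse = correction_gain(obs, baseline, corrected)
print(round(d_pr, 2), round(d_rmse, 2))  # → 2.0 1.74
```

Note that PR ignores the constant 0.5 m offset in the corrected series while RMSE does not, which is why the two metrics are usually reported together.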