Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge

Timothy Dai, Kate Maher, Zach Perzan
2024-07-30
Abstract: Process-based hydrologic models are invaluable tools for understanding the terrestrial water cycle and addressing modern water resources problems. However, many hydrologic models are computationally expensive and, depending on the resolution and scale, simulations can take on the order of hours to days to complete. While techniques such as uncertainty quantification and optimization have become valuable tools for supporting management decisions, these analyses typically require hundreds of model simulations, which are too computationally expensive to perform with a process-based hydrologic model. To address this gap, we propose a hybrid modeling workflow in which a process-based model is used to generate an initial set of simulations and a machine learning (ML) surrogate model is then trained to perform the remaining simulations required for downstream analysis. As a case study, we apply this workflow to simulations of variably saturated groundwater flow at a prospective managed aquifer recharge (MAR) site. We compare the accuracy and computational efficiency of several ML architectures, including deep convolutional networks, recurrent neural networks, vision transformers, and networks with Fourier transforms. Our results demonstrate that ML surrogate models can achieve under 10% mean absolute percentage error and yield order-of-magnitude runtime savings over process-based models. We also offer practical recommendations for training hydrologic surrogate models, including implementing data normalization to improve accuracy, using a normalized loss function to improve training stability, and downsampling input features to decrease memory requirements.
Geophysics, Machine Learning
What problem does this paper attempt to address?
The paper addresses the computational cost of process-based hydrologic modeling, particularly in water resources management applications such as managed aquifer recharge (MAR). Process-based hydrologic models can accurately simulate the terrestrial water cycle, but they are computationally expensive at high resolution and large scale: a single simulation may take hours or even days. That cost becomes prohibitive for analyses such as uncertainty quantification, parameter estimation, and optimization, which typically require hundreds of simulations.

To solve this problem, the researchers propose a hybrid modeling workflow that combines the accuracy of process-based models with the computational efficiency of machine learning models. The workflow has three stages:

1. **Initial Stage**: Use a process-based hydrologic model (e.g., ParFlow-CLM) to generate an initial set of simulations.
2. **Training Stage**: Use this dataset to train a machine learning surrogate model.
3. **Application Stage**: Once training is complete, use the surrogate model to quickly generate the remaining simulations needed for downstream analysis, such as uncertainty quantification or resource optimization.

As a case study, the authors applied this hybrid workflow to simulate changes in saturated zone storage at a MAR site in Tulare County, California. By comparing several machine learning architectures (deep convolutional networks, recurrent neural networks, vision transformers, and networks with Fourier transforms), they showed that surrogate models can achieve under 10% mean absolute percentage error and order-of-magnitude runtime savings compared to process-based models.
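The three-stage workflow can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `expensive_model` is a cheap analytic stand-in for a process-based simulator (it is not ParFlow-CLM), and the surrogate is a quadratic least-squares fit rather than any of the neural architectures compared in the paper. The sketch shows only the shape of the workflow and how mean absolute percentage error (MAPE) might be computed on held-out runs.

```python
import numpy as np

def expensive_model(params):
    """Hypothetical stand-in for a process-based simulator (NOT ParFlow-CLM).

    Maps each row of two input parameters to a positive scalar response.
    """
    k, r = params[:, 0], params[:, 1]
    return 5.0 + 2.0 * k + 3.0 * np.sin(r) + 0.5 * k * r

def features(params):
    """Quadratic feature expansion used by the toy surrogate."""
    k, r = params[:, 0], params[:, 1]
    return np.column_stack([np.ones_like(k), k, r, k * r, k**2, r**2])

def fit_surrogate(params, outputs):
    """Least-squares fit; a simple stand-in for training an ML surrogate."""
    coeffs, *_ = np.linalg.lstsq(features(params), outputs, rcond=None)
    return coeffs

def run_surrogate(coeffs, params):
    """Evaluate the fitted surrogate on new parameter sets."""
    return features(params) @ coeffs

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes y_true != 0)."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

rng = np.random.default_rng(0)

# 1. Initial stage: run the expensive model on a small ensemble.
train_params = rng.uniform(0.0, 1.0, size=(50, 2))
train_outputs = expensive_model(train_params)

# 2. Training stage: fit the surrogate to the initial ensemble.
coeffs = fit_surrogate(train_params, train_outputs)

# 3. Application stage: the surrogate performs the remaining simulations.
new_params = rng.uniform(0.0, 1.0, size=(1000, 2))
surrogate_outputs = run_surrogate(coeffs, new_params)

# Accuracy check against a few held-out runs of the expensive model.
test_params = rng.uniform(0.0, 1.0, size=(20, 2))
test_error = mape(expensive_model(test_params), run_surrogate(coeffs, test_params))
print(f"Held-out MAPE: {test_error:.3f}%")
```

In a real application the held-out check matters more than in this toy case: the surrogate is only trustworthy within the range of inputs covered by the initial ensemble.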
The study also provides practical recommendations for training hydrologic surrogate models: normalizing the data to improve accuracy, using a normalized loss function to improve training stability, and downsampling high-dimensional input features to reduce memory requirements. These recommendations offer concrete guidance for practitioners who want to use machine learning to accelerate hydrologic simulations.
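The three training recommendations can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the summary does not specify which normalization, loss, or downsampling scheme the paper uses, so z-score scaling, a variance-normalized mean squared error, and block averaging are chosen here as common options. The function names and the synthetic `field` are hypothetical.

```python
import numpy as np

def zscore_normalize(x, eps=1e-8):
    """Scale an array to zero mean and unit variance (assumed scheme)."""
    mean, std = x.mean(), x.std()
    return (x - mean) / (std + eps), mean, std

def normalized_mse(y_true, y_pred, eps=1e-8):
    """MSE divided by the target variance.

    One possible normalized loss; predicting the target mean everywhere
    scores about 1, and a perfect prediction scores 0.
    """
    return np.mean((y_true - y_pred) ** 2) / (np.var(y_true) + eps)

def downsample_mean(field, factor):
    """Block-average a 2D field by an integer factor along each axis."""
    ny, nx = field.shape
    # Trim edges so both dimensions divide evenly by the factor.
    ny_trim, nx_trim = ny - ny % factor, nx - nx % factor
    trimmed = field[:ny_trim, :nx_trim]
    blocks = trimmed.reshape(ny_trim // factor, factor, nx_trim // factor, factor)
    return blocks.mean(axis=(1, 3))

rng = np.random.default_rng(0)

# Hypothetical 64 x 64 input feature with skewed, strictly positive values.
field = rng.lognormal(mean=0.0, sigma=1.0, size=(64, 64))

normed, field_mean, field_std = zscore_normalize(field)
coarse = downsample_mean(normed, factor=4)  # 64 x 64 -> 16 x 16

# Baseline: a constant prediction equal to the target mean.
baseline_loss = normalized_mse(field, np.full_like(field, field.mean()))
print(coarse.shape, baseline_loss)
```

Dividing the loss by the target variance makes its magnitude comparable across output variables with different units or ranges, which is one reason a normalized loss can make training better behaved. Block averaging by a factor of 4 in each direction cuts the number of input values by a factor of 16.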