A diversity centric strategy for the selection of spatio-temporal training data for LSTM-based streamflow forecasting

DOI: https://doi.org/10.5194/hess-2024-169
IF: 6.3
2024-06-14
Hydrology and Earth System Sciences
Abstract: Deep learning models are increasingly being applied to streamflow forecasting problems. Their success is attributed in part to the large and hydrologically diverse datasets on which they are trained. However, common data selection methods fail to explicitly account for the hydrological diversity contained within training data. In this research, clustering is used to characterise temporal and spatial diversity, in order to better understand the importance of hydrological diversity within regional training datasets. This study presents a novel, diversity-based resampling approach to creating hydrologically diverse datasets. First, the undersampling procedure is applied to temporal data, showing that the amount of temporal data needed to train models can be halved without any loss in performance. Next, it is applied to reduce the number of basins in the training dataset. While basins cannot be omitted from training without some loss in performance, we show that hydrologically dissimilar basins are highly beneficial to model performance. This is shown empirically for Canadian basins: models trained on sets of basins separated by thousands of kilometres outperform models trained on localised clusters. We strongly recommend an approach to training data selection that encourages a broad representation of diverse hydrological processes.
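The abstract does not spell out how the diversity-based undersampling works. The following is a minimal sketch of the general idea only, not the authors' procedure. It assumes k-means clustering (with k-means++ seeding) on hypothetical per-sample feature vectors, then a round-robin quota across clusters, so that samples from rare hydrological regimes are kept before common regimes are filled. The function names `kmeans`, `balanced_undersample` and `diversity_undersample`, and the synthetic 2-D features, are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns one cluster label per point."""
    rng = random.Random(seed)
    # k-means++ seeding: each new centroid is drawn with probability proportional
    # to its squared distance from the nearest centroid chosen so far.
    centroids = [list(points[rng.randrange(len(points))])]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            centroids.append(list(points[rng.randrange(len(points))]))
        else:
            centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to its members' mean (keep it if empty).
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels


def balanced_undersample(labels, n_keep, seed=0):
    """Pick n_keep indices round-robin across clusters, so small clusters are kept first."""
    rng = random.Random(seed)
    n_keep = min(n_keep, len(labels))
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    # Shuffle within each cluster so the choice inside a cluster is random.
    ordered = []
    for lab in sorted(groups):
        members = groups[lab][:]
        rng.shuffle(members)
        ordered.append(members)
    keep, depth = [], 0
    while len(keep) < n_keep:
        # One pass takes at most one sample from every cluster not yet exhausted.
        for members in ordered:
            if depth < len(members) and len(keep) < n_keep:
                keep.append(members[depth])
        depth += 1
    return sorted(keep)


def diversity_undersample(points, k, fraction, seed=0):
    """Keep about `fraction` of the points, spread as evenly as possible over k clusters."""
    labels = kmeans(points, k, seed=seed)
    return balanced_undersample(labels, round(fraction * len(points)), seed=seed)


# Hypothetical 2-D features per sample: one common regime (80) and two rare ones (15, 5).
rng = random.Random(1)
centres = [(0.0, 0.0)] * 80 + [(100.0, 100.0)] * 15 + [(200.0, 0.0)] * 5
points = [(cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)) for cx, cy in centres]
idx = diversity_undersample(points, k=3, fraction=0.3)
print(len(idx), all(i in idx for i in range(95, 100)))
```

Keeping 30% of these samples retains all five points from the rarest regime. A purely random 30% draw would be expected to keep only about 1.5 of them. The round-robin quota is one simple way to favour diversity. Other plausible choices include proportional-to-distance sampling or selecting whole clusters of basins. The sketch does not claim which of these, if any, matches the study's method.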
geosciences, multidisciplinary; water resources
What problem does this paper attempt to address?