Predicting flood stages in watersheds with different scales using hourly rainfall dataset: A high-volume rainfall features empowered machine learning approach

Lei Qiao, Daniel Livsey, Jarrett Wise, Kem Kadavy, Sherry Hunt, Kevin Wagner
DOI: https://doi.org/10.1016/j.scitotenv.2024.175231
2024-11-10
Abstract: Accurate prediction of instantaneous high lake water levels and flood flows (flood stages), from micro-catchments to large river basins, is critical for flood forecasting. Lake Carl Blackwell, a small-watershed reservoir in the south-central USA, served as the primary case study because of its rich historical dataset. Because both current and previous rainfall contribute water to a reservoir, a series of hourly rainfall features was created to maximize predictive power: total rainfall in the current hour and over the past 2 h, 3 h, …, 600 h, in addition to previous-day lake levels. Notably, each rainfall feature is the rainfall accumulated from the present hour back to the given previous hour, rather than the rainfall in any single specific hour. Random Forest Regression (RFR) was used to score feature importance and to predict flood stages, alongside Neural Network Multi-layer Perceptron Regression (NN-MLP), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost), ordinary multivariate linear regression (MLR), and the dimension-reduced linear models Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR). On the testing dataset, RFR predicted lake flood stages with an R² as high as 0.95, a mean absolute error (MAE) of 0.11 ft, and a root mean square error (RMSE) of 0.21 ft; NN-MLP performed equally well, and the other two non-linear algorithms, XGBoost and SVR, were only slightly less accurate. The linear regressions with dimension reduction had the lowest accuracy. Furthermore, the approach demonstrated high accuracy and broad applicability for surface runoff and streamflow prediction across three watersheds of different sizes in the region, from a micro-catchment to large river basins; earlier rainfall added more predictive power for larger watersheds and less for smaller ones.
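The accumulated rainfall features described in the abstract (total rainfall over the current hour, the past 2 h, 3 h, …, up to 600 h) can be sketched as trailing-window sums over an hourly series. This is a minimal illustration under stated assumptions, not the authors' code: the function name `cumulative_rainfall_features`, the `rain_{k}h` column naming, the prefix-sum approach, and returning `None` when fewer than k hours of history exist are all hypothetical choices. The previous-day lake level, which the abstract also uses as a predictor, would be appended as a separate column and is not shown. In pandas, `Series.rolling(k).sum()` computes the same trailing sums.

```python
from itertools import accumulate


def cumulative_rainfall_features(hourly_rain, windows):
    """Hypothetical helper: for each hour t, total rainfall over the trailing
    k hours (hours t-k+1 .. t inclusive) for every k in `windows`.

    Hours with fewer than k hours of history get None for that window.
    """
    # prefix[i] = rainfall summed over hours 0 .. i-1, so each window sum
    # is the difference of two prefix values (constant time per feature).
    prefix = [0.0] + list(accumulate(hourly_rain))
    rows = []
    for t in range(len(hourly_rain)):
        row = {}
        for k in windows:
            if t + 1 >= k:
                row[f"rain_{k}h"] = prefix[t + 1] - prefix[t + 1 - k]
            else:
                row[f"rain_{k}h"] = None  # not enough history yet
        rows.append(row)
    return rows


# Hypothetical hourly rainfall (inches); the abstract's setup would use
# windows=range(1, 601) over the full hourly record.
rain_in = [0.0, 0.25, 0.5, 0.0, 1.0]
features = cumulative_rainfall_features(rain_in, windows=[1, 2, 3])
print(features[4])  # {'rain_1h': 1.0, 'rain_2h': 1.0, 'rain_3h': 1.5}
```

Because every window ends at the current hour, the features are nested (each longer window contains all shorter ones), which matches the abstract's note that they are accumulated amounts rather than rainfall in a single specific hour.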
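The accuracy figures reported for the testing dataset (R², MAE, and RMSE, with the error metrics in feet) can be computed from paired observed and predicted stages as sketched below. This is an illustrative sketch, not the authors' evaluation code: the function name `regression_metrics` is hypothetical, and R² is assumed here to be the coefficient of determination, 1 - SS_res / SS_tot (the definition used by scikit-learn's `r2_score`); the abstract does not state which R² definition was used.

```python
import math


def regression_metrics(observed, predicted):
    """Hypothetical helper returning R2, MAE and RMSE.

    MAE and RMSE are in the same units as the inputs (e.g. feet of stage).
    R2 is the coefficient of determination, 1 - SS_res / SS_tot (assumed).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty sequences")
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when observations are constant")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Hypothetical observed vs. predicted stages (ft)
m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print({k: round(v, 3) for k, v in m.items()})  # {'R2': 0.8, 'MAE': 0.25, 'RMSE': 0.5}
```

RMSE squares each residual before averaging, so it is always at least as large as MAE and grows faster when a few large misses occur, which is why the abstract's RMSE (0.21 ft) exceeds its MAE (0.11 ft).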