High-performance prediction model combining minimum redundancy maximum relevance, circulant spectrum analysis, and machine learning methods for daily and peak streamflow

Levent Latifoğlu, Esra Kaya
DOI: https://doi.org/10.1007/s00704-023-04653-4
2023-09-20
Theoretical and Applied Climatology
Abstract: Streamflow predictions play a crucial role in the planning and management of water structures. However, accurately predicting streamflow data, which exhibits nonlinear and nonstationary characteristics, is a challenging problem. In this study, a novel approach was proposed for the prediction of both overall and peak streamflows, aiming to achieve high performance. The data used included precipitation and streamflow time series, as well as lagged data from the empirical mode decomposition (EMD), variational mode decomposition (VMD), and circulant spectrum analysis (ciSSA) subbands. The minimum redundancy maximum relevance (MRMR) method was employed for feature selection from these datasets. The selected features were used to develop daily streamflow prediction models using Gaussian process regression (GPR), ensemble (boosting and bagging), support vector regression (SVR), and artificial neural network (ANN) methods. The performance of the developed MRMR-, EMD-MRMR-, VMD-MRMR-, and ciSSA-MRMR-machine learning models was evaluated using mean squared error (MSE), mean absolute error (MAE), correlation coefficient (R), and determination coefficient (R²) metrics. Additionally, Bland–Altman plots and the Kruskal–Wallis test were used to determine the statistical significance of the results. According to the results, the ciSSA-MRMR-machine learning models achieved higher performance compared to the other models (an R² value of 0.956, an MSE of 0.0001, and an MAE of 0.0049 for overall streamflow prediction). For peak streamflow prediction, the ciSSA-MRMR-ANN model yielded an R² value of 0.956, an MSE of 0.0002, and an MAE of 0.0217. It was observed that the proposed method significantly improved the prediction of not only overall streamflow but also peak streamflow values.
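Illustrative sketch (not the authors' implementation): the abstract describes building lagged inputs, selecting features with MRMR, and scoring predictions with MSE, MAE, R, and R². The Python below shows one minimal way those three steps could look. Everything in it is an assumption made for illustration: the function names make_lagged, mrmr_select, and regression_metrics are hypothetical, absolute Pearson correlation stands in for the mutual-information terms that MRMR normally uses, and the decomposition step (EMD, VMD, ciSSA) and the regression models (GPR, ensemble, SVR, ANN) are left out entirely.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Build lagged inputs: row t of X holds the n_lags values preceding y[t]."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Column k-1 is the series shifted back by k steps (lag k).
    X = np.column_stack([series[n_lags - k:n - k] for k in range(1, n_lags + 1)])
    y = series[n_lags:]
    return X, y


def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return float(abs(np.corrcoef(a, b)[0, 1]))


def mrmr_select(X, y, k):
    """Greedy MRMR, difference form: relevance minus mean redundancy.

    ASSUMPTION: correlation is used here as a simple proxy for mutual information.
    """
    n_features = X.shape[1]
    relevance = np.array([abs_corr(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < min(k, n_features):
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([abs_corr(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


def regression_metrics(y_true, y_pred):
    """Return the four metrics named in the abstract: MSE, MAE, R, R2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "R": float(np.corrcoef(y_true, y_pred)[0, 1]),
        "R2": float(1.0 - ss_res / ss_tot),
    }
```

In a fuller pipeline, the columns of X would also include lagged precipitation and lagged subband series from the chosen decomposition, and the columns returned by mrmr_select would be passed to a regressor before calling regression_metrics on held-out data.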