Performance of ensemble‐learning models for predicting eutrophication in Zhuyi Bay, Three Gorges Reservoir

Mingming Hu, Yuchun Wang, Zhiyu Sun, Yuming Su, Shanze Li, Yufei Bao, Jie Wen
DOI: https://doi.org/10.1002/rra.3739
2020-10-12
River Research and Applications
Abstract: Eutrophication and sporadic algal blooms occurring in the tributary bays of the Three Gorges Reservoir in Hubei, China, have become major environmental issues following impoundment. However, predicting eutrophication with traditional methods based on monthly monitoring data remains challenging. To explore the potential of data‐driven models for eutrophication prediction and to establish a reliable data‐driven prediction model based on monthly monitoring data, this study used two ensemble‐learning models, random forests (RF) and gradient boosted decision trees (GBDT), to predict eutrophication in Zhuyi Bay. Three objectives were addressed. First, RF and GBDT regressions of chlorophyll‐a concentration showed good model fit on two monitoring data sets, with R² values of 0.809 and 0.822 for RF and 0.824 and 0.828 for GBDT. Second, the relative variable importance computed by the ensemble‐learning models was used to select monitoring parameters and to identify drivers of eutrophication; for improving model fit, monitoring key eutrophication parameters (such as water transparency) proved more important than increasing sample size. Third, K‐Means++ clustering was used to partition the eutrophication data into discrete levels, which RF and GBDT then classified. For three eutrophication levels, the classification accuracies of RF and GBDT were 0.8936 and 0.9064, respectively; with only two levels, the accuracy of both models increased to 0.9617. This study suggests that ensemble‐learning models, and GBDT in particular (applied here to eutrophication prediction for the first time), show excellent fitting ability for eutrophication data compared with other machine‐learning models and provide a reliable method for predicting eutrophication from monthly monitoring data.
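The abstract does not give the clustering details. As a hedged illustration only, the sketch below shows one way K‐Means++ clustering can split a one-dimensional series of chlorophyll‐a concentrations into ordered eutrophication levels. Assumptions: clustering on chlorophyll‐a alone (the study may cluster on several indicators), the best-of-`n_init` restart scheme, the function names `kmeans_pp_levels` and `nearest`, and the sample values are all hypothetical, not the authors' code or data.

```python
import random


def nearest(value, centroids):
    """Index of the centroid closest to value (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def _kmeans_pp_once(values, k, rng, n_iter):
    """One k-means run on 1-D data, seeded with k-means++ (hypothetical helper)."""
    # k-means++ seeding: first centroid chosen uniformly at random ...
    centroids = [rng.choice(values)]
    while len(centroids) < k:
        # ... each further centroid drawn with probability proportional to
        # its squared distance from the nearest centroid chosen so far.
        d2 = [min((v - c) ** 2 for c in centroids) for v in values]
        total = sum(d2)
        if total == 0:  # fewer distinct values than k
            break
        r = rng.uniform(0, total)
        acc = 0.0
        for v, w in zip(values, d2):
            acc += w
            if acc > r:  # only points with w > 0 can be picked
                centroids.append(v)
                break
    # Lloyd iterations: assign each value to its nearest centroid,
    # then move each centroid to the mean of its members.
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        new = [sum(m) / len(m) if m else c for m, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    # Within-cluster sum of squares, used to pick the best restart.
    inertia = sum(min((v - c) ** 2 for c in centroids) for v in values)
    return inertia, sorted(centroids)


def kmeans_pp_levels(values, k, n_init=10, n_iter=100, seed=0):
    """Best-of-n_init k-means++ clustering (hypothetical helper).

    Returns centroids sorted ascending, so level 0 is the
    lowest-concentration cluster and level k-1 the highest.
    """
    rng = random.Random(seed)
    runs = [_kmeans_pp_once(values, k, rng, n_iter) for _ in range(n_init)]
    return min(runs)[1]


if __name__ == "__main__":
    # Hypothetical chlorophyll-a values (ug/L), not data from the study.
    chl_a = [0.9, 1.0, 1.2, 10.0, 10.5, 11.0, 40.0, 41.0, 42.0]
    centroids = kmeans_pp_levels(chl_a, k=3)
    print(centroids, [nearest(v, centroids) for v in chl_a])
```

Sorting the centroids makes the cluster index usable directly as an ordinal eutrophication level, which could then serve as the class label for a classifier such as RF or GBDT; setting `k=2` gives a two-level variant. Restarting several times and keeping the lowest-inertia run guards against a poor random seeding.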
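The regression and classification scores quoted in the abstract are standard metrics. Assuming R² is the usual coefficient of determination, 1 - SS_res / SS_tot (the abstract does not state this, nor whether the scores were computed on held-out data), both metrics can be sketched in plain Python; the function names `r_squared` and `accuracy` are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Undefined (ZeroDivisionError) when all observed values are equal.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual SS
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total SS about the mean
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted level equals the true level."""
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)
```

Under this definition, R² = 1 means a perfect fit and R² = 0 means the model does no better than always predicting the mean chlorophyll‐a concentration.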
environmental sciences, water resources