Source Identification and Prediction of Nitrogen and Phosphorus Pollution of Lake Taihu by an Ensemble Machine Learning Technique

Yirong Hu, Wenjie Du, Cheng Yang, Yang Wang, Tianyin Huang, Xiaoyi Xu, Wenwei Li
DOI: https://doi.org/10.1007/s11783-023-1655-7
IF: 6.048
2023-01-01
Frontiers of Environmental Science & Engineering
Abstract: Effective control of lake eutrophication necessitates a full understanding of the complicated nitrogen and phosphorus pollution sources, for which mathematical modeling is commonly adopted. In contrast to the conventional knowledge-based models that usually perform poorly due to insufficient knowledge of pollutant geochemical cycling, we employed an ensemble machine learning (ML) model to identify the key nitrogen and phosphorus sources of lakes. Six ML models were developed based on 13 years of historical data of Lake Taihu’s water quality, environmental input, and meteorological conditions, among which the XGBoost model stood out as the best model for total nitrogen (TN) and total phosphorus (TP) prediction. The results suggest that the lake TN is mainly affected by the endogenous load and inflow river water quality, while the lake TP is predominantly from endogenous sources. The prediction of the lake TN and TP concentration changes in response to these key feature variations suggests that endogenous source control is a highly desirable option for lake eutrophication control. Finally, one-month-ahead prediction of lake TN and TP concentrations (R² of 0.85 and 0.95, respectively) was achieved based on this model with sliding time window lengths of 9 and 6 months, respectively. Our work demonstrates the great potential of using ensemble ML models for lake pollution source tracking and prediction, which may provide valuable references for early warning and rational control of lake eutrophication.
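The one-month-ahead prediction with a sliding time window mentioned in the abstract can be illustrated with a minimal sketch of how such training pairs might be built. The function name `make_sliding_windows`, the univariate input, and the synthetic series are assumptions made for illustration only; the abstract does not describe the paper's actual preprocessing, and the real inputs also include environmental and meteorological variables.

```python
import numpy as np


def make_sliding_windows(series, window, horizon=1):
    """Build (X, y) pairs for horizon-step-ahead prediction.

    Each row of X holds `window` consecutive monthly values; the matching
    entry of y is the value `horizon` months after the last month in that row.
    (Hypothetical helper, not taken from the paper.)
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    start = window + horizon - 1
    y = series[start:start + n_samples]
    return X, y


# Synthetic stand-in for two years of monthly values (NOT data from the paper).
monthly_values = np.arange(24, dtype=float)

# Window lengths follow the abstract: 9 months for TN, 6 months for TP.
X_tn, y_tn = make_sliding_windows(monthly_values, window=9)
X_tp, y_tp = make_sliding_windows(monthly_values, window=6)
```

Arrays shaped like `X_tn` and `y_tn` could then be passed to any regressor, for example a gradient-boosted tree model such as XGBoost, which the abstract names as the best performer. A longer window gives the model more history per sample but leaves fewer training samples from the same record.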