Virtual sample generation empowers machine learning-based effluent prediction in constructed wetlands

Qiyu Dong, Shunwen Bai, Zhen Wang, Xinyue Zhao, Shanshan Yang, Nanqi Ren
DOI: https://doi.org/10.1016/j.jenvman.2023.118961
IF: 8.7
2023-11-01
Journal of Environmental Management
Abstract: The design of constructed wetlands (CWs) is critical to ensuring effective wastewater treatment. However, the limited availability of reliable data can hamper the accuracy of CW effluent predictions, increasing design costs and time. In this study, a novel effluent prediction framework for CWs is proposed that combines data dimensionality reduction with virtual sample generation. Using four machine learning algorithms (Cubist, random forest, support vector regression, and extreme learning machine), important features of CW design are identified and used to build prediction models. The extreme learning machine algorithm achieved the highest coefficient of determination and the lowest error, making it the most suitable algorithm for effluent prediction. A multi-distribution mega-trend-diffusion algorithm with particle swarm optimization was employed to generate virtual samples. These virtual samples were then combined with real samples to retrain the prediction model and verify the optimization effect. Comparative analysis demonstrated that integrating virtual samples significantly improved the prediction accuracy for ammonium and chemical oxygen demand: the root mean square error decreased by an average of 60.5% and 42.1%, respectively, and the mean absolute percentage error by an average of 21.5% and 23.8%, respectively. Finally, a CW design process is proposed based on the prediction models and virtual samples. This integrated forward-prediction and reverse-design tool can efficiently support CW design when sample sizes are limited, ultimately leading to more accurate and cost-effective design solutions.
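To make the abstract's terms concrete, the following is a minimal sketch of an extreme learning machine regressor together with the two error metrics the abstract reports (root mean square error and mean absolute percentage error). It is not the authors' implementation: the class name `ELMRegressor`, the tanh activation, the hidden-layer size, and the synthetic data are all assumptions made for illustration, and the virtual sample generation step (multi-distribution mega-trend-diffusion with particle swarm optimization) is not reproduced here.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


class ELMRegressor:
    """Hypothetical minimal extreme learning machine (illustrative only).

    The hidden-layer weights are drawn at random and never trained; only the
    output weights are fitted, by least squares via the pseudo-inverse.
    """

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random nonlinear feature map (tanh is an assumed activation).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, float)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Least-squares output weights.
        self.beta = np.linalg.pinv(H) @ np.asarray(y, float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, float)) @ self.beta


# Synthetic stand-in data (NOT from the paper): 40 samples, 3 design features.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0.0, 1.0, size=(40, 3))
y_demo = 5.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.1 * rng.normal(size=40)

model = ELMRegressor(n_hidden=10).fit(X_demo, y_demo)
pred_demo = model.predict(X_demo)
print("train RMSE:", rmse(y_demo, pred_demo))
print("train MAPE (%):", mape(y_demo, pred_demo))
```

In the workflow the abstract describes, such a model would be retrained on the real samples augmented with virtual samples, and the two metrics compared before and after augmentation.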
environmental sciences