Machine learning for modeling N₂O emissions from wastewater treatment plants: Aligning model performance, complexity, and interpretability

Mostafa Khalil, Ahmed AlSayed, Yang Liu, Peter A. Vanrolleghem
DOI: https://doi.org/10.1016/j.watres.2023.120667
IF: 12.8
2023-09-30
Water Research
Abstract: Nitrous oxide (N₂O) emissions may account for up to 80 % of a wastewater treatment plant's (WWTP) total carbon footprint. Given the complexity of the pathways involved, estimating N₂O emissions through mechanistic models still often fails to precisely depict process dynamics. Alternatively, data-driven methods for predicting N₂O emissions hold substantial potential. However, so far, a comprehensive approach is still overlooked, impeding the advancement of full-scale application. Therefore, this study develops a comprehensive approach for using machine learning to perform online process modeling of N₂O emissions. The approach is tested on a long-term N₂O emission dataset from a full-scale WWTP. Uniquely, the proposed approach emphasizes not just model accuracy, but it also considers model complexity, computational speed, and interpretability, equipping operators with the insights needed for informed corrective actions. Algorithms with varying levels of complexity and interpretability including k-Nearest Neighbors (kNN), decision trees, ensemble learning models, and deep neural networks (DNN) were considered. Furthermore, a parametric multivariate outlier removal method was adjusted to account for data statistical distributions, significantly reducing data loss. By employing an effective feature selection methodology, a trade-off between data acquisition, model performance, and complexity was found, reducing the number of features by 40 % and decreasing data collection cost, model complexity and computational burden without significant effect on modeling accuracy. The best performing models are kNN (R² = 0.88), AdaBoost (R² = 0.94), and DNN (R² = 0.90). Feature importance of models was analyzed and compared with process knowledge to test interpretability, guiding N₂O mitigation decisions.
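
The abstract reports model quality as R² for regressors such as kNN. As a rough illustration of what that evaluation involves, the following is a minimal sketch of a k-nearest-neighbors regressor scored with R² on held-out data. It is not the authors' pipeline: the two input features (dissolved oxygen and ammonium), the linear synthetic target, the noise level, and the function names are all assumptions invented for this example, and no plant data is used.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    dists = sorted(
        (math.dist(row, x), target) for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_synthetic(n, seed=0):
    """Generate toy data. The relation below is arbitrary, not a process model."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        do = rng.uniform(0.5, 3.0)    # hypothetical dissolved oxygen feature
        nh4 = rng.uniform(0.0, 10.0)  # hypothetical ammonium feature
        X.append((do, nh4))
        y.append(0.5 * nh4 - 0.8 * do + rng.gauss(0.0, 0.1))
    return X, y


X, y = make_synthetic(300)
train_X, train_y = X[:240], y[:240]  # first 80 % for training
test_X, test_y = X[240:], y[240:]    # last 20 % held out

preds = [knn_predict(train_X, train_y, x, k=5) for x in test_X]
score = r2_score(test_y, preds)
print(f"kNN R^2 on held-out synthetic data: {score:.2f}")
```

The features are left unscaled here for brevity; with real sensor data on very different scales, distance-based models such as kNN would normally need standardized inputs.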
environmental sciences; engineering, environmental; water resources