An integrated feature selection and hyperparameter optimization algorithm for balanced machine learning models predicting N₂O emissions from wastewater treatment plants

Mostafa Khalil, Ahmed AlSayed, Yang Liu, Peter A. Vanrolleghem
DOI: https://doi.org/10.1016/j.jwpe.2024.105512
IF: 7
2024-05-26
Journal of Water Process Engineering
Abstract: Nitrous oxide (N₂O) is a significant contributor to global greenhouse gas emissions, and its mitigation is attracting increasing global attention. Machine learning (ML) models hold significant promise as an alternative to mechanistic models for predicting N₂O emissions from wastewater treatment plants (WWTPs). Although more complex ML models can sometimes enhance performance, they may also yield little to no improvement. Balancing model complexity with performance is therefore essential for effective N₂O prediction models, yet this aspect is often overlooked; striking that balance optimizes model efficacy without unnecessarily increasing complexity. Hence, this study exhaustively investigates grid search, the most broadly adopted hyperparameter optimization (HPO) method, and shows that it has limited ability to account for model complexity because it focuses only on model performance, which can lead to overfitting. To address this, the study presents a new algorithm that combines input feature selection with HPO to enhance model efficiency and accuracy. With this algorithm, the AdaBoost model achieved the same accuracy as a model built through separate feature selection and HPO (R² = 0.94), with only a marginal increase in RMSE, from 26.27 to 27.25. It also simplified the model by using fewer estimators and shallower trees, lowering the risk of overfitting and suggesting better generalizability. The algorithm employs multi-objective optimization with the NSGA-II genetic algorithm (GA), which outperformed the Nelder-Mead algorithm. This approach effectively balances model complexity and accuracy, enabling the development of computationally efficient online tools for predicting N₂O emissions and potentially for other wastewater treatment applications.
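The core idea in the abstract is to treat feature selection and hyperparameters as one search space and to rank candidates on two competing objectives, error and complexity. This can be illustrated with the non-dominated sorting step that NSGA-II uses to rank a population. The Python sketch below is illustrative only. The `Candidate` fields, the complexity proxy, the toy RMSE values, and the `pick_simplest_within` tolerance rule are hypothetical assumptions, not the authors' encoding or selection criterion. It also omits the rest of NSGA-II (crowding distance, selection, crossover, mutation) and does not train any AdaBoost model.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float]  # (validation RMSE, complexity), both minimized


@dataclass(frozen=True)
class Candidate:
    """One point of a joint search space: a feature subset plus AdaBoost-style hyperparameters."""
    n_features: int    # size of the selected input-feature subset
    n_estimators: int  # number of boosted trees
    max_depth: int     # depth of each tree
    rmse: float        # validation error (hypothetical toy value, not from the paper)

    def complexity(self) -> int:
        # Hypothetical complexity proxy: selected features plus total tree depth.
        return self.n_features + self.n_estimators * self.max_depth

    def objectives(self) -> Objectives:
        return (self.rmse, float(self.complexity()))


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(scores: Sequence[Objectives]) -> List[List[int]]:
    """Split candidate indices into Pareto fronts; front 0 holds the non-dominated ones."""
    n = len(scores)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # candidates that p dominates
    domination_count = [0] * n                               # how many candidates dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(scores[p], scores[q]):
                dominated_by[p].append(q)
            elif dominates(scores[q], scores[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # the last front appended is always empty


def pick_simplest_within(front: Sequence[int], candidates: Sequence[Candidate],
                         tolerance: float = 0.05) -> int:
    """Hypothetical rule: simplest Pareto candidate whose RMSE is within `tolerance` of the best."""
    best = min(candidates[i].rmse for i in front)
    acceptable = [i for i in front if candidates[i].rmse <= best * (1 + tolerance)]
    return min(acceptable, key=lambda i: candidates[i].complexity())


# Hypothetical toy candidates (not results from the study).
CANDIDATES = [
    Candidate(n_features=10, n_estimators=200, max_depth=6, rmse=26.0),
    Candidate(n_features=6, n_estimators=50, max_depth=3, rmse=27.0),
    Candidate(n_features=8, n_estimators=100, max_depth=4, rmse=27.5),
    Candidate(n_features=4, n_estimators=20, max_depth=2, rmse=31.0),
    Candidate(n_features=10, n_estimators=200, max_depth=8, rmse=26.5),
]

if __name__ == "__main__":
    fronts = fast_non_dominated_sort([c.objectives() for c in CANDIDATES])
    print("Pareto fronts:", fronts)
    print("Chosen candidate:", CANDIDATES[pick_simplest_within(fronts[0], CANDIDATES)])
```

In this toy run, the 200-estimator, depth-6 candidate has the lowest RMSE. The rule nevertheless returns the 50-estimator, depth-3 candidate, because its error stays within 5% of the best while its complexity proxy is far smaller. This is the kind of accuracy-versus-complexity trade-off the abstract describes.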
engineering, chemical, environmental, water resources