A machine learning framework to improve effluent quality control in wastewater treatment plants

Dong Wang, Sven Thunéll, Ulrika Lindberg, Lili Jiang, Johan Trygg, Mats Tysklind, Nabil Souihi
DOI: https://doi.org/10.1016/j.scitotenv.2021.147138
2021-08-01
Abstract:<p>Due to the intrinsic complexity of wastewater treatment plant (WWTP) processes, it is always challenging to respond promptly and appropriately to the dynamic process conditions in order to ensure the quality of the effluent, especially when operational cost is a major concern. Machine Learning (ML) methods have therefore been used to model WWTP processes in order to avoid various shortcomings of conventional mechanistic models. However, to the best of the authors' knowledge, no ML applications have focused on investigating how operational factors can affect effluent quality. Additionally, the time lags between process steps have always been neglected, making it difficult to explain the relationships between operational factors and effluent quality. Therefore, this paper presents a novel ML-based framework designed to improve effluent quality control in WWTPs by clarifying the relationships between operational variables and effluent parameters. The framework consists of Random Forest (RF) models, Deep Neural Network (DNN) models, Variable Importance Measure (VIM) analyses, and Partial Dependence Plot (PDP) analyses, and uses a novel approach to account for the impact of time lags between processes. Details of the framework are provided along with a demonstration of its practical applicability based on a case study of the Umeå WWTP in Sweden involving a large number of samples (105,763) representing the full scale of the plant's operations. Two effluent parameters, Total Suspended Solids in effluent (TSS<sub>e</sub>) and Phosphate in effluent (PO4<sub>e</sub>), and thirty-two operational variables are studied. RF models are developed, validated using DNN models as references, and shown to be suitable for VIM and PDP analyses. VIM identifies the variables that most strongly influence TSS<sub>e</sub> and PO4<sub>e</sub>, while PDP elucidates their specific effects on TSS<sub>e</sub> and PO4<sub>e</sub>.
The major findings are: (1) Influent temperature is the most influential variable for both TSS<sub>e</sub> and PO4<sub>e</sub>, but it affects them in different ways; (2) PO4<sub>e</sub> depends strongly on the TSS in aeration basins – higher TSS concentrations in aeration basins generally promote PO<sub>4</sub> removal, but excess TSS can have negative effects; (3) In general, the impact of TSS in aeration basins on TSS<sub>e</sub> and PO4<sub>e</sub> increases with the distance of the basin from the merging outlet, so more attention should be paid to the TSS concentration in the third and fourth aeration basins than in the first and second ones; (4) Returning excessive amounts of sludge through the second return sludge pipe should be avoided because of its adverse impact on TSS<sub>e</sub> removal. These results could support the development of more advanced control strategies to increase control precision and reduce running costs in the Umeå WWTP and other similarly configured WWTPs. The framework could also be applied to other parameters in WWTPs and industrial processes in general if sufficient high-resolution data are available.</p>
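As a rough illustration of the two interpretation tools named in the abstract, the sketch below computes a partial dependence curve and a permutation-based variable importance for an arbitrary prediction function. It is not the authors' implementation: the abstract does not say which VIM variant was used, so permutation importance is an assumption here, and `toy_model`, the two-column data, and the grid values are hypothetical stand-ins for a fitted RF model and plant measurements.

```python
import random
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


def permutation_importance(predict, rows, targets, feature, seed=0):
    """Increase in mean squared error after shuffling one feature column."""
    def mse(data):
        return mean((predict(r) - t) ** 2 for r, t in zip(data, targets))

    rng = random.Random(seed)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(shuffled) - mse(rows)


# Hypothetical stand-in for a fitted model: it depends on column 0 only.
def toy_model(row):
    return 2.0 * row[0] + 0.0 * row[1]


rows = [[float(i), float(i % 3)] for i in range(12)]
targets = [toy_model(r) for r in rows]

pdp_curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
vim_used = permutation_importance(toy_model, rows, targets, feature=0)
vim_unused = permutation_importance(toy_model, rows, targets, feature=1)
```

In this toy setup the curve for column 0 rises linearly with the grid value, while shuffling column 1 leaves the error unchanged because the stand-in model ignores it. With a real RF model, `predict` would wrap the fitted model's prediction call.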
environmental sciences
### What problem does this paper attempt to solve?

This paper primarily addresses the following three key issues:

1. **Optimizing Wastewater Treatment Plant (WWTP) Control Strategies**: In a wastewater treatment plant in Umeå, Sweden, existing methods are inefficient in optimizing the use of energy and chemicals, mainly relying on the "trial and error" approach. The study aims to develop an advanced control strategy to optimize the utilization of these resources while ensuring effluent quality.
2. **Using Machine Learning to Understand the Impact of Process Variables on Effluent Quality**: Most existing machine learning applications focus only on prediction or building soft sensors and do not delve into how adjusting operational variables can improve effluent quality. This paper proposes a framework based on the Random Forest (RF) model to explain how different process factors affect the final effluent quality, particularly the concentrations of Total Suspended Solids (TSS) and Phosphate (PO4).
3. **Addressing Time Lag Issues in Process Operations**: In the wastewater treatment process, there are time lags between various steps, but previous studies often overlook this. This paper proposes a new method to consider these time lags, thereby more accurately explaining the model output results and avoiding erroneous analysis due to the neglect of time lags.

In summary, this research aims to develop a novel machine learning framework that not only improves the control accuracy of effluent quality in wastewater treatment plants but also provides a theoretical basis for formulating more advanced control strategies.
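To make the time-lag issue concrete, the sketch below pairs each upstream reading with the effluent reading taken a fixed number of samples later, so that a model sees cause and effect on the same row. The text does not describe how the paper determines its lags, so the correlation-based search in `best_lag` is purely an assumption for illustration, and the function names and the toy series are hypothetical.

```python
from math import sqrt


def align_with_lag(upstream, effluent, lag):
    """Pair upstream[t] with effluent[t + lag]; `lag` is measured in samples."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = max(len(effluent) - lag, 0)
    return [(upstream[t], effluent[t + lag]) for t in range(n)]


def pearson(pairs):
    """Pearson correlation of (x, y) pairs; returns 0.0 if either side is constant."""
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lag(upstream, effluent, max_lag):
    """Return (lag, correlation) with the largest absolute correlation (assumed criterion)."""
    scored = [(lag, pearson(align_with_lag(upstream, effluent, lag)))
              for lag in range(max_lag + 1)]
    return max(scored, key=lambda item: abs(item[1]))


# Toy series: the effluent repeats the upstream signal two samples later.
upstream = [0.0, 1.0, 0.0, 3.0, 0.0, 2.0, 0.0, 5.0, 1.0, 0.0]
effluent = [0.0, 0.0] + upstream[:-2]

lag, corr = best_lag(upstream, effluent, max_lag=4)
aligned = align_with_lag(upstream, effluent, lag)
```

Each tuple in `aligned` could then serve as one training row, with the upstream value as the input and the delayed effluent value as the target. In a real plant the lag would more plausibly come from flow rates and basin volumes than from a correlation search.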