Early Abnormal Detection of Sewage Pipe Network: Bagging of Various Abnormal Detection Algorithms

Zhen-Yu Zhang,Guo-Xiang Shao,Chun-Ming Qiu,Yue-Jie Hou,En-Ming Zhao,Chi-Chun Zhou
DOI: https://doi.org/10.48550/arXiv.2206.03321
2022-06-06
Abstract: Abnormalities in the sewage pipe network affect the normal operation of the whole city, so it is important to detect them early. This paper proposes an early abnormal-detection method. The abnormalities are detected using conventional algorithms, such as the isolation forest algorithm, and two innovations are given: (1) The current and historical data measured by the sensors placed in the sewage pipe network (such as ultrasonic Doppler flowmeters) are taken as the overall dataset, and this overall dataset is then examined with a conventional anomaly detection method to diagnose anomalies in the data. An anomaly is a sample that differs from the other samples in the whole dataset. Because the anomaly is defined not by the algorithm but by the whole dataset, constructing the whole dataset is the key to the early abnormal-detection algorithm. (2) A bagging strategy over a variety of conventional anomaly detection algorithms is proposed to achieve early detection of anomalies with high precision and recall. The results show that this method can achieve early anomaly detection with a highest precision of 98.21%, a recall of 63.58%, and an F1-score of 0.774.
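The three reported metrics are linked by the standard F1 definition, the harmonic mean of precision and recall. The short sketch below recomputes F1 from the reported precision and recall; the function and variable names are illustrative and do not come from the paper. The result is about 0.772, close to the reported 0.774. The small gap may come from rounding or from the figures being taken at different settings, which the abstract does not specify.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, converted from percentages.
reported_precision = 0.9821
reported_recall = 0.6358

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.3f}")
```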
Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the early detection of anomalies in urban sewage pipe networks. Anomalies in the sewage pipe network affect the normal operation of the entire city and may lead to a series of problems, such as pipe explosions and environmental pollution. Timely detection of these anomalies is therefore crucial.

### Main contributions and innovations of the paper:

1. **Construction of the overall dataset**:
   - The authors propose taking the current and historical data measured by sensors in the sewage pipe network (such as ultrasonic Doppler flowmeters) as the overall dataset and applying traditional anomaly detection methods to it. Anomalies are defined by how a sample differs from the rest of the dataset, rather than by a specific algorithm. The construction of the overall dataset is therefore the key to achieving early anomaly detection.
2. **Integration of multiple anomaly detection algorithms (Bagging strategy)**:
   - The authors introduce the Bagging strategy from ensemble learning and combine multiple traditional anomaly detection algorithms (such as Isolation Forest, One Class SVM, and Local Outlier Factor) to improve the precision and recall of anomaly detection. The experimental results show that this method achieves a precision of 98.21%, a recall of 63.58%, and an F1-score of 0.774.

### Specific content of the paper:

- **Introduction part**: It introduces the importance of the sewage pipe network and the impact of its anomalies, points out the limitations of hardware detection methods, and proposes an improvement plan based on statistical or machine-learning methods.
- **Anomaly types**: It summarizes five common types of sewage pipe network anomalies: external water source mixing, flowmeter anomalies, pipe network blockages, data transmission interruptions caused by signal fluctuations, and full-pipe states.
- **Data introduction and pre-processing**: It describes 13,830 sets of typical data collected from 136 monitoring points around Erhai Lake and the data pre-processing methods, including the application of the sliding window technique.
- **Method part**: It details three commonly used anomaly detection algorithms (One Class SVM, Isolation Forest, and Local Outlier Factor) and proposes an integrated algorithm based on Bagging.
- **Experimental results**: It verifies the effectiveness of the proposed method by comparing the experimental results under different parameter settings. In particular, the model performs best when N = 5 and P = 5.
- **Conclusion and outlook**: It summarizes the achievements of this research and proposes future research directions, such as further optimizing the anomaly classification and early-warning algorithms and improving early-warning precision.

In summary, this paper aims to provide an efficient and low-cost early-detection method for sewage pipe network anomalies by constructing the overall dataset and adopting an ensemble learning strategy, thereby helping to ensure the normal operation of urban sewage pipe networks.
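The idea of combining several detectors can be illustrated with a simple vote over their per-sample labels. The sketch below is a minimal illustration, not the paper's implementation: the summary does not state the exact combination rule, so the majority-vote rule, the `vote_anomalies` function, the `min_votes` parameter, and the example labels are all assumptions. Labels follow the scikit-learn convention, where `-1` marks an anomaly and `1` marks a normal sample.

```python
from typing import List, Optional, Sequence

ANOMALY = -1  # scikit-learn convention: -1 marks an outlier
NORMAL = 1    # and 1 marks an inlier


def vote_anomalies(
    label_sets: Sequence[Sequence[int]],
    min_votes: Optional[int] = None,
) -> List[int]:
    """Combine per-detector labels by voting (hypothetical combination rule).

    A sample is labeled anomalous when at least `min_votes` detectors flag it.
    By default a strict majority of detectors is required.
    """
    if not label_sets:
        raise ValueError("at least one detector output is required")
    n_samples = len(label_sets[0])
    if any(len(labels) != n_samples for labels in label_sets):
        raise ValueError("all detectors must label the same number of samples")
    if min_votes is None:
        min_votes = len(label_sets) // 2 + 1  # strict majority

    combined = []
    for sample_labels in zip(*label_sets):
        votes = sum(1 for label in sample_labels if label == ANOMALY)
        combined.append(ANOMALY if votes >= min_votes else NORMAL)
    return combined


# Made-up labels for five samples from three detectors (illustration only).
iforest_labels = [1, 1, -1, 1, -1]
ocsvm_labels = [1, -1, -1, 1, 1]
lof_labels = [1, 1, -1, 1, -1]

combined_labels = vote_anomalies([iforest_labels, ocsvm_labels, lof_labels])
print(combined_labels)
```

In practice the three label lists could come from the `fit_predict` methods of scikit-learn's `IsolationForest`, `OneClassSVM`, and `LocalOutlierFactor`. Raising `min_votes` favors precision, because more detectors must agree before a sample is flagged, while lowering it favors recall.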