A Combined Data Preprocessing Method Based on K-means Clustering and Singular Spectrum Analysis

Ya Yao, Baoliang Wang, Zhiyao Huang, Haifeng Ji, Haiqing Li
DOI: https://doi.org/10.1109/icetce.2012.7
2012-01-01
Abstract: The raw data collected by online monitoring systems are often cluttered and redundant, which degrades the performance of water quality prediction. This research introduces a two-step data preprocessing method to address the problem. First, k-means clustering is used to group the monitoring data into several clusters, each of which has a representative data value. The resulting values can be regarded as containing all the water quality information for a specific period of time. Second, singular spectrum analysis (SSA) is used to capture the significant component of the clustered data for that period. To evaluate the preprocessing method, a back propagation artificial neural network (BP ANN) is used as the prediction model: the preprocessed data for the period serve as the input to the BP ANN, and its output is the forecast. The forecast is compared with a prediction made without the data preprocessing algorithm, and the comparison shows that prediction with the preprocessing method performs better.
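The two preprocessing steps named in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: scalar readings, quantile-based centroid initialization, the use of cluster means as the "representative data values", and the function names `kmeans_1d` and `ssa_reconstruct` with their parameters. It is not the authors' code, and the BP ANN prediction stage is not covered.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on scalar readings; returns (centroids, labels)."""
    x = np.asarray(values, dtype=float)
    # Assumption: deterministic start at evenly spaced quantiles of the data.
    centroids = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each reading to its nearest centroid.
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its cluster (keep it if empty).
        updated = np.array([
            x[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the returned centroids.
    labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
    return centroids, labels


def ssa_reconstruct(series, window, n_components):
    """Basic SSA: embed, SVD, keep leading components, diagonal-average."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and len(series)")
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    r = min(n_components, s.size)
    # Low-rank approximation from the r largest singular triples.
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]
    # Diagonal averaging: average entries that map to the same time index.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i]
        counts[i:i + k] += 1
    return recon / counts


if __name__ == "__main__":
    # Hypothetical readings for one period, reduced to two representatives.
    centroids, labels = kmeans_1d([0.9, 1.0, 1.1, 4.8, 5.0, 5.2], k=2)
    # Hypothetical noisy series, smoothed by keeping two SSA components.
    t = np.arange(60)
    noisy = np.sin(0.3 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
    smooth = ssa_reconstruct(noisy, window=12, n_components=2)
    print(centroids, smooth[:5])
```

In this sketch the window length and the number of retained components are free parameters. How the paper chooses them, and how the cluster representatives are arranged into the series passed to SSA, is not stated in the abstract.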