Improving Power Grid Monitoring Data Quality: an Efficient Machine Learning Framework for Missing Data Prediction

Weiwei Shi,Yongxin Zhu,Jinkui Zhang,Xiang Tao,Gehao Sheng,Yong Lian,Guoxing Wang,Yufeng Chen
DOI: https://doi.org/10.1109/hpcc-css-icess.2015.16
2015-01-01
Abstract:Big data techniques has been applied to power grid for the evaluation and prediction of grid conditions. However, the raw data quality rarely can meet the requirement of precise data analytics since raw data set usually contains samples with missing data to which the common data mining models are sensitive. Though classic interpolation or neural network methods can been used to fill the gaps of missing data, their predicted data often fail to fit the rules of power grid conditions. This paper presents a machine learning framework (OR_MLF) to improve the prediction accuracy for datasets with missing data points, which mainly combines preprocessing, optimizing support vector machine (OSVM) and refining SVM (RSVM). On top of the OSVM engine, the scheme introduces dedicated data training strategies. First, the original data originating from data generation facilities is preprocessed through standardization. Traditional SVM is then trained to obtain a preliminary prediction model. Next, the optimized SVM predictors are achieved with new training data set, which is extracted based on the preliminary prediction model. Finally, the missing data prediction result depending on OSVM is selectively inputted into the traditional SVM and the refined SVM is lastly accomplished. We test the OR_MLF framework on missing data prediction of power transformers in power grid system. The experimental results show that the predictors based on the proposed framework achieve lower mean square error than traditional ones. Therefore, the framework OR_MLF would be a good candidate to predict the missing data in power grid system.
What problem does this paper attempt to address?