A BPCA Based Missing Value Imputation and Its Impact on Traffic Incident Prediction

Huiping Li, Yinhai Wang, Meng Li
DOI: https://doi.org/10.1061/9780784481523.177
2018-01-01
Abstract: To improve road safety, many traffic incident prediction models based on machine learning have been proposed. However, the traffic data used for prediction is often incomplete, and few studies have examined how this incompleteness affects prediction accuracy. In this study, several state-of-the-art machine learning methods, namely extreme gradient boosting, random forest, and support vector machine, are adopted to predict traffic incidents. The dataset comprises 123 traffic incidents and 5 months of microwave data collected on an urban expressway. The missing-data pattern in this dataset is discussed, and the missing values are imputed by three methods: mean interpolation, probabilistic principal component analysis (PPCA), and Bayesian principal component analysis (BPCA). A sensitivity analysis is carried out under different missing rates. The numerical test shows that BPCA performs slightly better than PPCA, and that both yield higher and more stable prediction accuracy than mean interpolation, especially when ensemble learning techniques are adopted.
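The sensitivity analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, "mean interpolation" is assumed here to mean column-mean filling, and a simple iterative low-rank (truncated SVD) imputation stands in for the PCA-family methods. It is neither PPCA nor BPCA, which additionally fit a probabilistic model. All function names, the rank, and the missing rates are hypothetical choices made for illustration.

```python
import numpy as np

def mask_at_random(X, rate, rng):
    """Hide a fraction `rate` of entries completely at random."""
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def impute_mean(X_missing):
    """Fill each missing entry with the mean of its column (assumed baseline)."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)

def impute_low_rank(X_missing, rank=2, n_iter=50):
    """Iterative truncated-SVD imputation: a stand-in for PPCA/BPCA, not either one."""
    missing = np.isnan(X_missing)
    X_filled = impute_mean(X_missing)  # start from the mean-filled matrix
    for _ in range(n_iter):
        mu = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        X_filled[missing] = approx[missing]  # update only the missing entries
    return X_filled

def rmse_on_missing(X_true, X_imputed, mask):
    """Root-mean-square error measured on the hidden entries only."""
    return float(np.sqrt(np.mean((X_true[mask] - X_imputed[mask]) ** 2)))

def sensitivity(rates=(0.1, 0.2, 0.3), seed=0):
    """Return {rate: (mean_rmse, low_rank_rmse)} on synthetic rank-2 data."""
    rng = np.random.default_rng(seed)
    # Synthetic data: 200 samples, 8 correlated features, small noise.
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
    X = X + 0.1 * rng.normal(size=X.shape)
    results = {}
    for rate in rates:
        X_missing, mask = mask_at_random(X, rate, rng)
        results[rate] = (
            rmse_on_missing(X, impute_mean(X_missing), mask),
            rmse_on_missing(X, impute_low_rank(X_missing), mask),
        )
    return results

if __name__ == "__main__":
    for rate, (mean_err, lr_err) in sensitivity().items():
        print(f"missing rate {rate:.0%}: mean={mean_err:.3f}, low-rank={lr_err:.3f}")
```

Because the synthetic features are strongly correlated, a method that exploits that correlation would be expected to recover hidden values more accurately than column means. In the study itself, the comparison is made on downstream incident-prediction accuracy rather than on reconstruction error alone.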