PCA-Based Missing Information Imputation for Real-Time Crash Likelihood Prediction under Imbalanced Data

Jintao Ke, Shuaichao Zhang, Hai Yang, Xiqun (Michael) Chen
DOI: https://doi.org/10.1080/23249935.2018.1542414
2018-01-01
Transportmetrica A: Transport Science
Abstract: As an important research topic, real-time crash likelihood prediction has been studied for many years. However, little research has focused on missing data imputation in real-time crash likelihood prediction, even though missing values are common because of sensor breakdowns or external interference. In addition, classifying imbalanced data is a critical issue in real-time crash likelihood prediction, since crash-prone cases are far fewer than non-crash cases. In this paper, three principal component analysis (PCA)-based approaches are established for imputing missing values, and two kinds of solutions are developed to tackle the issue of imbalanced data. The results show that the proposed methods can help the classifiers achieve better predictive performance when data are missing. The two solutions, i.e., cost-sensitive learning and the synthetic minority oversampling technique (SMOTE), can help improve sensitivity by adjusting the classifiers to pay more attention to the minority class.
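To make the idea of PCA-based imputation concrete, the sketch below shows one generic variant, iterative PCA imputation: missing cells are first filled with observed column means, then repeatedly replaced by their rank-k PCA reconstruction until the filled values stop changing. This is an illustrative assumption, not necessarily any of the three approaches in the paper; the names pca_impute and _top_components, the use of None to mark a missing sensor reading, and the default parameters are all hypothetical. It uses only the Python standard library (power iteration instead of a linear-algebra package).

```python
import math
import random


def _column_means(rows):
    """Mean of each column of a list-of-rows matrix."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def _top_components(centered, k, iters=200, seed=0):
    """Top-k eigenvectors of the sample covariance via power iteration.

    Each candidate vector is re-orthogonalised against the components
    already found, so successive vectors converge to successive eigenvectors.
    """
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            for c in comps:  # remove directions already captured
                dot = sum(x * y for x, y in zip(w, c))
                w = [x - dot * y for x, y in zip(w, c)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left: stop adding components
                return comps
            v = [x / norm for x in w]
        comps.append(v)
    return comps


def pca_impute(data, k=1, max_iter=500, tol=1e-10):
    """Hypothetical iterative PCA imputer; None marks a missing value."""
    d = len(data[0])
    missing = [(i, j) for i, row in enumerate(data)
               for j, v in enumerate(row) if v is None]
    # Step 1: initial fill with the mean of the observed values per column.
    fill = []
    for j in range(d):
        observed = [row[j] for row in data if row[j] is not None]
        fill.append(sum(observed) / len(observed))
    X = [[fill[j] if v is None else float(v) for j, v in enumerate(row)]
         for row in data]
    for _ in range(max_iter):
        # Step 2: centre the completed matrix and fit k principal components.
        mu = _column_means(X)
        Xc = [[v - m for v, m in zip(row, mu)] for row in X]
        comps = _top_components(Xc, k)
        # Step 3: overwrite only the missing cells with the reconstruction.
        change = 0.0
        for i, j in missing:
            scores = [sum(x * c for x, c in zip(Xc[i], comp)) for comp in comps]
            new = mu[j] + sum(s * comp[j] for s, comp in zip(scores, comps))
            change = max(change, abs(new - X[i][j]))
            X[i][j] = new
        if change < tol:  # filled values have stabilised
            break
    return X
```

For example, with rows [[1, 2], [2, 4], [3, 6], [4, 8], [5, None]] and k=1, the missing cell starts at the observed mean 5 and is pulled toward 10, the value consistent with the single linear pattern in the data. Only missing cells are ever overwritten, so observed readings pass through unchanged.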
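SMOTE, one of the two imbalance remedies named above, creates synthetic minority-class (crash-prone) samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal standard-library sketch of that interpolation step, assuming Euclidean distance and a randomly chosen base sample for each synthetic point; the function name smote and its parameters are hypothetical, and the paper's actual settings (k, oversampling ratio) are not given here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j)
                       for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random base sample
        j = rng.choice(neighbours[i])      # random neighbour of that sample
        gap = rng.random()                 # position along the segment, in [0, 1)
        base, nb = minority[i], minority[j]
        # New point lies on the line segment between base and neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic
```

Because every synthetic point sits between two real minority samples, oversampling enlarges the crash-prone region the classifier sees instead of duplicating identical records.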
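Cost-sensitive learning can be realised in several ways, for example by reweighting classes during training or by moving the decision threshold. The sketch below illustrates only the threshold-moving form and may differ from the paper's implementation: if missing a crash-prone case costs cost_fn and a false alarm costs cost_fp, the expected cost of flagging a case with predicted probability p is lower whenever p * cost_fn >= (1 - p) * cost_fp, i.e. p >= cost_fp / (cost_fp + cost_fn). The helper names and the example costs are hypothetical.

```python
def min_cost_threshold(cost_fp, cost_fn):
    """Probability cut-off minimising expected misclassification cost."""
    return cost_fp / (cost_fp + cost_fn)


def classify(probs, cost_fp=1.0, cost_fn=1.0):
    """Label 1 (crash-prone) when p reaches the cost-based threshold."""
    t = min_cost_threshold(cost_fp, cost_fn)
    return [1 if p >= t else 0 for p in probs]


def sensitivity(y_true, y_pred):
    """True-positive rate: share of actual positives that are flagged."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)
```

With equal costs the threshold is 0.5; raising cost_fn to 9 lowers it to 0.1, so more minority cases are flagged and sensitivity rises at the price of more false alarms, which is the trade-off the abstract describes.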
What problem does this paper attempt to address?