Using an Imbalanced Classification Algorithm and Floating Car Data for Predicting Real-Time Traffic Crash Risk on Expressways

Ahmad Yehia, Xuesong Wang, Tonggen Wang
DOI: https://doi.org/10.2139/ssrn.3994300
2021-01-01
SSRN Electronic Journal
Abstract: Real-time traffic crash risk prediction plays a vital role in active traffic management (ATM) systems by identifying the hazardous traffic conditions that tend to precede a crash in the short interval before it occurs. Recent advancements in traffic sensing and detection technologies have made real-time analysis possible, but few prior studies have examined crash occurrence using real-time floating car data (FCD) collected from expressways. Moreover, real-time crash risk prediction models have mostly been developed from regenerated balanced datasets, which may be inadequate for the large and continuous real-time traffic data environment. Therefore, in this study, a comprehensive imbalanced classification algorithm, the adaptive boosting algorithm for convolutional neural networks (AdaBoost-CNN), has been used to build a practical real-time traffic crash risk prediction model. This study primarily aims to (1) investigate the feasibility of using FCD to predict real-time crash risk on expressways, and (2) explore the efficiency of the AdaBoost-CNN algorithm in solving the imbalanced data classification problem in predicting real-time crash risk. Two models are compared to the proposed AdaBoost-CNN. First, AdaBoost with CNN base classifiers is compared to the proposed model to investigate the influence of transfer learning on prediction accuracy. Second, a one-dimensional convolutional neural network (1-DCNN) is developed with balanced data to examine how well AdaBoost-CNN can handle the highly imbalanced distribution of crash and non-crash datasets. These experiments demonstrated that AdaBoost-CNN is accurate enough to be useful in predicting crash and non-crash cases using large amounts of imbalanced continuous real-time floating car data, as measured by sensitivity, false alarm rate, and area under the curve scores.
In addition, a map-matching approach was introduced to analyze FCD, which showed high accuracy in transforming FCD into specific traffic characteristics for the corresponding segments. The results confirm the feasibility of the proposed model for predicting real-time crash risk on expressways.
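The abstract names sensitivity, false alarm rate, and area under the curve (AUC) as the evaluation measures. The following is a minimal illustrative sketch of how these three quantities are commonly computed for an imbalanced crash/non-crash classifier. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the false alarm rate is taken here to be the standard false positive rate FP / (FP + TN).

```python
# Illustrative sketch only; names, data, and threshold are hypothetical.

def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels where 1 = crash, 0 = non-crash."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

def sensitivity(y_true, y_pred):
    """Share of actual crashes that were flagged: TP / (TP + FN)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def false_alarm_rate(y_true, y_pred):
    """Share of non-crash cases wrongly flagged: FP / (FP + TN) (assumed definition)."""
    _, _, fp, tn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn)

def auc(y_true, scores):
    """AUC as the probability that a random crash case outscores a random
    non-crash case, counting ties as one half (pairwise Mann-Whitney form)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy imbalanced sample: 2 crash cases against 8 non-crash cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(sensitivity(y_true, y_pred))       # 0.5
print(false_alarm_rate(y_true, y_pred))  # 0.125
print(auc(y_true, scores))               # 0.9375
```

Sensitivity and false alarm rate depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. This is one reason the three are often reported together when crash cases are rare and overall accuracy alone would be misleading.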