Contributing factors on the level of delay caused by crashes: a hybrid method of latent class analysis and XGBoost based SHAP algorithm

Zehao Wang, Pengpeng Jiao, Jianyu Wang, Wei Luo, Huapu Lu
DOI: https://doi.org/10.1080/19439962.2023.2189339
2023-03-18
Abstract: Road crashes cause significant traffic delays and impose avoidable financial losses. This study investigates the impact of contributing factors on the level of delay caused by crashes (LDC) using Texas crash data. To capture unobserved heterogeneity, a latent class analysis (LCA) was first used to segment the whole dataset into several homogeneous clusters. Then, an XGBoost-based SHAP model was developed on each cluster to identify the main contributing factors hidden in the latent classes. The interaction effects between the contributing factors were subsequently analyzed, including the effects between pairs of high-importance features and between high- and low-importance features. The LCA results indicate that season is the main factor producing heterogeneity, hence the data were divided into four clusters. The main contributing factors and the interaction effects differ among the four clusters, as shown by the XGBoost-based SHAP algorithm. For example, Sunrise_Sunset, Peak_hours and Crossing are the main contributing factors in Fall and Winter crashes, whereas Traffic_Signal, Workday and Junction are the main contributing factors in Summer and Spring crashes. The interaction effects of Highway and Zone are different in Fall and Winter crashes. This study can provide insightful information for regulators to develop targeted policies in different seasons.
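As an illustration of the attribution idea behind the per-cluster SHAP analysis, the sketch below computes exact Shapley values by brute force for two toy scoring functions, one per hypothetical cluster. It is not the authors' pipeline: the paper fits XGBoost models and uses the SHAP algorithm, whereas this example swaps in hand-written functions and exhaustive enumeration so that it runs with the standard library alone. The function names, the cluster labels, and all coefficients are invented for illustration; only the feature names are borrowed from the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` keep their observed value; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


FEATURES = ["Peak_hours", "Traffic_Signal", "Junction"]


# Hypothetical per-cluster delay scores; the coefficients are made up.
def toy_a(z):
    peak, signal, junction = z
    return 1.0 + 2.0 * peak + 0.5 * signal + 1.5 * peak * junction


def toy_b(z):
    peak, signal, junction = z
    return 1.0 + 0.5 * peak + 2.0 * signal + 1.0 * junction


x = [1, 1, 1]          # crash record with all three flags set
baseline = [0, 0, 0]   # reference record with none set

phi_a = shapley_values(toy_a, x, baseline)
phi_b = shapley_values(toy_b, x, baseline)

for name, phi in (("cluster_a", phi_a), ("cluster_b", phi_b)):
    ranked = sorted(zip(FEATURES, phi), key=lambda p: -abs(p[1]))
    print(name, ranked)
```

In this toy setup the top-ranked feature differs between the two clusters, and the interaction term in toy_a is split evenly between Peak_hours and Junction. Brute-force enumeration is exponential in the number of features, so it only suits small examples like this one.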
transportation