A hybrid clustering and random forest model to analyse vulnerable road user to motor vehicle (VRU-MV) crashes

Zhiyuan Sun, Duo Wang, Xin Gu, Yuxuan Xing, Jianyu Wang, Huapu Lu, Yanyan Chen
DOI: https://doi.org/10.1080/17457300.2023.2180804
Abstract: The main goal of this study is to investigate the unobserved heterogeneity in VRU-MV crash data and to determine the relatively important factors contributing to injury severity. To this end, a latent class analysis (LCA) coupled with a random parameters logit model (LCA-RPL) is developed to segment the VRU-MV crashes into relatively homogeneous clusters and to explore the differences among clusters. The random-forest-based SHapley Additive exPlanation (RF-SHAP) approach is used to explore the relative importance of the contributing factors for injury severity in each cluster. The results show that vulnerable group (VG), intersection or not (ION) and road type (RT) clearly distinguish the crash clusters. Motor vehicle type and functional zone have a significant impact on injury severity in all clusters. Several variables (e.g. ION, crash type [CT], season and RT) demonstrate a significant effect in a specific sub-cluster model. The results of this study provide specific and insightful countermeasures that target the contributing factors in each cluster for mitigating VRU-MV crash injury severity.
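The SHAP step named in the abstract ranks factors by their Shapley values. As a rough illustration of that idea only, the sketch below computes exact Shapley values for a toy scoring function. It is not the authors' model: in the study the attributions would come from a fitted random forest per cluster, whereas `toy_severity`, its weights, and the feature names `vehicle_type`, `functional_zone` and `season` are hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values of a set function `value_fn` over `features`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of coalitions of size k that do not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_severity(present):
    """Hypothetical severity score; the weights are made up for illustration."""
    score = 0.0
    if "vehicle_type" in present:
        score += 0.30
    if "functional_zone" in present:
        score += 0.10
    # An interaction term, split evenly between its two features by Shapley.
    if "vehicle_type" in present and "season" in present:
        score += 0.20
    return score


features = ["vehicle_type", "functional_zone", "season"]
phi = shapley_values(toy_severity, features)

# Rank factors by the size of their attribution, largest first.
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
for name in ranking:
    print(f"{name}: {phi[name]:.2f}")
```

The attributions add up to the score with all factors present minus the score with none, which is the additivity property that makes SHAP values comparable across factors. Exact enumeration grows exponentially with the number of factors, so this form only suits toy cases.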