The Level of Delay Caused by Crashes (LDC) in Metropolitan and Non-Metropolitan Areas: a Comparative Analysis of Improved Random Forests and LightGBM

Zehao Wang, Pengpeng Jiao, Jianyu Wang, Qiong Huang, Rujian Li, Huapu Lu
DOI: https://doi.org/10.1080/13588265.2022.2130624
IF: 1.472
2023-01-01
International Journal of Crashworthiness
Abstract: Traffic crashes cause serious traffic delays and exhibit unobserved heterogeneity across different areas. Using Texas accident data from 2020, this article aims to predict the level of delay caused by crashes (LDC) accurately and efficiently and to discuss the differences between metropolitan and non-metropolitan areas. A framework based on Random Forests (RF) and LightGBM (LGBM) is developed to measure the association between LDC and its possible risk factors. First, the most relevant variables in each area were identified through recursive feature elimination based on logistic regression. Then, LDC was predicted by classifiers whose hyperparameters were tuned by grid search. To address data imbalance, two threshold-moving methods were used, one maximising the G-mean and the other maximising the F1-score. Finally, SHapley Additive exPlanations (SHAP) was employed to interpret the best model. The results indicate that the improved RF performs better in metropolitan areas and the improved LGBM performs better in non-metropolitan areas. In addition, highway, spring and sunrise are the main risk factors for higher LDC in both areas. Excessive wind speed and temperature can lead to higher LDC in metropolitan areas, whereas in non-metropolitan areas the corresponding factors are pressure and apparent temperature.
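The threshold-moving step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary simplification of LDC (1 = higher delay, 0 = lower delay) and a classifier that already outputs a probability for class 1; the abstract does not give the paper's exact procedure, so the function names, the candidate-threshold grid, and the toy data below are hypothetical.

```python
import math


def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting class 1 for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def f1_score(tp, fp, tn, fn):
    """F1 = 2TP / (2TP + FP + FN); tn is unused but kept for a uniform signature."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, metric):
    """Scan every distinct predicted probability as a candidate threshold
    (an assumed grid) and return the (threshold, score) maximising the metric."""
    best_t, best_s = 0.5, -1.0
    for t in sorted(set(y_prob)):
        s = metric(*confusion_counts(y_true, y_prob, t))
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s


# Toy imbalanced data (hypothetical): the minority class never reaches 0.5,
# so the default threshold would miss every positive case.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45]

t_gmean, s_gmean = best_threshold(y_true, y_prob, g_mean)
t_f1, s_f1 = best_threshold(y_true, y_prob, f1_score)
```

On imbalanced data the default cut-off of 0.5 tends to favour the majority class; moving the threshold leaves the fitted model unchanged and only alters how its probabilities are converted to labels. In practice the threshold would be chosen on validation data rather than on the data used for the final evaluation.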