An approach to assess the role of features in detection of transportation modes
Sajjad Sowlati, Rahim Ali Abbaspour, Alireza Chehreghan
DOI: https://doi.org/10.1007/s11116-024-10492-7
IF: 4.814
2024-05-19
Transportation
Abstract: Detecting transportation modes is one of the fundamental prerequisites for interpreting passively collected travel data in the development of intelligent transportation systems. The literature divides transportation mode detection into two parts: feature extraction and the implementation of classification models. Selecting and employing influential features helps maximize the power of the classification model. However, the interpretation and identification of influential features, which is the focus of this study, has received less attention. Importantly, the influence of a feature varies with the nature of the input data and the choice of classification model. In many cases, the extracted features are interdependent, and their combined correlation significantly affects specific outcomes. Consequently, evaluating individual features in isolation may not produce accurate results, so alternative methodologies need to be explored. This study seeks to bridge these gaps through a comprehensive investigation. Three open-source datasets, Geolife, MTL Trajet 2017, and MTL Trajet 2016, were used to enhance reliability, validate the approach, and investigate how influential features vary under different data collection conditions. First, various features were extracted and grouped as kinematic, spatial, or contextual. Then, three powerful classification models (Random Forest, LightGBM, and XGBoost) were applied. A hybrid feature selection algorithm was employed to select a subset of features and to analyze how the influential features vary across classification models. The algorithm removed over half of the features with minimal or negative impact, thereby simplifying the classification process.
Because features combined into a subset yield more powerful identification, the influence of each feature was analyzed within a set of features rather than individually. Two approaches, "number of feature repetitions" and "Shapley Additive Explanations (SHAP) value," were adopted to interpret the results. The "average velocity" feature was selected in all datasets and classification models (nine repetitions) and had the highest SHAP value, making it the most influential feature across all datasets and classification models. The "public stations indicator" was the most influential spatial feature, with the highest SHAP value and nine repetitions, while "holiday" had the most repetitions among the contextual features.
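The two interpretation approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the selected subsets, the feature identifiers, and the `toy_score` function are hypothetical, and the Shapley values are computed exactly over a toy scoring function rather than with the SHAP library on a trained classifier. The interaction bonus in `toy_score` is an assumption meant to show why interdependent features are better judged within a subset than in isolation.

```python
from collections import Counter
from itertools import permutations

# Hypothetical selected-feature subsets per (dataset, model) pair.
# Names and memberships are illustrative only, not the paper's results.
selected = {
    ("Geolife", "RandomForest"): {"average_velocity", "stations_indicator", "holiday"},
    ("Geolife", "LightGBM"): {"average_velocity", "stations_indicator"},
    ("MTL Trajet 2017", "XGBoost"): {"average_velocity", "holiday"},
}


def count_repetitions(subsets):
    """Count in how many selected subsets each feature appears."""
    counts = Counter()
    for features in subsets.values():
        counts.update(features)
    return counts


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_score(features):
    """Hypothetical score of a feature subset, with one interaction term."""
    score = 0.0
    if "average_velocity" in features:
        score += 0.5
    if "stations_indicator" in features:
        score += 0.1
    if {"average_velocity", "stations_indicator"} <= features:
        score += 0.2  # bonus granted only when both features are present
    return score


features = ["average_velocity", "stations_indicator", "holiday"]
repetitions = count_repetitions(selected)
phi = shapley_values(features, toy_score)

for name in features:
    print(f"{name}: repetitions={repetitions[name]}, shapley={phi[name]:.2f}")
```

In this toy setting the interaction bonus is split between the two interacting features, so each receives more credit than it would earn alone, while a feature that never changes the score receives zero. Exact enumeration grows factorially with the number of features, which is why approximations are used on real models.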
transportation,transportation science & technology,engineering, civil