Abstract: Detecting transportation modes is a fundamental prerequisite for interpreting passively collected travel data and developing intelligent transportation systems. The literature divides transportation mode detection into two parts: feature extraction and the implementation of classification models. Selecting and employing influential features helps maximize the power of the classification model. However, the interpretation and identification of influential features, which is the focus of this study, has received less attention. Importantly, the influence of features varies with the nature of the input data and the choice of classification model. In many cases, the extracted features are interdependent, and their combined effect significantly impacts specific outcomes. Consequently, evaluating the effectiveness of individual features in isolation may not produce accurate results, so alternative methodologies need to be explored. This study seeks to bridge these gaps through a comprehensive investigation. Three open-source datasets, Geolife, MTL Trajet 2017, and MTL Trajet 2016, were used to enhance reliability, validate the approach, and investigate how influential features vary under different data collection conditions. First, various features were extracted and grouped into kinematic, spatial, and contextual categories. Then, three powerful classification models (Random Forest, LightGBM, and XGBoost) were applied. A hybrid feature selection algorithm was employed to select a subset of features and to analyze the variability of influential features across the different classification models. The algorithm removed over half of the features with minimal or negative impact, thereby simplifying the classification task.
Because features yield powerful identification when combined as a subset, the influence of each feature was analyzed within a set of features rather than individually. Two approaches, "number of feature repetitions" and "Shapley Additive Explanations (SHAP) value," were adopted to interpret the results. "Average velocity" was selected in all datasets and classification models (nine repetitions) and had the highest SHAP value, making it the most influential feature across all datasets and classification models. The "public stations indicator" was the most influential spatial feature, with the highest SHAP value in that group and nine repetitions, while "holiday" had the most repetitions among the contextual features.
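
The two interpretation approaches named above can be illustrated with a minimal sketch. It is not the study's implementation: the run labels, feature names, and scoring function below are hypothetical placeholders, and the Shapley values are computed exactly by enumerating subsets rather than with the SHAP library. The toy score is built so that two features contribute only when they appear together, which is the kind of interdependence that evaluating features in isolation would miss.

```python
from collections import Counter
from itertools import combinations
from math import factorial


def count_repetitions(selected):
    """Count in how many (dataset, model) runs each feature was selected."""
    counts = Counter()
    for features in selected.values():
        counts.update(set(features))
    return counts


def shapley_values(features, value):
    """Exact Shapley values for a score function defined on feature subsets."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical selections: one feature subset per (dataset, model) run.
selected = {
    ("dataset_A", "model_1"): ["feat_a", "feat_b", "feat_c"],
    ("dataset_A", "model_2"): ["feat_a", "feat_b"],
    ("dataset_B", "model_1"): ["feat_a", "feat_c"],
    ("dataset_B", "model_2"): ["feat_a"],
}
repetitions = count_repetitions(selected)


def toy_score(subset):
    """Hypothetical score: feat_b and feat_c only help when used together."""
    score = 0.0
    if "feat_a" in subset:
        score += 0.5
    if "feat_b" in subset and "feat_c" in subset:
        score += 0.4
    return score


phi = shapley_values(["feat_a", "feat_b", "feat_c"], toy_score)
```

In this toy setting, `feat_b` scores zero on its own, yet it receives a nonzero Shapley value because its contribution is averaged over every subset it can join. Exact enumeration grows exponentially with the number of features, so it is only practical for small feature sets.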

An approach to assess the role of features in detection of transportation modes
