Statistical inference of travelers' route choice preferences with system-level data

Pablo Guarda, Sean Qian
DOI: https://doi.org/10.48550/arXiv.2204.10964
2022-04-23
Abstract: Traditional network models encapsulate travel behavior among all origin-destination pairs based on a simplified and generic utility function. Typically, the utility function consists solely of travel time, and its coefficients are equated to estimates obtained from stated preference data. While this modeling strategy is reasonable, the inherent sampling bias in individual-level data may be further amplified over network flow aggregation, leading to inaccurate flow estimates. Such individual-level data must be collected from surveys or travel diaries, which may be labor intensive, costly and limited to a small time period. To address these limitations, this study extends classical bi-level formulations to estimate travelers' utility functions with multiple attributes using system-level data. We formulate a methodology grounded on non-linear least squares to statistically infer travelers' utility function in the network context using traffic counts, traffic speeds, traffic incidents and sociodemographic information, among other attributes. The analysis of the mathematical properties of the optimization problem and of its pseudo-convexity motivates the use of normalized gradient descent. We also develop a hypothesis test framework to examine statistical properties of the utility function coefficients and to perform attribute selection. Experiments on synthetic data show that the coefficients are consistently recovered and that hypothesis tests reliably identify which attributes are determinants of travelers' route choices. In addition, a series of Monte-Carlo experiments suggests that statistical inference is robust to noise in the Origin-Destination matrix and in the traffic counts, and to various levels of sensor coverage. The methodology is also deployed at a large scale using real-world multi-source data in Fresno, CA collected before and during the COVID-19 outbreak.
Applications, Machine Learning, Optimization and Control, Physics and Society
What problem does this paper attempt to address?
This paper attempts to solve the following problems:

1. **How to use system-level data to infer the coefficients of travelers' utility functions**: Traditional methods usually rely on individual-level data (such as surveys or travel diaries) to estimate travelers' utility functions, but these data may suffer from sampling bias, are costly to collect, and cover only a short time period. This paper proposes a method that uses system-level data (such as traffic counts, traffic speeds, traffic incidents and sociodemographic information) to estimate travelers' utility functions instead.
2. **Estimate multi-attribute utility functions**: Most existing research focuses on utility functions that depend only on travel time, ignoring other factors that may affect route choice (such as monetary cost or reliability). This paper extends the traditional model so that the utility function can contain multiple attributes, and infers the weights of these attributes by fitting the model to system-level data with non-linear least squares.
3. **Address pseudo-convexity and non-convexity in the optimization problem**: Estimating the utility function coefficients involves a bilevel optimization problem, which minimizes the gap between estimated and observed flows while accounting for the endogenous effect of traffic congestion on travelers' route choices. Because the optimization problem is pseudo-convex, the paper adopts normalized gradient descent to solve it.
4. **Provide a statistical hypothesis testing framework**: To assess the importance of each attribute in the utility function, the paper develops a hypothesis testing framework that examines the statistical properties of the coefficient estimates and performs attribute selection. This helps identify which factors actually affect travelers' route choices.
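As a rough illustration of items 2 and 3, the sketch below fits a two-attribute logit route choice model to observed flows by minimizing a squared-error loss with normalized gradient descent. It is a toy under strong assumptions and not the paper's implementation: each origin-destination pair has exactly two routes, attribute differences between the routes are fixed (so the congestion feedback of the lower-level problem is ignored), and all names (`loss_and_grad`, `normalized_gradient_descent`, `dx`, `demand`, `counts`) and the synthetic data are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(theta, dx, demand, counts):
    """Squared error between modeled and observed flows, with its gradient."""
    loss, grad = 0.0, [0.0, 0.0]
    for (a, b), q, y in zip(dx, demand, counts):
        # Logit share of route 1, given attribute differences (a, b) vs. route 2.
        p = sigmoid(theta[0] * a + theta[1] * b)
        resid = q * p - y
        loss += resid * resid
        scale = 2.0 * resid * q * p * (1.0 - p)
        grad[0] += scale * a
        grad[1] += scale * b
    return loss, grad

def normalized_gradient_descent(dx, demand, counts, theta0, step=0.5, iters=2000):
    """Move along grad / ||grad|| with a decaying step; return the best iterate."""
    theta = list(theta0)
    best_theta, best_loss = list(theta), float("inf")
    for k in range(iters):
        loss, grad = loss_and_grad(theta, dx, demand, counts)
        if loss < best_loss:
            best_theta, best_loss = list(theta), loss
        norm = math.hypot(grad[0], grad[1])
        if norm < 1e-12:  # stationary point reached
            break
        eta = step / math.sqrt(k + 1)  # assumed step-size schedule
        theta = [t - eta * g / norm for t, g in zip(theta, grad)]
    return best_theta, best_loss

# Synthetic, noise-free data (hypothetical values chosen for illustration).
rng = random.Random(0)
n_od = 50
dx = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(n_od)]
demand = [rng.uniform(100, 500) for _ in range(n_od)]
theta_true = (-1.0, 0.5)
counts = [q * sigmoid(theta_true[0] * a + theta_true[1] * b)
          for (a, b), q in zip(dx, demand)]

theta_hat, final_loss = normalized_gradient_descent(dx, demand, counts, [0.0, 0.0])
print("estimated coefficients:", theta_hat)
```

Only the direction of the gradient is used in each update, so the step length does not shrink in flat regions of the loss, which is one reason normalized steps are attractive for pseudo-convex objectives.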
### Specific Objectives

- Propose a method to statistically infer the coefficients of travelers' multi-attribute utility functions using system-level data.
- Analyze the non-convexity of the utility function coefficient estimation problem and show the effectiveness of combining first-order and second-order optimization methods.
- Construct a statistical framework for hypothesis testing and attribute selection.
- Verify the effectiveness of the proposed method in large-scale real-world traffic networks.

### Core Contributions

1. **Methodological Innovation**: For the first time, a statistical inference method for multi-attribute utility function coefficients based on system-level data is proposed.
2. **Mathematical Analysis**: For the first time, a mathematical analysis of the non-convexity of the utility function coefficient estimation problem is carried out.
3. **Optimization Algorithm Improvement**: The paper demonstrates the advantage of combining first-order and second-order optimization methods for this problem.
4. **Statistical Tool**: A hypothesis testing framework is developed to evaluate the importance of each attribute in the utility function.
5. **Practical Application**: The method is deployed on a large-scale real-world traffic network and evaluated using real system-level data.

### Summary

The core problem of this paper is how to use system-level data to estimate the coefficients of travelers' multi-attribute utility functions more accurately while avoiding the sampling bias and high cost of traditional individual-level data. The paper addresses this problem with a bilevel optimization formulation, normalized gradient descent and a statistical hypothesis testing framework, supported by synthetic experiments and a real-world deployment in Fresno, CA.
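The hypothesis testing idea can be sketched with the standard asymptotic t-test for non-linear least squares, where the coefficient covariance is approximated by sigma^2 (J^T J)^{-1}. This is a minimal sketch under stated assumptions, not the paper's framework: it uses a toy two-route logit model without congestion feedback, obtains the estimate with plain Gauss-Newton iterations (a swapped-in second-order solver), and approximates p-values with the normal distribution. All function names and the synthetic data are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def jacobian_and_residuals(theta, dx, demand, counts):
    """Rows of d(flow)/d(theta) and residuals (flow - count) per observation."""
    J, r = [], []
    for (a, b), q, y in zip(dx, demand, counts):
        p = sigmoid(theta[0] * a + theta[1] * b)
        w = q * p * (1.0 - p)
        J.append((w * a, w * b))
        r.append(q * p - y)
    return J, r

def normal_matrix(J):
    """Entries (s00, s01, s11) of the symmetric 2x2 matrix J^T J."""
    s00 = sum(j0 * j0 for j0, _ in J)
    s01 = sum(j0 * j1 for j0, j1 in J)
    s11 = sum(j1 * j1 for _, j1 in J)
    return s00, s01, s11

def gauss_newton(dx, demand, counts, theta, iters=50):
    """Undamped Gauss-Newton: theta <- theta - (J^T J)^{-1} J^T r."""
    for _ in range(iters):
        J, r = jacobian_and_residuals(theta, dx, demand, counts)
        s00, s01, s11 = normal_matrix(J)
        g0 = sum(j0 * ri for (j0, _), ri in zip(J, r))
        g1 = sum(j1 * ri for (_, j1), ri in zip(J, r))
        det = s00 * s11 - s01 * s01
        theta = [theta[0] - (s11 * g0 - s01 * g1) / det,
                 theta[1] - (s00 * g1 - s01 * g0) / det]
    return theta

def t_tests(theta, dx, demand, counts):
    """t statistics and two-sided p-values (normal approximation) for H0: theta_i = 0."""
    J, r = jacobian_and_residuals(theta, dx, demand, counts)
    s00, s01, s11 = normal_matrix(J)
    det = s00 * s11 - s01 * s01
    sigma2 = sum(ri * ri for ri in r) / (len(r) - 2)  # residual variance estimate
    se = [math.sqrt(sigma2 * s11 / det), math.sqrt(sigma2 * s00 / det)]
    t_stats = [th / s for th, s in zip(theta, se)]
    p_values = [math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats]
    return t_stats, p_values

# Synthetic data: attribute 1 matters (coefficient -1), attribute 2 does not (0).
rng = random.Random(1)
n_obs = 200
dx = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_obs)]
demand = [rng.uniform(100, 500) for _ in range(n_obs)]
counts = [q * sigmoid(-1.0 * a) + rng.gauss(0.0, 5.0) for (a, _), q in zip(dx, demand)]

theta_hat = gauss_newton(dx, demand, counts, [0.0, 0.0])
t_stats, p_values = t_tests(theta_hat, dx, demand, counts)
for name, th, t, p in zip(("attribute 1", "attribute 2"), theta_hat, t_stats, p_values):
    print(f"{name}: coef={th:.3f}, t={t:.2f}, p={p:.4f}")
```

In this setup, an attribute whose p-value stays above a chosen significance level would be a candidate for removal from the utility function, which is the sense in which such tests support attribute selection.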