Abstract: Public transit systems, such as subway lines and buses, offer affordable ride-sharing services and reduce road traffic. Extracting passengers' preferences from their public transit choices is important to city planners but technically non-trivial. When traveling by public transit, passengers make sequences of transit choices, and their rewards are usually influenced by other passengers' choices. This process can be modeled as a Markov Game (MG). In this paper, we make the first effort to model travelers' preferences in making transit choices using MGs. Based on the observation that passengers usually do not change their policies, we propose novel algorithms that extract reward functions from the observed deterministic equilibrium joint policy of all agents in a general-sum MG in order to infer travelers' preferences. First, we assume that we have access to the entire joint policy. We characterize the set of all reward functions for which the given joint policy is a Nash equilibrium policy. To remove the degeneracy of the solution, we then pick reward functions that maximize the sum of the deviations between the observed policy and the sub-optimal policy of each agent. This results in an efficiently solvable linear programming algorithm for the multi-agent inverse reinforcement learning (MA-IRL) problem. Then, we deal with the case where we have access to the equilibrium joint policy only through a set of actual trajectories. We propose an iterative algorithm inspired by single-agent apprenticeship learning algorithms and the cyclic coordinate descent approach. We evaluate the proposed algorithms on both a simple Grid Game and a unique real-world dataset from Shenzhen, China. Results show that when we have access to the full policy, our algorithm can efficiently recover most of the reward structure, especially the interaction of agents. In the case where we only have access to a set of sampled expert trajectories, our algorithm can provide an explanation of the expert trajectories. Measured with respect to the experts' unknown reward function, the performance of the policy output by our algorithm is close to that of the expert policy.
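The equilibrium condition that the reward characterization builds on can be illustrated with a small check. The following is a minimal sketch, not the paper's algorithm: it assumes a two-agent general-sum Markov game given as dense NumPy arrays, and tests whether a deterministic joint policy is a Nash equilibrium by verifying that no agent gains from a one-step deviation while the other agent's policy stays fixed. The function name `is_nash_equilibrium`, the array layouts, and the toy one-state game are all hypothetical choices made for illustration.

```python
import numpy as np

def is_nash_equilibrium(P, R, policy, gamma=0.9, tol=1e-9):
    """Check a deterministic joint policy in a two-agent Markov game.

    P:      (S, A1, A2, S) transition probabilities (assumed layout)
    R:      (2, S, A1, A2) per-agent rewards (assumed layout)
    policy: (2, S) integer actions, one row per agent
    """
    S = P.shape[0]
    states = np.arange(S)
    a1, a2 = policy[0], policy[1]
    # State-to-state transitions induced by the joint policy, shape (S, S).
    P_pi = P[states, a1, a2]
    for i in range(2):
        # Value of agent i under the joint policy: V = (I - gamma P_pi)^-1 r_pi.
        r_pi = R[i][states, a1, a2]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Q-values when agent i deviates for one step and the other agent
        # keeps following its observed policy.
        if i == 0:
            Q = R[0][states, :, a2] + gamma * (P[states, :, a2] @ V)
        else:
            Q = R[1][states, a1, :] + gamma * (P[states, a1, :] @ V)
        # A profitable one-step deviation means the policy is not a best response.
        if np.any(Q > V[:, None] + tol):
            return False
    return True

# Toy one-state game with prisoner's-dilemma payoffs (0 = cooperate, 1 = defect).
P = np.ones((1, 2, 2, 1))
R1 = np.array([[3.0, 0.0], [5.0, 1.0]])
R = np.stack([R1[None, :, :], R1.T[None, :, :]])  # shape (2, 1, 2, 2)

both_defect = np.array([[1], [1]])
both_cooperate = np.array([[0], [0]])
defect_is_ne = is_nash_equilibrium(P, R, both_defect)
cooperate_is_ne = is_nash_equilibrium(P, R, both_cooperate)
```

With the other agent's policy held fixed, each agent faces an ordinary Markov decision process, so checking one-step deviations against the policy's own value function is enough to test best responses. An inverse method such as the one described in the abstract works in the opposite direction, treating the rewards as unknowns subject to inequalities of this kind.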

Inferring Passengers' Interactive Choices on Public Transits Via MA-AL: Multi-Agent Apprenticeship Learning.
