Abstract: This letter addresses the problem of designing the transition probabilities of a finite Markov chain (the policy) in order to minimize the expected cost of reaching a destination node from a source node while maintaining a fixed level of entropy spread throughout the network (the exploration). It is motivated by the following scenario. Suppose you have to route agents through a network in some optimal way, for instance, by minimizing the total travel cost. Nothing particular up to now: you could use a standard shortest-path algorithm. Suppose, however, that you want to avoid purely deterministic routing policies in order, for instance, to allow some continual exploration of the network, avoid congestion, or avoid complete predictability of your routing strategy. In other words, you want to introduce some randomness or unpredictability into the routing policy (i.e., the routing policy is randomized). This problem, which will be called the randomized shortest-path problem (RSP), is investigated in this work. The global level of randomness of the routing policy is quantified by the expected Shannon entropy spread throughout the network and is provided a priori by the designer. Then, necessary conditions for the optimal randomized policy, which minimizes the expected routing cost, are derived. Iterating these necessary conditions, reminiscent of Bellman's value iteration equations, allows computing an optimal policy, that is, a set of transition probabilities in each node. Interestingly and surprisingly enough, this first model, while formulated in a totally different framework, is equivalent to Akamatsu's model (1996), appearing in transportation science, for a special choice of the entropy constraint. We therefore revisit Akamatsu's model by recasting it into a sum-over-paths statistical physics formalism allowing easy derivation of all the quantities of interest in an elegant, unified way. For instance, it is shown that the unique optimal policy can be obtained by solving a simple linear system of equations. This second model is therefore more convincing because of its computational efficiency and soundness. Finally, simulation results obtained on simple, illustrative examples show that the models behave as expected.
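The abstract does not state the linear system itself, so the following is only a minimal sketch of how a sum-over-paths policy of this kind is commonly computed, not the authors' exact derivation. It assumes a Boltzmann weighting of paths with an inverse-temperature parameter `theta` (high `theta` means little randomness), a reference random walk `p_ref`, and an absorbing destination node. The function name `rsp_policy` and the toy graph are hypothetical.

```python
import numpy as np

def rsp_policy(cost, p_ref, dest, theta):
    """Sketch: randomized routing policy from one linear solve (assumed formulation)."""
    cost = np.asarray(cost, dtype=float)
    p_ref = np.asarray(p_ref, dtype=float)
    n = cost.shape[0]
    # Edge weights w_ij = p_ref_ij * exp(-theta * c_ij); non-edges get weight 0.
    finite_cost = np.where(p_ref > 0, cost, 0.0)
    w = p_ref * np.exp(-theta * finite_cost)
    w[dest, :] = 0.0  # make the destination absorbing
    # Backward variable z solves (I - W) z = e_dest, i.e. z_dest = 1 and
    # z_i = sum_j w_ij z_j for every other node i.
    e = np.zeros(n)
    e[dest] = 1.0
    z = np.linalg.solve(np.eye(n) - w, e)
    # Policy p_ij = w_ij * z_j / z_i on nodes that can reach the destination.
    policy = np.zeros_like(w)
    reach = z > 0
    policy[reach] = w[reach] * z[None, :] / z[reach][:, None]
    return policy, z

# Hypothetical 4-node graph; node 3 is the destination.
inf = np.inf
cost = np.array([[inf, 1.0, 2.0, inf],
                 [inf, inf, 1.0, 1.0],
                 [inf, inf, inf, 1.0],
                 [inf, inf, inf, inf]])
p_ref = np.array([[0.0, 0.5, 0.5, 0.0],
                  [0.0, 0.0, 0.5, 0.5],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0, 0.0]])

greedy, _ = rsp_policy(cost, p_ref, dest=3, theta=20.0)    # nearly deterministic
explore, _ = rsp_policy(cost, p_ref, dest=3, theta=0.01)   # close to p_ref
```

Under these assumptions, a large `theta` concentrates the policy on the shortest path 0 → 1 → 3, while a small `theta` leaves it close to the reference random walk, which mirrors the trade-off between expected cost and exploration described in the abstract.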

Markov decision process routing games

Congestion-aware path coordination game with Markov decision process dynamics

Adaptive Constraint Satisfaction for Markov Decision Process Congestion Games: Application to Transportation Networks

To Optimize Human-in-the-loop Learning in Repeated Routing Games

A Network Game of Dynamic Traffic

Uncertain Congestion Games with Assorted Human Agent Populations

Re-routing game: The inadequacy of mean-field approach in modeling the herd behavior in path switching

Social Learning in Nonatomic Routing Games

A traveler-centric mobility game: Efficiency and stability under rationality and prospect theory

Solving N-player dynamic routing games with congestion: a mean field approach

Routing and charging game in ride-hailing service with electric vehicles

Modeling Decision Process in Multi-Agent Systems: A Graphical Markov Game Based Approach

Urgency-aware Optimal Routing in Repeated Games through Artificial Currencies

Optimal dynamic information provision in traffic routing

Optimal Routing for Delay-Sensitive Traffic in Overlay Networks

From Altruism to Non-Cooperation in Routing Games

Randomized shortest-path problems: two related models

A Game-Theoretic Approach to Stimulate Cooperation for Probabilistic Routing in Opportunistic Networks

Nash equilibria in routing games with edge priorities

Dynamic Routing in Stochastic Urban Air Mobility Networks: A Markov Decision Process Approach

How bad is selfish routing in practice?