Reinforcement Learning for Shortest Path Problem on Stochastic Time-Dependent Road Network

Ke Zhang, Meng Li, Yu Shan
DOI: https://doi.org/10.1061/9780784483565.040
2021-01-01
Abstract: Finding the shortest path between two locations on a stochastic time-dependent road network is an important component of vehicle guidance systems. However, traditional heuristic algorithms struggle to handle the complexity and stochasticity of such networks. In this paper, we model the stochastic time-dependent routing problem as a Markov decision process and solve it with several reinforcement learning methods: Sarsa, Q-learning, and Double Q-learning. Sarsa updates its estimate with the Q-value of the action actually taken in the next state, rather than the maximum Q-value over next actions used by Q-learning, while Double Q-learning uses two estimators to compute the value function, which addresses the overestimation problem. In an evaluation on ten stochastic time-dependent road networks, Double Q-learning outperforms the other methods. Finally, the optimal paths obtained at different epochs are visualized to show how the agent explores.
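To make the difference between the three update rules concrete, the following is a minimal tabular sketch in Python. It is not the authors' implementation: the state encoding (a node name), the action encoding (the next node), the reward (negative travel time), and the names `sarsa_update`, `q_learning_update`, and `double_q_update` are all assumptions made for illustration. The abstract does not specify how the paper defines states, rewards, or hyperparameters.

```python
import random

# Assumption: Q is a dict of dicts, Q[state][action] -> value.
# Assumption: reward r is the negative travel time of the traversed edge,
# and a terminal state has an empty action dict (value 0).

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """On-policy: bootstrap from the action actually taken in s_next."""
    next_value = Q[s_next].get(a_next, 0.0)
    Q[s][a] += alpha * (r + gamma * next_value - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0):
    """Off-policy: bootstrap from the maximum Q-value in s_next."""
    next_value = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * next_value - Q[s][a])

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=1.0, rng=random):
    """Two estimators: one selects the next action, the other evaluates it."""
    # Pick at random which table is updated on this step.
    if rng.random() < 0.5:
        QA, QB = QB, QA
    if QA[s_next]:
        best_action = max(QA[s_next], key=QA[s_next].get)  # select with QA
        next_value = QB[s_next][best_action]               # evaluate with QB
    else:
        next_value = 0.0
    QA[s][a] += alpha * (r + gamma * next_value - QA[s][a])
```

In this sketch, decoupling action selection from action evaluation is what reduces the upward bias that a single `max` over noisy estimates introduces, which matters when edge travel times are random.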