Hierarchical Reinforcement Learning for Dynamic Autonomous Vehicle Navigation at Intelligent Intersections

Qian Sun, Le Zhang, Huan Yu, Weijia Zhang, Yu Mei, Hui Xiong
DOI: https://doi.org/10.1145/3580305.3599839
2023-01-01
Abstract: Recent years have witnessed the rapid development of the Cooperative Vehicle Infrastructure System (CVIS), in which road infrastructure such as traffic lights (TLs) and autonomous vehicles (AVs) can share information with one another and work collaboratively to provide a safer and more comfortable transportation experience. While many efforts have been made to develop efficient and sustainable CVIS solutions, existing approaches for urban intersections rely heavily on domain knowledge and physical assumptions, which prevents them from being applied in practice. To this end, this paper proposes NavTL, a learning-based framework that jointly controls traffic signal plans and autonomous vehicle rerouting in mixed traffic scenarios where human-driven vehicles and AVs co-exist. The objective is to improve travel efficiency and reduce total travel time by minimizing congestion at intersections while guiding AVs to avoid temporally congested roads. Specifically, we design a graph-enhanced multi-agent decentralized bi-directional hierarchical reinforcement learning framework that regards TLs as manager agents and AVs as worker agents. At lower temporal resolution timesteps, each manager sets a goal for the workers within its controlled region. Simultaneously, managers learn to take signal actions based on their observations of the environment as well as intention information extracted from their workers. At higher temporal resolution timesteps, each worker makes rerouting decisions along its way to the destination based on its observation of the environment, an intention-enhanced manager state representation, and a goal from its current manager. Finally, extensive experiments on one synthetic and two real-world network-level datasets demonstrate the effectiveness of the proposed framework in improving travel efficiency.
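The two-timescale manager/worker loop described in the abstract can be illustrated with a minimal Python sketch. This is not the NavTL implementation: the names `Manager`, `Worker`, `run`, and `MANAGER_INTERVAL`, the per-road congestion list, and the hand-written rules that stand in for the learned policies are all assumptions made for illustration. It shows only the control flow: the manager acts every few steps using worker intentions, while workers act every step using the manager's goal.

```python
from dataclasses import dataclass, field

MANAGER_INTERVAL = 5  # hypothetical: managers act once per 5 worker steps


@dataclass
class Worker:
    """An AV. The rule below is a placeholder, not a learned policy."""
    name: str
    intention: int = 0  # index of the road the AV currently plans to take

    def act(self, congestion, goal):
        # Follow the manager's goal unless another road is now strictly
        # less congested than the goal road.
        best = min(range(len(congestion)), key=lambda r: congestion[r])
        self.intention = goal if congestion[goal] <= congestion[best] else best
        return self.intention


@dataclass
class Manager:
    """A traffic light controlling a region. Placeholder rules only."""
    workers: list = field(default_factory=list)
    goal: int = 0
    phase: int = 0

    def act(self, congestion):
        # Intention information: how many workers plan to use each road.
        demand = [0] * len(congestion)
        for w in self.workers:
            demand[w.intention] += 1
        # Signal action: serve the road with the highest combined load.
        load = [c + d for c, d in zip(congestion, demand)]
        self.phase = max(range(len(load)), key=lambda r: load[r])
        # Goal for workers: the currently least congested road.
        self.goal = min(range(len(congestion)), key=lambda r: congestion[r])
        return self.phase, self.goal


def run(congestion_series):
    """Step through per-timestep congestion observations and log decisions."""
    manager = Manager(workers=[Worker("av0"), Worker("av1")])
    log = []
    for t, congestion in enumerate(congestion_series):
        if t % MANAGER_INTERVAL == 0:   # lower temporal resolution
            manager.act(congestion)
        for w in manager.workers:       # higher temporal resolution
            w.act(congestion, manager.goal)
        log.append((t, manager.phase, manager.goal,
                    [w.intention for w in manager.workers]))
    return log


if __name__ == "__main__":
    series = [[3, 1, 2]] * 2 + [[3, 5, 0]] * 4
    for entry in run(series):
        print(entry)
```

In this sketch, workers can deviate from a stale goal between manager updates when congestion changes, which is the motivation for letting them act at the finer timescale. In a learning-based framework, both `act` methods would be replaced by trained policies.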