Phase Re-service in Reinforcement Learning Traffic Signal Control

Zhiyao Zhang, George Gunter, Marcos Quinones-Grueiro, Yuhang Zhang, William Barbour, Gautam Biswas, Daniel Work
2024-08-02
Abstract: This article proposes a novel approach to traffic signal control that combines phase re-service with reinforcement learning (RL). The RL agent directly determines the duration of the next phase in a pre-defined sequence. Before the RL agent's decision is executed, we use shock wave theory to estimate queue expansion at the designated movement allowed for re-service and decide whether phase re-service is necessary. If it is, a temporary re-service phase is inserted before the next regular phase. We formulate the RL problem as a semi-Markov decision process (SMDP) and solve it with proximal policy optimization (PPO). A series of experiments shows significant improvements from the introduction of phase re-service. Average vehicle delay is reduced by up to 29.95%, and its standard deviation by up to 59.21%. The number of stops is reduced by 26.05% on average, with a 45.77% lower standard deviation.
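The SMDP formulation mentioned in the abstract can be illustrated with the generic textbook form of a variable-duration return. This is a sketch, not necessarily the paper's exact definition: the symbols $t_k$ (time of the $k$-th decision), $\tau_k$ (time elapsed until the next decision), $r_t$ (per-step reward), and $\gamma$ (discount factor) are assumptions introduced here for illustration.

$$
R_k = \sum_{i=0}^{\tau_k - 1} \gamma^{i}\, r_{t_k + i + 1},
\qquad
V^{\pi}(s_k) = \mathbb{E}_{\pi}\!\left[ R_k + \gamma^{\tau_k}\, V^{\pi}(s_{k+1}) \right]
$$

The point of the sketch is that the discount applied to the next state's value depends on how long the chosen phase lasted, so a long phase discounts the future more heavily than a short one. That dependence is what distinguishes an SMDP from a fixed-step MDP.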
Systems and Control
What problem does this paper attempt to address?
The paper aims to reduce vehicle delay and the number of stops at signalized intersections by extending reinforcement learning (RL) based adaptive traffic signal control (ATSC) to handle high left-turn demand. Specifically, the authors combine phase re-service with RL so that the controller can respond more flexibly when left-turn demand surges during peak hours, relieving congestion and shortening waiting times.

### Main contributions of the paper

1. **Introduction of phase re-service**: Phase re-service is added to RL-based traffic signal control, meaning that a specific phase (such as a protected left turn) can be served more than once within a cycle. This clears the left-turn queue more effectively, especially under high demand.
2. **Estimation of queue growth using shock wave theory**: To decide whether phase re-service is required, the authors use shock wave theory to estimate queue expansion on the designated movement. If the estimated queue growth exceeds a preset threshold, a temporary re-service phase is inserted.
3. **Semi-Markov decision process (SMDP) modeling**: Because the RL agent selects the duration of each phase, the time between decisions varies. The authors therefore model the control problem as an SMDP and solve it with proximal policy optimization (PPO).

### Experimental results

The experiments show that the method significantly reduces vehicle delay and the number of stops across multiple traffic demand scenarios:

- **Average vehicle delay**: Reduced by up to 29.95%, with the standard deviation reduced by up to 59.21%.
- **Average number of stops**: Reduced by up to 26.05%, with the standard deviation reduced by up to 45.77%.

These improvements appear in overall performance and are particularly pronounced for protected left-turn movements.

### Conclusion

By combining reinforcement learning with phase re-service, the proposed method makes traffic signal control more flexible and efficient, especially under high left-turn demand. The experimental results support its effectiveness and suggest a direction for future traffic signal control systems.
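The shock wave check described in the contributions can be sketched in a few lines of Python. This is a minimal illustration under the standard kinematic-wave (LWR) assumption that the back of a stopped queue moves upstream at speed `q_a / (k_j - k_a)`, where `q_a` and `k_a` are the arrival flow and density and `k_j` is the jam density. It is not the authors' implementation: the names `ArrivalState`, `queue_tail_speed_m_per_s`, `projected_queue_m`, and `needs_reservice`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrivalState:
    """Traffic state of vehicles arriving at the back of the queue (assumed inputs)."""
    flow_veh_per_h: float      # q_a
    density_veh_per_km: float  # k_a


def queue_tail_speed_m_per_s(arrival: ArrivalState, jam_density_veh_per_km: float) -> float:
    """Upstream speed of the queue tail from the shock wave relation q_a / (k_j - k_a)."""
    density_gap = jam_density_veh_per_km - arrival.density_veh_per_km
    if density_gap <= 0:
        raise ValueError("jam density must exceed arrival density")
    speed_km_per_h = arrival.flow_veh_per_h / density_gap
    return speed_km_per_h * 1000.0 / 3600.0  # km/h -> m/s


def projected_queue_m(current_queue_m: float, arrival: ArrivalState,
                      jam_density_veh_per_km: float, wait_s: float) -> float:
    """Queue length expected if the movement waits another wait_s seconds on red."""
    growth_m = queue_tail_speed_m_per_s(arrival, jam_density_veh_per_km) * wait_s
    return current_queue_m + growth_m


def needs_reservice(current_queue_m: float, arrival: ArrivalState,
                    jam_density_veh_per_km: float, wait_s: float,
                    threshold_m: float) -> bool:
    """Insert a re-service phase when the projected queue exceeds the threshold."""
    projected = projected_queue_m(current_queue_m, arrival, jam_density_veh_per_km, wait_s)
    return projected > threshold_m


if __name__ == "__main__":
    # Hypothetical numbers: 600 veh/h arriving at 20 veh/km, jam density 140 veh/km.
    arrival = ArrivalState(flow_veh_per_h=600.0, density_veh_per_km=20.0)
    # Queue tail moves at 5 km/h; over 60 s the queue grows by about 83 m.
    print(needs_reservice(20.0, arrival, 140.0, wait_s=60.0, threshold_m=90.0))
```

In this sketch `wait_s` stands for the time the movement would wait before its next regular service, and `threshold_m` could be something like the usable length of a left-turn bay. Both are illustrative choices; the summary above states only that a preset threshold is used.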