Learning scalable multi-agent coordination by spatial differentiation for traffic signal control

Junjia Liu, Huimin Zhang, Zhuang Fu, Yao Wang
DOI: https://doi.org/10.1016/j.engappai.2021.104165
IF: 8
2021-04-01
Engineering Applications of Artificial Intelligence
Abstract:<p>The intelligent control of the traffic signal is critical to the optimization of transportation systems. To achieve global optimal traffic efficiency in large-scale road networks, recent works have focused on coordination among intersections, which has shown promising results. However, existing studies paid more attention to observation sharing among intersections (both explicit and implicit) and did not consider the consequences of decisions. In this paper, we design a multi-agent coordination framework based on a Deep Reinforcement Learning method for traffic signal control, defined as <span class="math"><math>γ</math></span>-<em>Reward</em>, which includes both the original <span class="math"><math>γ</math></span>-<em>Reward</em> and <span class="math"><math>γ</math></span>-<em>Attention-Reward</em>. Specifically, we propose the <em>Spatial Differentiation</em> method for coordination, which uses the temporal–spatial information in the replay buffer to amend the reward of each action. A concise theoretical analysis proving that the proposed model can converge to a Nash equilibrium is given. By extending the idea of the Markov Chain to the dimension of space–time, this truly decentralized coordination mechanism replaces the graph attention method and realizes the decoupling of the road network, which is more scalable and more in line with practice. The simulation results show that the proposed model maintains state-of-the-art performance even without a centralized setting. Code is available at <a href="https://github.com/Skylark0924/Gamma_Reward">https://github.com/Skylark0924/Gamma_Reward</a>.</p>
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic; engineering, multidisciplinary
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses intelligent traffic signal control in large-scale road networks, and in particular how to coordinate multiple intersections so that the network as a whole approaches globally optimal traffic efficiency. Existing research mainly focuses on sharing observations between intersections (both explicitly and implicitly) and pays less attention to the consequences of decisions. The paper proposes a multi-agent coordination framework based on deep reinforcement learning (DRL), defined as **γ-Reward**, which includes the original **γ-Reward** and **γ-Attention-Reward**.

### Specific problem descriptions

1. **Traffic congestion problems**:
   - Traffic congestion not only increases commuting time but also aggravates noise and environmental pollution.
   - Most collisions and delays in urban traffic are concentrated at intersections, and poorly tuned signal control wastes traffic resources.
   - Therefore, the key to solving urban congestion lies in keeping intersections unobstructed.
2. **Multi-intersection coordinated control problems**:
   - Intelligent regulation of large-scale road networks requires coordinated control among intersections, which can be framed as a multi-objective optimization problem (MOP) and as a Markov (stochastic) game in a cooperative setting.
   - Multiple agents need to coordinate with each other. Each should not only keep its own intersection unobstructed but also track the traffic flow at surrounding and even distant intersections, so as to improve the efficiency of the entire road network.
3. **Limitations of existing methods**:
   - Existing traffic signal control (TSC) methods can be divided into rule-based methods and learning-based methods.
   - Rule-based methods such as Webster, GreenWave and Max-pressure perform well under their assumed conditions but may perform poorly in practice.
   - Learning-based methods such as DRL perform very well in single-intersection signal control but still face challenges in multi-intersection problems.
   - Existing multi-agent reinforcement learning (MARL) algorithms mainly focus on cooperation among agents, but they face a trade-off between centralized and distributed settings.

### Solutions

1. **Proposed new methods**:
   - The paper treats each intersection as a DRL agent and formulates the TSC problem as a Markov decision process (MDP).
   - It introduces structural prior information about the road network as an inductive bias and extends Markov chain theory to the space-time domain for coordination.
   - It introduces a spatial discount rate γ to account for changes in the future states and rewards of distant intersections, and uses it to correct the learning of the current intersection's policy.
   - It introduces an attention mechanism to adjust the weights with which surrounding intersections influence the current intersection.
2. **Specific contributions**:
   - It proposes the coordination framework **γ-Reward**, which realizes scalable communication with adjacent and more distant intersections by sharing future states and rewards, with the aim of globally optimal control of the TSC problem.
   - It proposes a spatial differentiation method that collects space-time information in a decentralized manner and recursively corrects the current reward.
   - It uses an attention mechanism to distinguish the importance of different intersections within the neighborhood and updates the attention score parameters by borrowing the idea of the target network.
   - Experimental results show that the **γ-Reward** family of methods maintains state-of-the-art performance across various road networks while achieving better scalability.
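
The recursive reward correction described above can be illustrated with a minimal sketch. It is not the paper's formula: the update rule `R_i[t] = r_i[t] + gamma * mean_j(R_j[t + 1])`, the function name `amend_rewards`, and the dictionary-based data layout are all assumptions made for illustration. The sketch only shows how a spatial discount rate lets a delayed penalty at one intersection reach intersections several hops away with a decaying weight.

```python
from typing import Dict, List


def amend_rewards(
    rewards: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
    gamma: float,
) -> Dict[str, List[float]]:
    """Amend each local reward with discounted next-step neighbor rewards.

    rewards[i][t] is the local reward of intersection i at step t, as it
    might be read back from a replay buffer.

    Assumed rule (illustrative, not taken from the paper):
        R_i[t] = r_i[t] + gamma * mean_j(R_j[t + 1]),  j in neighbors[i]
    """
    horizon = len(next(iter(rewards.values())))
    amended = {i: list(r) for i, r in rewards.items()}
    # Walk backwards in time so that R_j[t + 1] is already amended when it
    # is used; this is what makes the correction recursive across hops.
    for t in range(horizon - 2, -1, -1):
        for i, nbrs in neighbors.items():
            if not nbrs:
                continue
            spill = sum(amended[j][t + 1] for j in nbrs) / len(nbrs)
            amended[i][t] = rewards[i][t] + gamma * spill
    return amended


# Three intersections on a line: A - B - C. Only C is penalized, at t = 2.
line_rewards = {
    "A": [0.0, 0.0, 0.0],
    "B": [0.0, 0.0, 0.0],
    "C": [0.0, 0.0, -1.0],
}
line_neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
amended = amend_rewards(line_rewards, line_neighbors, gamma=0.5)
print(amended["B"][1], amended["A"][0])
```

In this toy network, the penalty at C reaches B one step earlier and A two steps earlier, each time scaled by `gamma` and by the neighbor averaging. Each agent only reads its direct neighbors' values, which is the decentralized property the summary emphasizes.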

### Summary

This paper proposes a new multi-agent coordination framework, **γ-Reward**, which combines spatial differentiation with an attention mechanism. It addresses the coordination problem of intelligent traffic signal control in large-scale road networks and targets globally optimal traffic efficiency.
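
As a rough illustration of the attention-weighted variant, the sketch below replaces a uniform neighbor average with softmax weights and keeps a slowly moving copy of the attention scores. Everything here is an assumption for illustration: the helper names (`softmax`, `attention_spill`, `soft_update`), the use of a softmax, and the Polyak-style soft update with rate `tau` are one common reading of "imitating the idea of the target network", not the paper's stated procedure.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def attention_spill(
    next_rewards: Dict[str, float],
    scores: Dict[str, float],
    gamma: float,
) -> float:
    """Discounted, attention-weighted sum of neighbors' next-step rewards."""
    weights = softmax(scores)
    return gamma * sum(weights[j] * next_rewards[j] for j in next_rewards)


def soft_update(
    target: Dict[str, float],
    online: Dict[str, float],
    tau: float,
) -> Dict[str, float]:
    """Move target scores a fraction tau toward the online scores."""
    return {k: (1.0 - tau) * target[k] + tau * online[k] for k in target}


# Two hypothetical neighbors, north ("N") and east ("E"), with equal scores.
target_scores = {"N": 0.0, "E": 0.0}
spill = attention_spill({"N": -2.0, "E": 0.0}, target_scores, gamma=0.5)

# Suppose learning raised the online score for "N"; the target follows slowly.
online_scores = {"N": 1.0, "E": 0.0}
target_scores = soft_update(target_scores, online_scores, tau=0.25)
print(spill, target_scores)
```

Computing the weights from the slowly updated scores rather than the freshly learned ones keeps the amended reward from shifting abruptly between training steps, which is the usual motivation for target networks in DRL.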