Abstract: Intelligent traffic signal control is critical to optimizing transportation systems. To achieve globally optimal traffic efficiency in large-scale road networks, recent works have focused on coordination among intersections and have shown promising results. However, existing studies have concentrated on sharing observations among intersections (both explicitly and implicitly) and have paid little attention to the consequences of decisions. In this paper, we design a multi-agent coordination framework for traffic signal control based on deep reinforcement learning (DRL), called γ-Reward, which includes both the original γ-Reward and γ-Attention-Reward. Specifically, we propose a Spatial Differentiation method for coordination that uses the temporal–spatial information in the replay buffer to amend the reward of each action. We also give a concise theoretical analysis proving that the proposed model can converge to a Nash equilibrium. By extending the idea of a Markov chain to the space–time dimension, this truly decentralized coordination mechanism replaces the graph attention method and decouples the road network, making it more scalable and more in line with practice. Simulation results show that the proposed model maintains state-of-the-art performance even without a centralized setting. Code is available at https://github.com/Skylark0924/Gamma_Reward.
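The idea of amending each action's reward with temporal–spatial information from the replay buffer can be illustrated with a minimal sketch. It assumes a specific form that the abstract does not state: the reward of intersection i at step t is augmented with γ times the *average* amended reward of its adjacent intersections at step t+1, computed by a backward sweep over the buffer. The function name `gamma_reward`, the uniform averaging over neighbors, and the list-of-lists data layout are illustrative assumptions, not the authors' implementation (the linked repository holds that).

```python
def gamma_reward(rewards, neighbors, gamma=0.5):
    """Amend buffered rewards with discounted next-step rewards of neighbors.

    rewards[t][i]: raw reward of intersection i at step t (from the replay buffer).
    neighbors[i]: indices of intersections adjacent to i.
    Assumed form (hypothetical):
        r_hat[t][i] = rewards[t][i] + gamma * mean_{j in N(i)} r_hat[t + 1][j]
    """
    T = len(rewards)
    n = len(rewards[0])
    # The last step has no future information, so it keeps its raw rewards.
    r_hat = [list(row) for row in rewards]
    # Sweep backward in time so each step uses already-amended neighbor rewards;
    # this makes the correction recursive across both space and time.
    for t in range(T - 2, -1, -1):
        for i in range(n):
            nbrs = neighbors[i]
            if nbrs:
                future = sum(r_hat[t + 1][j] for j in nbrs) / len(nbrs)
                r_hat[t][i] = rewards[t][i] + gamma * future
    return r_hat


# Usage: a three-intersection corridor 0-1-2 where only intersection 2
# receives a (negative) reward, at the last buffered step.
corridor = {0: [1], 1: [0, 2], 2: [1]}
amended = gamma_reward([[0, 0, 0], [0, 0, 0], [0, 0, -4]], corridor, gamma=0.5)
```

Because each step reads the already-amended rewards of its neighbors, information travels one hop per time step: in the usage example, intersection 0 receives -0.5 at step 0, which is γ² = 0.25 times the 1/2 averaging weight times the -4 observed two hops away. This is one way to read the abstract's "Markov chain extended to space–time"; the paper may weight or truncate the recursion differently.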

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses intelligent traffic signal control in large-scale road networks, in particular how to coordinate control among multiple intersections to reach globally optimal traffic efficiency. Existing research focuses mainly on sharing observations (explicitly or implicitly) between intersections but pays less attention to the consequences of decisions. The paper proposes a multi-agent coordination framework based on deep reinforcement learning (DRL), called **γ-Reward**, which includes the original **γ-Reward** and **γ-Attention-Reward**.

### Specific problem descriptions

1. **Traffic congestion**:
   - Congestion not only increases commuting time but also aggravates noise and environmental pollution.
   - Most collisions and delays in urban traffic occur at intersections, and poorly designed signal control wastes traffic resources.
   - The key to relieving urban congestion is therefore keeping intersections unobstructed.
2. **Coordinated control of multiple intersections**:
   - Intelligent regulation of large-scale road networks requires coordinated control among intersections, which can be viewed as a multi-objective optimization problem (MOP) and as a Markov/stochastic game in a cooperative setting.
   - Agents must coordinate with each other: each should keep its own intersection unobstructed while also attending to the traffic state of surrounding and even distant intersections, so as to improve the efficiency of the entire road network.
3. **Limitations of existing methods**:
   - Existing traffic signal control (TSC) methods can be divided into rule-based and learning-based methods.
   - Rule-based methods such as Webster, GreenWave, and Max-pressure perform well under their assumed conditions but may perform poorly in practice.
   - Learning-based methods such as DRL perform excellently in single-intersection signal control but still face challenges in multi-intersection settings.
   - Existing multi-agent reinforcement learning (MARL) algorithms focus mainly on cooperation among agents, but they face a trade-off between centralized and distributed settings.

### Solutions

1. **Proposed methods**:
   - Each intersection is treated as a DRL agent, and the TSC problem is formulated as a Markov decision process (MDP).
   - Structural prior information about the road network is introduced as an inductive bias, and Markov chain theory is extended to the space-time domain for coordination.
   - A spatial discount rate γ is introduced to account for changes in the future states and rewards of distant intersections, which are then used to correct the learning of the current intersection's policy.
   - An attention mechanism is introduced to correct the weights with which surrounding intersections influence the current intersection.
2. **Specific contributions**:
   - A coordination framework, **γ-Reward**, that achieves scalable communication with adjacent and even more distant intersections by sharing future states and rewards, achieving globally optimal control of the TSC problem.
   - A spatial differentiation method that collects space-time information in a decentralized manner and recursively corrects the current reward.
   - An attention mechanism that distinguishes the different importance of intersections within the neighborhood; the attention score parameters are updated following the idea of the target network.
   - Experimental results show that the **γ-Reward** family of methods maintains state-of-the-art performance in various road networks while achieving better scalability.
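The attention-weighted variant (γ-Attention-Reward) replaces uniform neighbor weights with learned scores, and the attention parameters are updated in the spirit of a target network. The sketch below assumes a scaled dot-product score with a per-feature weight vector, a softmax over neighbors, and a Polyak-style soft update (rate `tau`) of a slowly moving copy of those weights. All names (`attention_weights`, `gamma_attention_reward`, `soft_update`) and this parameterization are hypothetical and may differ from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(h_i, h_neighbors, w):
    """Attention of intersection i over its neighbors.

    Assumption: score_j = sum_k w[k] * h_i[k] * h_j[k] / sqrt(d), a scaled dot
    product with a learnable per-feature weight vector w (hypothetical).
    """
    d = len(h_i)
    scores = [
        sum(w[k] * h_i[k] * h_j[k] for k in range(d)) / math.sqrt(d)
        for h_j in h_neighbors
    ]
    return softmax(scores)


def gamma_attention_reward(r_i, next_neighbor_rewards, weights, gamma=0.5):
    """Own reward plus gamma times the attention-weighted next-step rewards
    of the neighbors (instead of a uniform average)."""
    return r_i + gamma * sum(a * r for a, r in zip(weights, next_neighbor_rewards))


def soft_update(target_w, online_w, tau=0.01):
    """Target-network-style update: the target copy slowly tracks the online one."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_w, online_w)]


# Usage: the weights used for reward amendment come from the slow target copy.
online_w = [1.0, 1.0]
target_w = [0.5, 0.5]
h_i = [1.0, 0.0]
h_neighbors = [[2.0, 0.0], [0.0, 1.0]]
alpha = attention_weights(h_i, h_neighbors, target_w)
r_hat = gamma_attention_reward(0.0, [-2.0, 0.0], alpha, gamma=0.5)
target_w = soft_update(target_w, online_w, tau=0.1)
```

Scoring with the slowly updated copy keeps the amended rewards, and therefore the learning targets, from shifting at every gradient step, which is the usual motivation for target networks; whether the paper uses a soft or a periodic hard copy is not stated here.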
### Summary

This paper proposes a new multi-agent coordination framework, **γ-Reward**, which introduces spatial differentiation and attention mechanisms to solve the coordination problem of intelligent traffic signal control in large-scale road networks and to achieve globally optimal traffic efficiency.

Learning scalable multi-agent coordination by spatial differentiation for traffic signal control

A Method for Signal Coordination in Large-Scale Urban Road Networks

TraCo: Learning Virtual Traffic Coordinator for Cooperation with Multi-Agent Reinforcement Learning

Network Clustering-Based Multi-Agent Reinforcement Learning for Large-Scale Traffic Signal Control

A multi-agent deep reinforcement learning approach for traffic signal coordination

Multi-agent Deep Reinforcement Learning collaborative Traffic Signal Control method considering intersection heterogeneity

Graph cooperation deep reinforcement learning for ecological urban traffic signal control

Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning

A distributed algorithm for signal coordination of multiple agents with embedded platoon dispersion model

Distributed Signal Control of Arterial Corridors Using Multi-Agent Deep Reinforcement Learning

Towards Multi-agent Reinforcement Learning based Traffic Signal Control through Spatio-temporal Hypergraphs

Combat Urban Congestion via Collaboration: Heterogeneous GNN-based MARL for Coordinated Platooning and Traffic Signal Control

Learning Decentralized Traffic Signal Controllers with Multi-Agent Graph Reinforcement Learning

Cooperative Reinforcement Learning on Traffic Signal Control

Real-Time Network-Level Traffic Signal Control: An Explicit Multiagent Coordination Method

The coordination between traffic signal control agents based on Q-learning

Regional Multi-Agent Cooperative Reinforcement Learning for City-Level Traffic Grid Signal Control

Mean Field Multi-Agent Reinforcement Learning Method for Area Traffic Signal Control

A multi-agent reinforcement learning method with curriculum transfer for large-scale dynamic traffic signal control

AGRCNet: communicate by attentional graph relations in multi-agent reinforcement learning for traffic signal control