Spatio-Temporal Sequence Modeling for Traffic Signal Control
Qian Sun, Le Zhang, Jingbo Zhou, Rui Zha, Yu Mei, Chujie Tian, Hui Xiong
DOI: https://doi.org/10.1145/3627673.3679998
2024-01-01
Abstract: Traffic Signal Control (TSC), a pivotal and challenging research area in the transportation domain, aims to alleviate congestion at urban intersections by optimizing vehicular flows from different inflow directions. While considerable effort has focused on Reinforcement Learning (RL) based methods for the TSC problem, these methods suffer from unpredictable training duration and the risks of online exploration, which limit their real-world deployment. Recently, offline RL has emerged as a new solution that shifts from learning through online interactions to deriving policies from pre-collected datasets, which guarantees a safer and more efficient learning process. However, existing offline methods overlook the crucial temporal and spatial dependencies among data from different traffic signals at different timesteps, which leads to suboptimal performance. To this end, we present a new formulation of the offline TSC problem that introduces a spatio-temporal graph to model the historical Markov Decision Process sequences across all traffic signals within the road network. Along this line, we propose STLight, a novel spatio-temporal sequence modeling approach that predicts optimal actions for the signals from historical data while accounting for the inherent inter-dependencies among them. Specifically, we incorporate a spatio-temporal encoder to represent states, actions, and returns by capturing dynamic and spatially dependent information. The ordered space-time-aware representations are then fed to the Action Decoder, which predicts signal phase actions in an auto-regressive manner, accounting for the hidden dependencies between the actions and the reward and state tokens. Furthermore, to adapt to tasks with different levels of congestion, we incorporate space-aware return-based contrastive learning to automatically differentiate data samples with disparate traffic flow patterns. Finally, extensive experiments on two public real-world traffic datasets demonstrate the superior performance of the proposed model over state-of-the-art online and offline traffic signal control baselines.
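To make the sequence-modeling view of offline TSC more concrete, the following is a minimal sketch of how one logged trajectory from a single traffic signal might be turned into an ordered token sequence for an auto-regressive action decoder. It is an illustration under stated assumptions, not the STLight implementation: the function names `returns_to_go` and `interleave_tokens`, the undiscounted return, and the (return, state, action) ordering within each timestep are hypothetical choices borrowed from common return-conditioned sequence models, and the abstract does not confirm any of them.

```python
from typing import Any, List, Sequence, Tuple

Token = Tuple[str, int, Any]  # (token kind, timestep, value)


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return the (optionally discounted) sum of future rewards at each timestep.

    Assumption: gamma = 1.0 (undiscounted) is used purely for illustration.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return already computed for t + 1.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def interleave_tokens(
    rtg: Sequence[float],
    states: Sequence[Any],
    actions: Sequence[Any],
) -> List[Token]:
    """Order one signal's trajectory as return, state, action per timestep.

    Assumption: this ordering is hypothetical. With it, the action at step t
    can be predicted from every token that precedes it in the sequence.
    """
    tokens: List[Token] = []
    for t, (g, s, a) in enumerate(zip(rtg, states, actions)):
        tokens.append(("return", t, g))
        tokens.append(("state", t, s))
        tokens.append(("action", t, a))
    return tokens


# Toy trajectory: rewards are negative queue lengths, states are queue counts
# per inflow direction, and actions are signal phase indices (all made up).
rewards = [-3.0, -2.0, -1.0]
states = [[3, 0], [1, 1], [0, 1]]
actions = [0, 1, 0]

rtg = returns_to_go(rewards)
tokens = interleave_tokens(rtg, states, actions)
```

In this sketch a trajectory with a higher initial return-to-go corresponds to a less congested episode, which is the kind of signal a return-based contrastive objective could use to separate samples. How the paper combines such sequences across neighboring signals is not specified in the abstract and is not modeled here.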