A Novel Framework Combining MPC and Deep Reinforcement Learning With Application to Freeway Traffic Control

Dingshan Sun, Anahita Jamshidnejad, Bart De Schutter
DOI: https://doi.org/10.1109/tits.2023.3342651
IF: 8.5
2024-01-01
IEEE Transactions on Intelligent Transportation Systems
Abstract: Model predictive control (MPC) and deep reinforcement learning (DRL) have been developed extensively as two independent techniques for traffic management. Although the features of MPC and DRL complement each other very well, few of the current studies consider combining these two methods for application in the field of freeway traffic control. This paper proposes a novel framework for integrating MPC and DRL methods for freeway traffic control that is different from existing MPC-(D)RL methods. Specifically, the proposed framework adopts a hierarchical structure, where a high-level efficient MPC component works at a low frequency to provide a baseline control input, while the DRL component works at a high frequency to modify online the output generated by MPC. The control framework, therefore, needs only limited online computational resources and is able to handle uncertainties and external disturbances after proper learning with enough training data. The proposed framework is implemented on a benchmark freeway network in order to coordinate ramp metering and variable speed limits, and the performance is compared with standard MPC and DRL approaches. The simulation results show that the proposed framework outperforms standalone MPC and DRL methods in terms of total time spent (TTS) and constraint satisfaction, despite model uncertainties and external disturbances.
engineering, electrical & electronic, transportation science & technology, civil
What problem does this paper attempt to address?
The paper addresses how to combine the complementary strengths of model predictive control (MPC) and deep reinforcement learning (DRL) in freeway traffic control so that each method offsets the other's limitations. Specifically:

- **Limitations of MPC**: MPC handles input and state constraints explicitly, which helps meet safety requirements, but it depends on an accurate mathematical model. For large-scale, complex systems such as freeway networks, it leads to highly nonlinear, non-convex optimization problems that are difficult to solve in real time. Model mismatch and external disturbances also degrade its closed-loop performance.
- **Limitations of DRL**: DRL deals naturally with uncertainties and can handle infinite-horizon problems with limited online computational resources. However, training an effective DRL agent usually takes a long time, especially for complex systems. DRL also cannot guarantee that safety constraints are satisfied during either learning or deployment, and it suffers from low sample efficiency and delayed rewards.

To address these limitations, the paper proposes a hierarchical framework. An efficient high-level MPC component runs at a low frequency to provide baseline control inputs, while a DRL component runs at a high frequency and modifies the MPC output online to compensate for model mismatch. As a result, the framework needs only limited online computational resources and, after proper training, can handle uncertainties and external disturbances.

The paper evaluates the framework on a benchmark freeway network, where it coordinates ramp metering (RM) and variable speed limits (VSL). Simulation results show that it outperforms standalone MPC and DRL in terms of total time spent (TTS) and constraint satisfaction, even in the presence of model uncertainties and external disturbances.
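The two-rate structure described above (a slow MPC baseline corrected by a fast DRL component) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `run_two_rate_loop`, the toy plant, the constant stand-in policies, the additive form of the correction, and the clipping of the combined input to actuator bounds are all assumptions made for illustration. A real setup would replace the stand-ins with a traffic model, an MPC solver, and a trained DRL agent.

```python
import numpy as np


def run_two_rate_loop(plant_step, mpc_policy, drl_policy, x0,
                      n_steps, mpc_period, u_min, u_max):
    """Simulate a slow MPC baseline corrected by a fast learned policy.

    All callables are hypothetical placeholders supplied by the caller.
    """
    x = np.asarray(x0, dtype=float)
    u_base = None
    trajectory = []
    for k in range(n_steps):
        # Slow layer: refresh the baseline only every `mpc_period` steps.
        if k % mpc_period == 0:
            u_base = np.asarray(mpc_policy(x), dtype=float)
        # Fast layer: the learned policy proposes a correction at every step.
        delta = np.asarray(drl_policy(x, u_base), dtype=float)
        # Assumption: the correction is additive and the combined input is
        # clipped to the actuator bounds.
        u = np.clip(u_base + delta, u_min, u_max)
        x = np.asarray(plant_step(x, u), dtype=float)
        trajectory.append((u, x))
    return trajectory


# Toy stand-ins (not a traffic model): scalar state, scalar input in [0, 1].
def toy_plant(x, u):
    return 0.9 * x + 0.1 * u


def toy_mpc(x):
    return np.array([0.5])


def toy_drl(x, u_base):
    return np.array([0.8])  # deliberately too large, to exercise the clip


traj = run_two_rate_loop(toy_plant, toy_mpc, toy_drl,
                         x0=[0.0], n_steps=6, mpc_period=3,
                         u_min=0.0, u_max=1.0)
```

With `mpc_period=3` and `n_steps=6`, the baseline is recomputed twice while the correction is evaluated at all six steps, which mirrors the idea that the expensive optimization runs rarely and the cheap learned policy runs often.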