Enhancing System-Level Safety in Mixed-Autonomy Platoon via Safe Reinforcement Learning

Jingyuan Zhou, Longhao Yan, Kaidi Yang
2024-03-01
Abstract: Connected and automated vehicles (CAVs) have recently gained prominence in traffic research due to advances in communication technology and autonomous driving. Various longitudinal control strategies for CAVs have been developed to enhance traffic efficiency, stability, and safety in mixed-autonomy scenarios. Deep reinforcement learning (DRL) is one promising strategy for mixed-autonomy platoon control, thanks to its capability of managing complex scenarios in real time after sufficient offline training. However, there are three research gaps for DRL-based mixed-autonomy platoon control: (i) the lack of theoretical collision-free guarantees, (ii) the widely adopted but impractical assumption of skilled and rational drivers who will not collide with preceding vehicles, and (iii) the strong assumption of a known human driver model. To address these research gaps, we propose a safe DRL-based controller that can provide a system-level safety guarantee for mixed-autonomy platoon control. First, we combine control barrier function (CBF)-based safety constraints and DRL via a quadratic programming (QP)-based differentiable neural network layer to provide theoretical safety guarantees. Second, we incorporate system-level safety constraints into our proposed method to account for the safety of both CAVs and the following HDVs to address the potential collisions due to irrational human driving behavior. Third, we devise a learning-based system identification approach to estimate the unknown human car-following behavior in the real system. Simulation results demonstrate that our proposed method effectively ensures CAV safety and improves HDV safety in mixed platoon environments while simultaneously enhancing traffic capacity and string stability.
Systems and Control
What problem does this paper attempt to address?
This paper addresses system-level safety in mixed-autonomy platoons. Specifically, it targets three research gaps in deep reinforcement learning (DRL)-based mixed-autonomy platoon control:

1. **Lack of theoretical collision-free guarantees**: Although existing DRL methods handle complex scenarios well, they usually account for safety only indirectly through the reward function, which makes it difficult to provide a formal safety guarantee.
2. **Assumption of rational, skilled drivers**: Existing research often assumes that human drivers will not collide with the vehicle ahead, and therefore considers only the safety of the CAV itself. This assumption does not hold in practice: human errors are common, and a human-driven vehicle following the CAV can create a dangerous situation for the entire platoon.
3. **Assumption of a known human driver model**: Existing literature usually assumes that the behavior model of human drivers is known, but in real traffic, driver behavior can be variable and unknown.

To address these gaps, the paper proposes a safe DRL-based controller that provides a system-level safety guarantee for mixed-autonomy platoons. The main contributions are:

1. **Combining control barrier functions (CBFs) and DRL**: A differentiable neural network layer based on quadratic programming (QP) combines CBF-based safety constraints with DRL to provide theoretical safety guarantees.
2. **Considering system-level safety**: The safety constraints cover not only the CAV itself but also the following human-driven vehicles (HDVs), improving the safety of the entire platoon.
3. **Learning unknown human car-following behavior**: A learning-based system identification method estimates the unknown human car-following behavior in the real system, which supplies the dynamic model required by the CBF.

Simulation results indicate that the proposed method ensures CAV safety and improves HDV safety while also enhancing traffic capacity and string stability.
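To make the first contribution more concrete, here is a minimal sketch of how a CBF constraint can filter an acceleration proposed by a DRL policy. It is an illustration under stated assumptions, not the paper's formulation: the point-mass vehicle model, the time-headway barrier `h = spacing - tau * speed - s_min`, and the names `CavState`, `cbf_safe_accel`, `tau`, `s_min`, and `gamma` are all hypothetical. With one scalar control and one linear constraint, the QP has a closed-form solution, so no solver is needed here; a differentiable QP layer would be needed in the general multi-constraint case.

```python
from dataclasses import dataclass


@dataclass
class CavState:
    spacing: float     # gap to the preceding vehicle [m]
    speed: float       # CAV speed [m/s]
    lead_speed: float  # preceding vehicle speed [m/s]


def cbf_safe_accel(state, u_rl, tau=1.0, s_min=2.0, gamma=1.0):
    """Return the acceleration closest to u_rl that satisfies the CBF condition.

    Assumed model:   d(spacing)/dt = lead_speed - speed,  d(speed)/dt = u
    Assumed barrier: h = spacing - tau * speed - s_min   (safe when h >= 0)
    CBF condition:   dh/dt + gamma * h >= 0, which is linear in u:
                     (lead_speed - speed) - tau * u + gamma * h >= 0
    """
    h = state.spacing - tau * state.speed - s_min
    # Rearranging the CBF condition gives an upper bound on the acceleration.
    u_upper = ((state.lead_speed - state.speed) + gamma * h) / tau
    # The QP  min (u - u_rl)^2  s.t.  u <= u_upper  has, for scalar u,
    # the closed-form solution min(u_rl, u_upper).
    return min(u_rl, u_upper)


if __name__ == "__main__":
    # Comfortable gap: the constraint is inactive and the RL action passes through.
    print(cbf_safe_accel(CavState(spacing=30.0, speed=20.0, lead_speed=20.0), u_rl=1.0))
    # Short gap and a slower leader: the filter overrides the RL action with braking.
    print(cbf_safe_accel(CavState(spacing=21.0, speed=20.0, lead_speed=18.0), u_rl=1.0))
```

The filter changes the DRL action only when the constraint is active, so the learned policy is left untouched in states that are already safe. Extending the sketch toward system-level safety would add a second constraint for the following HDV, whose dynamics would come from the identified car-following model rather than being assumed known.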