Trustworthy safety improvement for autonomous driving using reinforcement learning

Zhong Cao, Shaobing Xu, Xinyu Jiao, Huei Peng, Diange Yang
DOI: https://doi.org/10.1016/j.trc.2022.103656
2022-05-01
Abstract: Reinforcement learning (RL) can learn from past failures and has the potential to provide self-improvement ability and higher-level intelligence. However, current RL algorithms still face reliability challenges, especially compared with the rule/model-based algorithms that are pre-engineered and human-input intensive but widely used in autonomous vehicles (AVs). To take advantage of both RL and rule-based algorithms, this work designs a decision-making framework that leverages RL and uses an existing rule-based policy as its performance lower bound. In this way, the final policy retains the potential for self-learning while guaranteeing better system performance than the integrated rule-based policy. This decision-making framework is called trustworthy improvement RL (TiRL). The basic idea is to have the RL policy iteration process simultaneously estimate the value function of the given rule-based policy. The AV then uses the RL policy to drive only in the cases where RL has learned a better policy, i.e., one with a higher policy value. This work takes safe highway driving as the case study. The results are obtained from more than 42,000 km of driving in stochastic simulated traffic, calibrated with naturalistic driving data. The TiRL planner is given two typical rule-based highway-driving policies for comparison. The results show that TiRL can outperform an arbitrary given rule-based driving policy. In summary, the proposed TiRL can leverage the learning-based method in stochastic and emergent scenarios while providing a trustworthy safety improvement over the existing rule-based policies.
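The switching rule described in the abstract (drive with the RL policy only where its estimated value exceeds that of the rule-based policy) can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function `select_action`, the dictionaries `q_rl` and `v_rule`, the `margin` parameter, and the toy states and values are all hypothetical names and numbers chosen for illustration, and the sketch assumes both value estimates are already available.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


def select_action(
    state: State,
    actions: Sequence[Action],
    q_rl: Dict[Tuple[State, Action], float],
    v_rule: Dict[State, float],
    rule_policy: Callable[[State], Action],
    margin: float = 0.0,
) -> Tuple[Action, str]:
    """Return (action, source), where source is "rl" or "rule".

    q_rl    : hypothetical table of estimated RL action values Q(s, a).
    v_rule  : hypothetical table of the rule-based policy's estimated value V(s).
    margin  : hypothetical extra gap the RL value must clear (assumption,
              not taken from the paper).
    """
    # Best RL action and its estimated value; unseen pairs count as -inf,
    # so the RL policy is never trusted where it has no estimate.
    best_action = max(actions, key=lambda a: q_rl.get((state, a), float("-inf")))
    best_value = q_rl.get((state, best_action), float("-inf"))

    # Unseen states get +inf for the rule-based value, which forces a
    # fallback to the rule-based policy.
    baseline = v_rule.get(state, float("inf"))

    if best_value > baseline + margin:
        return best_action, "rl"
    return rule_policy(state), "rule"


# Toy usage with made-up values.
actions = ["keep", "brake"]
q_rl = {
    ("cut_in", "keep"): -5.0,
    ("cut_in", "brake"): -1.0,
    ("free", "keep"): 0.5,
    ("free", "brake"): 0.2,
}
v_rule = {"cut_in": -3.0, "free": 1.0}


def rule_policy(state: State) -> Action:
    # Stand-in for a hand-engineered highway policy.
    return "keep"


for s in ["cut_in", "free", "unseen"]:
    print(s, select_action(s, actions, q_rl, v_rule, rule_policy))
```

In this toy setup the RL action is used only in the "cut_in" state, where its estimated value is higher than the rule-based estimate; everywhere else the rule-based policy remains in control, which is the sense in which it acts as a performance lower bound.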
transportation science & technology