Autonomous driving policy learning from demonstration using regression loss function

Yukun Xiao, Yisheng An, Ting Li, Naiqi Wu, Wei He, Peng Li
DOI: https://doi.org/10.1016/j.knosys.2024.111766
IF: 8.139
2024-04-08
Knowledge-Based Systems
Abstract: Efficiently training a high-performance autonomous driving agent remains a practical and challenging problem. Although many techniques have been developed in the literature, especially deep reinforcement learning (DRL) methods, most are computationally inefficient and time-consuming. In recent years, pre-training on demonstration data has been combined with DRL to improve training efficiency and performance. However, because these pre-training DRL methods are complex and impose strict requirements on the format of the demonstration data, they remain difficult to apply efficiently to autonomous driving scenarios. To alleviate these problems, we propose a new algorithm called Pre-training Deep Reinforcement Learning with Improved Advantage and Loss Function (PDRL). Specifically, for scenarios with high temporal sparsity in continuous action spaces, pre-training requires only one network, trained with a demonstration regression loss function built on an improved normalized advantage function. In this way, the agent obtains a higher value when it selects actions similar to the demonstration data, and the network's output and computation are simplified by using exploration methods better suited to inertial systems. Furthermore, a new prioritization formula is presented to improve the algorithm's performance and convergence speed. Finally, the proposed approach is evaluated on the MetaDrive simulation platform and compared with existing algorithms, including Soft Actor-Critic (SAC), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Advantage Weighting with Early Termination Actor-Critic (AWET(SAC)).
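As a rough illustration of why a single network can suffice, the sketch below uses the standard normalized advantage function (NAF) decomposition, Q(s, a) = V(s) + A(s, a), with a quadratic advantage that peaks at the network's action output mu. It is not the paper's improved advantage function, and the abstract does not give the loss or the prioritization formula, so `demo_regression_loss` is a hypothetical mean-squared-error stand-in and all numeric values are made up for illustration.

```python
# Sketch of the standard NAF form, NOT the improved variant proposed in the paper.

def naf_advantage(action, mu, L):
    """A(s, a) = -0.5 * (a - mu)^T P (a - mu), with P = L L^T.

    L is lower-triangular with a positive diagonal, so P is positive
    definite and the advantage is at most 0, reaching 0 only at a = mu.
    """
    diff = [a - m for a, m in zip(action, mu)]
    n = len(diff)
    # (a - mu)^T L L^T (a - mu) equals the squared norm of L^T (a - mu).
    proj = [sum(L[i][j] * diff[i] for i in range(n)) for j in range(n)]
    return -0.5 * sum(p * p for p in proj)


def naf_q(value, action, mu, L):
    """Q(s, a) = V(s) + A(s, a); one network would output value, mu and L."""
    return value + naf_advantage(action, mu, L)


def demo_regression_loss(mu, demo_action):
    """Hypothetical stand-in: mean squared error between mu and the demonstrated action."""
    return sum((m - d) ** 2 for m, d in zip(mu, demo_action)) / len(mu)


# Illustrative numbers only: a 2-D action such as (steering, throttle).
mu = [0.2, 0.5]
L = [[1.0, 0.0], [0.5, 2.0]]
state_value = 1.0

q_at_mu = naf_q(state_value, mu, mu, L)           # equals V(s)
q_far = naf_q(state_value, [1.0, -0.5], mu, L)    # strictly lower
loss = demo_regression_loss(mu, [0.3, 0.4])       # shrinks as mu nears the demo action
```

Under these assumptions, regressing mu toward demonstrated actions moves the peak of Q onto them, which is one way to read the abstract's statement that actions similar to the demonstration data receive a higher value.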
computer science, artificial intelligence