Efficient and Balanced Exploration-driven Decision Making for Autonomous Racing Using Local Information
Zhen Tian, Dezong Zhao, Zhihao Lin, Wenjing Zhao, David Flynn, Yuande Jiang, Daxin Tian, Yuanjian Zhang, Yao Sun
DOI: https://doi.org/10.1109/tiv.2024.3432713
IF: 8.2
2024-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: Autonomous racing has attracted extensive interest because of its potential for self-driving at the limits of vehicle handling. Model-based and learning-based methods are both widely used in autonomous racing. Of these, model-based methods cannot cope with complex environments when only local perception is available. This limitation can be overcome by proximal policy optimization (PPO), a typical learning-based method that does not rely heavily on global perception. However, existing PPO suffers from low training efficiency on long sequences. To address this issue, this paper develops an improved PPO by introducing a curiosity mechanism, a balanced reward function, and an image-efficient actor-critic network. The curiosity mechanism concentrates training on key segments, facilitating efficient short-term learning. The balanced reward function adjusts rewards according to the complexity of the racetrack, promoting efficient exploration of the control strategy during training. The image-efficient actor-critic network enables the PPO to process perceived information quickly. Simulation results on a physics engine demonstrate that the proposed algorithm outperforms benchmark algorithms, achieving fewer collisions, a higher peak reward with less training time, and shorter lap times across multiple test racetracks.
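For readers unfamiliar with the baseline, the following is a minimal sketch of the standard PPO clipped surrogate loss that the abstract's method builds on. It is not the paper's improved algorithm: the curiosity mechanism, balanced reward function, and image-efficient network are not specified in the abstract and are not reproduced here. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    new_logp, old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    All names and defaults here are illustrative, not taken from the paper.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and old policy.
        ratio = math.exp(nl - ol)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while the surrogate is maximized.
    return -total / len(advantages)
```

The clipping limits how far a single update can move the policy from the one that collected the data, which is the property that makes PPO comparatively stable.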