Autonomous Racing using a Hybrid Imitation-Reinforcement Learning Architecture

Chinmay Vilas Samak, Tanmay Vilas Samak, Sivanathan Kandhasamy
DOI: https://doi.org/10.48550/arXiv.2110.05437
2022-11-27
Abstract: In this work, we present a rigorous end-to-end control strategy for autonomous vehicles aimed at minimizing lap times in a time attack racing event. We also introduce AutoRACE Simulator developed as a part of this research project, which was employed to simulate accurate vehicular and environmental dynamics along with realistic audio-visual effects. We adopted a hybrid imitation-reinforcement learning architecture and crafted a novel reward function to train a deep neural network policy to drive (using imitation learning) and race (using reinforcement learning) a car autonomously in less than 20 hours. Deployment results were reported as a direct comparison of 10 autonomous laps against 100 manual laps by 10 different human players. The autonomous agent not only exhibited superior performance by gaining 0.96 seconds over the best manual lap, but it also dominated the human players by 1.46 seconds with regard to the mean lap time. This dominance could be justified in terms of better trajectory optimization and lower reaction time of the autonomous agent.
Robotics, Artificial Intelligence, Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper aims to minimize the lap time of an autonomous vehicle in a time-attack racing event using a hybrid imitation-reinforcement learning architecture. The research team pursues this goal in four ways:

1. **Develop a high-fidelity simulator**: To simulate vehicle and environment dynamics accurately and to provide realistic audio-visual effects, the researchers developed the AutoRACE Simulator. The simulator offers a realistic driving experience and supports flexible experimental settings.
2. **Adopt a hybrid learning strategy**: Imitation learning and reinforcement learning are combined to train a deep neural network policy. Imitation learning lets the agent acquire basic driving skills quickly, while reinforcement learning further optimizes racing performance.
3. **Design a novel reward function**: A reward mechanism that includes behavior cloning, GAIL rewards, curiosity rewards, and external rewards guides the autonomous agent to keep improving its driving and racing skills during training.
4. **Verify the performance of the autonomous agent**: The agent is evaluated in virtual races by direct comparison with human players. It beats the human players on mean lap time and also beats the best manual lap.

### Specific problem description

- **Minimize lap time**: The autonomous vehicle must complete a lap of a closed track as quickly as possible, which requires precise trajectory planning and an efficient control strategy.
- **Improve robustness and reliability**: The autonomous vehicle should operate stably under various conditions, with less unnecessary exploration and fewer incorrect actions.
- **Surpass human performance**: Through optimization, the autonomous vehicle should outperform human drivers in aspects such as reaction time and trajectory optimization.

### Solutions

- **Hybrid learning architecture**: Combining imitation learning and reinforcement learning exploits expert data while continuing to improve performance through self-exploration.
- **High-fidelity simulator**: The AutoRACE Simulator provides realistic environment and vehicle dynamics simulation, which helps with the transition from virtual to real.
- **Novel reward mechanism**: Multiple reward signals guide the autonomous agent toward optimal behavior and keep the training process efficient and safe.

### Experimental results

Ten autonomous laps were compared against 100 manual laps driven by 10 different human players. The agent's mean lap time was 1.46 seconds faster than that of the human players, and it gained 0.96 seconds over the best manual lap. These results support the effectiveness of the hybrid learning strategy and the high-fidelity simulator, and they reflect the agent's advantages in trajectory optimization and reaction time.

### Conclusion

This research shows how superhuman autonomous racing can be achieved in a virtual environment through a hybrid imitation-reinforcement learning architecture and a high-fidelity simulator. It lays a foundation for autonomous racing in future practical applications and offers new ideas for developing more intelligent autonomous driving systems.
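
The multi-signal reward mechanism summarized above can be illustrated with a minimal sketch. It assumes the per-step reward signals (external, GAIL, curiosity) are combined as a weighted sum, which is a common way to mix such signals; the summary does not state how the paper combines them. The names `RewardWeights` and `combined_reward` and all weight values are hypothetical, chosen only for illustration. Behavior cloning is left out of the sum because it usually acts as a supervised loss on expert demonstrations, not as a per-step reward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical mixing weights; these values are NOT from the paper."""
    external: float = 1.0    # task reward from the environment (e.g. lap progress)
    gail: float = 0.1        # discriminator-based imitation reward
    curiosity: float = 0.02  # intrinsic reward encouraging exploration


def combined_reward(external: float, gail: float, curiosity: float,
                    weights: RewardWeights = RewardWeights()) -> float:
    """Return the weighted sum of the three per-step reward signals."""
    return (weights.external * external
            + weights.gail * gail
            + weights.curiosity * curiosity)


if __name__ == "__main__":
    # One simulated step with made-up signal values.
    step_reward = combined_reward(external=1.0, gail=0.5, curiosity=2.0)
    print(f"combined step reward: {step_reward:.3f}")
```

Under this assumption, a large external weight relative to the imitation and curiosity weights lets the task objective (fast laps) dominate once basic driving has been learned, while the smaller terms shape early exploration.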