Karting racing: A revisit to PPO and SAC algorithm

Chengyuan Xu, Ruijie Zhu, Dongce Yang
DOI: https://doi.org/10.1109/cisai54367.2021.00066
2021-09-01
Abstract: Proximal Policy Optimization (PPO) is a classical reinforcement learning algorithm that has been tested on a collection of benchmark tasks. In this paper, we use PPO in the Unity environment to train a kart-racing agent under different parameters and training settings. In our experiments, the improved PPO algorithm outperforms the baseline and other algorithms, such as Soft Actor-Critic (SAC), in both convergence rate and practical results (the average speed of the agent). Visually, the agent trained by PPO appears more stable: it copes with challenging tracks and generalizes better when the trained model is transferred to a new environment. Our improved algorithm enables several applications, such as autopilot and UAV navigation.