Dyna-PPO Reinforcement Learning with Gaussian Process for the Continuous Action Decision-Making in Autonomous Driving

Wu Guanlin, Fang Wenqi, Wang Ji, Ge Pin, Cao Jiang, Ping Yang, Gou Peng
DOI: https://doi.org/10.1007/s10489-022-04354-x
IF: 5.3
2023-01-01
Applied Intelligence
Abstract: Recent years have witnessed rapid development of autonomous driving. Model-based and model-free reinforcement learning are two popular learning methods for autonomous driving, and each has its own advantages in achieving an excellent driving experience. To improve efficiency and performance, the Dyna framework is a promising way to combine these advantages. Unfortunately, the classical Dyna framework cannot handle continuous actions in reinforcement learning. In addition, the interaction between the world model and the model-free reinforcement learning agent remains at the unidirectional data level. To further improve the effectiveness and efficiency of driving policy learning, we propose a novel Gaussian Process-based Dyna-PPO approach in this paper. The Gaussian Process model, which is analytically tractable and suitable for small-sample problems, is introduced to build the world model. In addition, we design a mechanism to realize bidirectional interaction between the world model and the policy model. Extensive experiments validate the effectiveness and robustness of our proposed approach. According to our simulation results, the driving distance of the vehicle can be improved by approximately 0.2 times.
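The abstract describes the approach only at a high level. As a rough illustration, and not the authors' implementation, the NumPy sketch below shows the three ingredients it names: an exact Gaussian Process regression world model over continuous state-action inputs, Dyna-style imagined rollouts that query that model with continuous actions (no discretization), and the standard PPO clipped surrogate objective. The RBF kernel, the fixed hyperparameters, regressing the state change s' - s, the hypothetical `toy_policy`, and all function and class names are assumptions made for illustration. The paper's bidirectional world-model/policy mechanism is not reproduced here.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of X1 and X2."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


class GPWorldModel:
    """Exact GP regression from (state, action) to the state change s' - s.

    Assumption: hyperparameters are fixed for illustration; a real model would
    fit them, e.g. by maximizing the marginal likelihood.
    """

    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-4):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def fit(self, states, actions, next_states):
        self.X = np.hstack([states, actions])
        Y = next_states - states  # regress the state delta, not the raw next state
        K = rbf_kernel(self.X, self.X, self.length_scale, self.signal_var)
        self.L = np.linalg.cholesky(K + self.noise_var * np.eye(len(self.X)))
        # alpha = (K + noise*I)^{-1} Y via two solves against the Cholesky factor
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))
        return self

    def predict(self, states, actions):
        """Return predicted next states and the per-query predictive variance."""
        Xs = np.hstack([states, actions])
        Ks = rbf_kernel(Xs, self.X, self.length_scale, self.signal_var)
        v = np.linalg.solve(self.L, Ks.T)
        var = self.signal_var - np.sum(v**2, axis=0)  # latent-function variance
        return states + Ks @ self.alpha, np.maximum(var, 0.0)


def toy_policy(states, rng):
    """Hypothetical linear-Gaussian policy producing continuous actions in [-1, 1]."""
    return np.clip(-0.5 * states + 0.1 * rng.standard_normal(states.shape), -1.0, 1.0)


def imagine_transitions(model, policy, start_states, horizon, rng):
    """Dyna-style planning: roll the learned model forward with continuous actions."""
    transitions, s = [], start_states
    for _ in range(horizon):
        a = policy(s, rng)  # sampled directly from the policy, no discretization
        s_next, var = model.predict(s, a)
        transitions.append((s, a, s_next, var))
        s = s_next
    return transitions


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized), averaged over samples."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantage, clipped * advantage))


# Usage on a toy 1-D system with true dynamics s' = s + 0.1 * a.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, (50, 1))
A = rng.uniform(-1, 1, (50, 1))
model = GPWorldModel().fit(S, A, S + 0.1 * A)

S_test = rng.uniform(-0.8, 0.8, (20, 1))
A_test = rng.uniform(-0.8, 0.8, (20, 1))
pred, var = model.predict(S_test, A_test)
max_err = np.max(np.abs(pred - (S_test + 0.1 * A_test)))
_, var_far = model.predict(np.array([[10.0]]), np.array([[10.0]]))

imagined = imagine_transitions(model, toy_policy, S_test[:4], horizon=5, rng=rng)
print(f"max model error: {max_err:.4f}, imagined transitions: {len(imagined)}")
```

The predictive variance is small near the training data and approaches the prior variance far from it, which is one reason GP models are attractive when real driving samples are scarce. In a full Dyna loop, imagined transitions like these would be added to the real experience used for the PPO update.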