Actor-critic algorithm based on Gaussian process

Shichao Chen, Xinghong Ling, Quan Liu, Yuchen Fu, Guixing Chen
DOI: https://doi.org/10.3969/j.issn.1001-3695.2016.06.016
2016-01-01
Abstract: Balancing exploration and exploitation in large or continuous state spaces is an active topic in reinforcement learning. To address this problem, this paper presents a novel actor-critic algorithm that combines function approximation with Gaussian processes. For the actor, the algorithm uses the temporal difference error to construct a mean squared error function with respect to the policy parameters. For the critic, the algorithm uses a Gaussian process to model the linear state-value function and, in conjunction with a generative model, obtains the posterior distribution of the state-value function's parameter vector by Bayesian inference. Experimental results on the pole-balancing task show that the algorithm converges faster and effectively balances exploration and exploitation in large or continuous state spaces. The algorithm also exhibits good convergence performance.
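
The critic step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear value function V(s) = phi(s)^T w, a Gaussian prior on w, and a generative model of the form r = (phi(s) - gamma * phi(s'))^T w + noise with Gaussian noise, which is one common way to pose Gaussian process temporal difference learning. The class name `BayesianLinearCritic` and the parameters `gamma`, `noise_var`, and `prior_var` are hypothetical names chosen for illustration.

```python
import numpy as np


class BayesianLinearCritic:
    """Gaussian posterior over the weights w of V(s) = phi(s) @ w.

    Assumed generative model (an illustration, not taken from the paper):
        reward = (phi(s) - gamma * phi(s_next)) @ w + noise,
        noise ~ N(0, noise_var),  w ~ N(0, prior_var * I).
    """

    def __init__(self, n_features, gamma=0.95, noise_var=1.0, prior_var=1.0):
        self.gamma = gamma
        self.noise_var = noise_var
        self.mean = np.zeros(n_features)          # posterior mean of w
        self.cov = prior_var * np.eye(n_features)  # posterior covariance of w

    def update(self, phi, reward, phi_next):
        """Condition the posterior on one transition and return the TD error."""
        h = phi - self.gamma * phi_next            # regression input for this step
        cov_h = self.cov @ h
        innovation_var = h @ cov_h + self.noise_var
        gain = cov_h / innovation_var
        # reward - h @ mean equals reward + gamma * V(s_next) - V(s)
        td_error = reward - h @ self.mean
        self.mean = self.mean + gain * td_error
        self.cov = self.cov - np.outer(gain, cov_h)
        return td_error

    def value(self, phi):
        """Posterior mean and variance of V(s) for the feature vector phi."""
        return phi @ self.mean, phi @ self.cov @ phi
```

In a sketch like this, the returned TD error is the quantity an actor could use in its own update, and the posterior variance from `value` gives a measure of uncertainty that could inform exploration. How the paper's actor uses the TD error in its mean squared error objective is not specified in the abstract, so it is left out here.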