Actor-Critic Algorithms With Epsilon-Greedy Gaussian Policy In Multidimensional Continuous Action Spaces

Chunyuan Zhang, Qingxin Zhu, Yigui Ou, Xinzheng Niu
2016-01-01
Abstract: In actor-critic (AC) algorithms, the Gaussian policy is widely used for solving sequential decision problems with continuous action spaces. However, this policy tends to over-explore because it lacks greediness, which often makes it difficult for AC algorithms to achieve good convergence speed and quality. In this paper, we propose a novel ε-greedy Gaussian policy and present two compatible AC algorithm frameworks for using it successfully. The proposed policy can be viewed as a hybrid of the traditional Gaussian policy and the ε-greedy policy. At each time step, it generates candidate actions by applying symmetric Gaussian perturbations to the current action mean, and then uses the ε-greedy policy to select the behaviour action based on advantage functions. To the best of our knowledge, this is the first work to introduce the ε-greedy policy into AC algorithms for solving multidimensional continuous action problems. Theoretical analysis shows that compatible AC algorithms can obtain better convergence quality with the proposed policy than with the traditional Gaussian policy. Finally, experimental results on a mountain-car problem and a puddle-world problem demonstrate the effectiveness of the proposed policy and the compatible AC algorithm frameworks.
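The action-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `epsilon_greedy_gaussian_action`, the parameters `sigma`, `n_pairs`, and `advantage_fn`, and the choice to pick uniformly among the candidates when exploring are all assumptions made for illustration. The abstract specifies only that candidates come from symmetric Gaussian perturbations of the action mean and that the behaviour action is chosen ε-greedily using advantage functions.

```python
import random


def epsilon_greedy_gaussian_action(mean, sigma, advantage_fn, epsilon, n_pairs, rng):
    """Pick a behaviour action from symmetric Gaussian candidates (illustrative sketch).

    mean         -- current action mean, one float per action dimension
    sigma        -- perturbation standard deviation (assumed shared across dimensions)
    advantage_fn -- hypothetical callable mapping an action to an advantage estimate
    epsilon      -- probability of exploring instead of acting greedily
    n_pairs      -- number of symmetric candidate pairs (assumed hyperparameter)
    rng          -- random.Random instance, passed in for reproducibility
    """
    candidates = []
    for _ in range(n_pairs):
        # One Gaussian perturbation per action dimension.
        noise = [rng.gauss(0.0, sigma) for _ in mean]
        # Symmetric pair: the same perturbation added to and subtracted from the mean.
        candidates.append([m + d for m, d in zip(mean, noise)])
        candidates.append([m - d for m, d in zip(mean, noise)])

    if rng.random() < epsilon:
        # Explore: uniform choice among candidates (an assumption of this sketch).
        action = rng.choice(candidates)
    else:
        # Exploit: the candidate with the highest estimated advantage.
        action = max(candidates, key=advantage_fn)
    return action, candidates


# Toy usage: a made-up advantage that prefers actions close to (1.0, 1.0).
rng = random.Random(0)
toy_advantage = lambda a: -sum((x - 1.0) ** 2 for x in a)
action, candidates = epsilon_greedy_gaussian_action(
    [0.5, -0.2], 0.3, toy_advantage, 0.1, 4, rng
)
```

With `epsilon` set to 0 the sketch always returns the highest-advantage candidate, and with `epsilon` set to 1 it always picks a candidate at random, which shows how ε trades greediness against exploration in this setup.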