A Gradient Algorithm for Neural-Network-Based Reinforcement Learning

徐昕 (Xu Xin), 贺汉根 (He Hangen)
DOI: https://doi.org/10.3321/j.issn:0254-4164.2003.02.014
2003-01-01
Chinese Journal of Computers
Abstract: To solve Markov decision problems with continuous state spaces and discrete action spaces, neural networks are commonly used as value function approximators. Because reinforcement learning provides no teacher signals, the gradient algorithms used to train neural networks in supervised learning cannot be applied directly. The existing direct reinforcement-learning algorithms based on neural networks are not gradient descent algorithms for any objective function. As a result, their convergence is hard to analyze, and examples of divergence have been found. Previous work on residual gradient algorithms assumed a stationary action policy, so convergence cannot be guaranteed in the usual case where the action policy is greedy with respect to the estimated value function. In this paper, a new gradient descent reinforcement-learning algorithm is proposed in which multi-layer feed-forward neural networks serve as value function approximators. The new algorithm employs a nearly greedy, differentiable action policy based on the Boltzmann probability distribution. The optimal value functions of Markov decision processes are approximated by minimizing Bellman residuals under non-stationary action policies. To derive incremental gradient learning rules, an upper bound of the Bellman residuals is used as the objective function. The convergence of the proposed algorithm and the performance of the approximated optimal policy are analyzed theoretically. Simulation results on learning control of the Mountain-Car problem illustrate the learning efficiency and generalization ability of the proposed algorithm.
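The abstract combines three ingredients: a Boltzmann (softmax) policy over the network's action values, a Bellman residual computed under that non-stationary policy, and incremental gradient descent on a residual-based objective. The NumPy sketch below shows what one such update could look like. It is not the authors' implementation and rests on stated assumptions: a one-hidden-layer tanh network, a next-state value taken as the Boltzmann-weighted average V(s') = Σ_b π(b|s') Q(s',b) with π(b|s') ∝ exp(Q(s',b)/T), and the per-sample squared residual ½δ² as the objective. By Jensen's inequality, the expectation of the per-sample squared residual upper-bounds the squared expected residual; whether this is the exact bound used in the paper is an assumption. Because the policy is differentiable, the gradient flows through both Q(s,a) and V(s'), using ∂V/∂Q(s',b) = π(b|s')(1 + (Q(s',b) − V(s'))/T). All function names, the network size, the temperature, the learning rate, and the Mountain-Car-like transition are hypothetical.

```python
import numpy as np


def init_params(state_dim, hidden, n_actions, rng):
    """Small random weights for a one-hidden-layer tanh network (sizes are illustrative)."""
    return {
        "W1": rng.normal(0.0, 0.5, (hidden, state_dim)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.5, (n_actions, hidden)),
        "b2": np.zeros(n_actions),
    }


def forward(params, s):
    """Return hidden activations and Q-values (one per action) for state s."""
    h = np.tanh(params["W1"] @ s + params["b1"])
    q = params["W2"] @ h + params["b2"]
    return h, q


def backward(params, s, h, g_q):
    """Gradient of the scalar g_q . q(s) with respect to every parameter."""
    g_pre = (params["W2"].T @ g_q) * (1.0 - h ** 2)  # back through tanh
    return {
        "W1": np.outer(g_pre, s),
        "b1": g_pre,
        "W2": np.outer(g_q, h),
        "b2": g_q,
    }


def boltzmann(q, temperature):
    """Softmax policy pi(a|s) proportional to exp(Q(s,a)/T); nearly greedy for small T."""
    z = (q - q.max()) / temperature  # shift by the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def residual_and_grad(params, s, a, r, s_next, done, gamma=0.99, temperature=0.1):
    """Bellman residual delta and the gradient of 0.5 * delta**2 for one transition.

    Assumption: V(s') = sum_b pi(b|s') Q(s',b) with a Boltzmann pi, so V is
    differentiable and the gradient flows through Q(s,a) and V(s') (residual-gradient style).
    """
    h, q = forward(params, s)
    h_n, q_n = forward(params, s_next)
    pi = boltzmann(q_n, temperature)
    v_next = pi @ q_n
    delta = r + gamma * (1.0 - done) * v_next - q[a]

    # dV/dQ(s',b) = pi_b * (1 + (Q(s',b) - V) / T)
    g_v = pi * (1.0 + (q_n - v_next) / temperature)
    e_a = np.zeros_like(q)
    e_a[a] = 1.0  # selects Q(s, a)
    grad_q = backward(params, s, h, e_a)
    grad_v = backward(params, s_next, h_n, g_v)
    coeff = gamma * (1.0 - done)
    # d(0.5 * delta**2)/dtheta = delta * (gamma * dV(s')/dtheta - dQ(s,a)/dtheta)
    grad = {k: delta * (coeff * grad_v[k] - grad_q[k]) for k in params}
    return delta, grad


def sgd_step(params, grad, lr):
    """Plain gradient-descent update on 0.5 * delta**2."""
    return {k: params[k] - lr * grad[k] for k in params}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(state_dim=2, hidden=8, n_actions=3, rng=rng)
    # Hypothetical Mountain-Car-like transition: state = (position, velocity), action 2, reward -1.
    s, s_next = np.array([-0.5, 0.0]), np.array([-0.49, 0.002])
    for _ in range(5):
        delta, grad = residual_and_grad(params, s, 2, -1.0, s_next, 0.0)
        print(f"residual {delta:+.4f}")
        params = sgd_step(params, grad, lr=0.01)
```

A small temperature keeps the policy close to greedy while leaving V(s') smooth in the weights. A hard max would make V non-differentiable wherever two action values tie, which is what this kind of design avoids.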