Average Reward Multi-Step Temporal-Difference Learning Using Recursive Least-Squares Methods

李春贵,刘永信,王萌
DOI: https://doi.org/10.3969/j.issn.1000-1638.2008.05.016
2008-01-01
Abstract: Average reward temporal-difference learning for an irreducible aperiodic Markov chain, based on linear function approximation, is investigated. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally. Based on an analysis of existing algorithms and on linear parameter estimation theory, a new class of average reward multi-step temporal-difference learning algorithms based on linear function approximation and recursive least-squares methods is proposed. A proof of uniform convergence is presented.
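The abstract does not give the update equations, so the following is only a minimal sketch of what a recursive least-squares, multi-step (eligibility-trace) temporal-difference update can look like in the average reward setting. It is not the authors' algorithm: it combines a generic RLS-TD(λ) update (Sherman-Morrison rank-one inverse update) with a running estimate of the average reward. The function name `rls_avg_td_step`, the parameters `lam` and `eta`, and the choice of initial matrix `P` are all assumptions made for illustration.

```python
import numpy as np


def rls_avg_td_step(theta, P, z, mu, phi, phi_next, reward, lam=0.5, eta=0.05):
    """One step of a hypothetical RLS-style average reward TD(lambda) update.

    theta    : weight vector of the linear approximation phi(x)^T theta
    P        : running estimate of the inverse of sum_t z_t (phi_t - phi_{t+1})^T
    z        : eligibility trace
    mu       : running estimate of the average reward
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    """
    # Eligibility trace; there is no discount factor in the average reward setting.
    z = lam * z + phi

    # Feature difference that appears in the temporal-difference residual.
    d_phi = phi - phi_next

    Pz = P @ z
    denom = 1.0 + d_phi @ Pz

    # Temporal difference, with the reward centered by the average reward estimate:
    # (r - mu) + phi_next^T theta - phi^T theta
    residual = (reward - mu) - d_phi @ theta

    # Recursive least-squares update of the weights.
    theta = theta + Pz * (residual / denom)

    # Sherman-Morrison update of the inverse after adding z d_phi^T.
    P = P - np.outer(Pz, d_phi @ P) / denom

    # Incremental update of the average reward estimate (assumed constant step size).
    mu = mu + eta * (reward - mu)

    return theta, P, z, mu
```

A typical use would initialize `theta` and `z` to zero vectors, `mu` to zero, and `P` to a positive multiple of the identity, then call the function once per observed transition. Note that if the basis functions span the constant function, the underlying least-squares problem is singular in the average reward setting, so the basis is usually chosen to exclude it.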