Kernel Least-Squares Temporal Difference Learning

Xin Xu, Tao Xie, Dewen Hu, Xicheng Lu
2006-01-01
Abstract: Kernel methods have attracted considerable research interest in recent years: by using Mercer kernels, nonlinear and non-parametric versions of conventional supervised and unsupervised learning algorithms can be implemented, usually with better generalization ability. However, kernel methods have not been widely studied in reinforcement learning. In this paper, we present a novel kernel-based least-squares temporal-difference (TD) learning algorithm called KLS-TD(λ), which can be viewed as the kernel version, or nonlinear form, of the earlier linear LS-TD(λ) algorithms. By introducing a kernel-based nonlinear mapping, KLS-TD(λ) outperforms conventional linear TD(λ) algorithms on value-function prediction (policy evaluation) problems in which the value function is nonlinear. Furthermore, eligibility traces for kernel-based TD learning are derived in KLS-TD(λ) so that data are used more efficiently, which distinguishes it from recent work on Gaussian processes in reinforcement learning. Experimental results on a typical value-function prediction problem for a Markov chain demonstrate the effectiveness of the proposed method.
Keywords: Reinforcement learning, Kernel methods, Temporal difference, Markov chain.
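To make the idea in the abstract concrete, the following is a minimal sketch of one common way to kernelize least-squares TD(λ); it is not taken from the paper and may differ from the authors' exact formulation. It assumes scalar states, a Gaussian kernel, and a fixed set of kernel centers; the names `gaussian_kernel`, `kernel_lstd_lambda`, `predict`, `centers`, `sigma`, and `ridge` are all hypothetical. The value function is modeled as V(s) = Σ_i α_i k(s, c_i), and the eligibility trace accumulates kernel feature vectors.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma=1.0):
    """Return the vector [k(x, c) for c in centers] for a Gaussian kernel."""
    diff = np.asarray(centers, dtype=float) - float(x)
    return np.exp(-diff**2 / (2.0 * sigma**2))


def kernel_lstd_lambda(states, rewards, centers,
                       gamma=0.9, lam=0.5, sigma=1.0, ridge=1e-8):
    """Estimate kernel weights alpha from one trajectory.

    states  : visited states s_0 .. s_T (length T + 1)
    rewards : rewards r_0 .. r_{T-1}, r_t observed on s_t -> s_{t+1}
    ridge   : small regularizer (an assumption) to keep A well conditioned
    """
    n = len(centers)
    A = np.zeros((n, n))
    b = np.zeros(n)
    z = np.zeros(n)  # eligibility trace in kernel-feature space
    for t, r in enumerate(rewards):
        k_t = gaussian_kernel(states[t], centers, sigma)
        k_next = gaussian_kernel(states[t + 1], centers, sigma)
        z = gamma * lam * z + k_t                 # decay trace, add current features
        A += np.outer(z, k_t - gamma * k_next)    # least-squares TD statistics
        b += z * r
    return np.linalg.solve(A + ridge * np.eye(n), b)


def predict(x, alpha, centers, sigma=1.0):
    """Evaluate the learned value function at state x."""
    return float(gaussian_kernel(x, centers, sigma) @ alpha)
```

Using every visited state as a kernel center makes the linear system grow with the amount of data, so in practice a smaller set of representative centers would probably be chosen; how to choose them is outside the scope of this sketch.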