Adaptive Kernel-Width Selection for Kernel-Based Least-Squares Policy Iteration Algorithm

Jun Wu, Xin Xu, Lei Zuo, Zhaobin Li, Jian Wang
DOI: https://doi.org/10.1007/978-3-642-21090-7_70
2011-01-01
Abstract: The Kernel-based Least-Squares Policy Iteration (KLSPI) algorithm provides a general reinforcement learning solution for large-scale Markov decision problems. In KLSPI, the Radial Basis Function (RBF) kernel is usually used to approximate the optimal value function with high precision. However, successful application of KLSPI depends heavily on selecting a proper kernel width for the RBF kernel. In previous research, the kernel width was usually set manually or computed in advance from the sample distribution, which requires prior knowledge or model information. This paper proposes an adaptive kernel-width selection method for the KLSPI algorithm. First, a sparsification procedure with neighborhood analysis based on the ℓ2-ball of radius ε is adopted, which yields a reduced kernel dictionary without presetting the kernel width. Second, a gradient descent method based on the Bellman Residual Error (BRE) is proposed to find a kernel width that minimizes the sum of the BRE. The experimental results show that the proposed method helps KLSPI approximate the true value function more accurately and, ultimately, obtain a better control policy.
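The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (sparsify_l2_ball, rbf_features, bre_sum, tune_width) is hypothetical. Several simplifications are assumed: a state-value function under a fixed policy stands in for the action-value function used in KLSPI, the weights are refit by plain least squares on the Bellman residual rather than by the KLSPI update, the gradient with respect to the kernel width is taken by central finite differences instead of the paper's analytic expression (which the abstract does not give), and the step size is halved whenever a step fails to lower the BRE. The toy transition data are synthetic.

```python
import numpy as np

def sparsify_l2_ball(samples, radius):
    """Keep a sample only if it lies outside the l2-ball of every kept element."""
    dictionary = []
    for x in samples:
        if all(np.linalg.norm(x - d) > radius for d in dictionary):
            dictionary.append(x)
    return np.array(dictionary)

def rbf_features(states, dictionary, width):
    """RBF kernel k(x, d) = exp(-||x - d||^2 / (2 * width^2)) against the dictionary."""
    sq_dist = ((states[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def bre_sum(width, s, r, s_next, dictionary, gamma):
    """Sum of squared Bellman residuals after refitting the weights (assumed fit)."""
    a = rbf_features(s, dictionary, width) - gamma * rbf_features(s_next, dictionary, width)
    weights, *_ = np.linalg.lstsq(a, r, rcond=None)
    return float(np.sum((r - a @ weights) ** 2))

def tune_width(width, s, r, s_next, dictionary, gamma, lr=0.1, steps=50, h=1e-4):
    """Gradient descent on the kernel width with a finite-difference gradient."""
    loss = bre_sum(width, s, r, s_next, dictionary, gamma)
    for _ in range(steps):
        grad = (bre_sum(width + h, s, r, s_next, dictionary, gamma)
                - bre_sum(width - h, s, r, s_next, dictionary, gamma)) / (2.0 * h)
        candidate = width - lr * grad
        if candidate > 2.0 * h:
            candidate_loss = bre_sum(candidate, s, r, s_next, dictionary, gamma)
            if candidate_loss < loss:
                width, loss = candidate, candidate_loss  # accept the improving step
                continue
        lr *= 0.5  # reject the step and shrink the step size
    return width, loss

# Synthetic one-dimensional transitions (illustrative data, not from the paper).
rng = np.random.default_rng(0)
s = rng.uniform(0.0, 1.0, size=(200, 1))
s_next = np.clip(s + 0.05, 0.0, 1.0)
r = np.sin(2.0 * np.pi * s[:, 0])
gamma = 0.9
radius = 0.1

dictionary = sparsify_l2_ball(s, radius)
initial_width = 0.5
initial_loss = bre_sum(initial_width, s, r, s_next, dictionary, gamma)
final_width, final_loss = tune_width(initial_width, s, r, s_next, dictionary, gamma)
print(len(dictionary), initial_loss, final_width, final_loss)
```

Note that the dictionary is built from distances between samples alone, so it does not depend on the kernel width; this is what allows the width to be tuned afterwards without rebuilding the dictionary.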