Abstract: In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of the controlled plants. In KLSPI, Mercer kernels are used in the policy evaluation step of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To maintain sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared with previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing results. The first is better convergence and a (near-)optimality guarantee, obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The second is automatic feature selection using ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with good generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task, a stochastic chain problem, demonstrate that KLSPI consistently achieves better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was evaluated on two nonlinear feedback control problems: a ship heading control problem and the swing-up control of a double-link underactuated pendulum called acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about the uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance.
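The ALD-based sparsification mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian kernel, the threshold name `mu`, the jitter term, and the function names are assumptions made for illustration. A sample is added to the kernel dictionary only if it cannot be approximated, within `mu`, by a linear combination of the dictionary elements in feature space. In a KLSTD-Q setting the samples would presumably be state-action feature vectors.

```python
import numpy as np

def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))

def ald_sparsify(samples, kernel=gaussian_kernel, mu=1e-3):
    """Build a sparse kernel dictionary with an approximate linear dependency test.

    `mu` is a hypothetical name for the ALD threshold: larger values give a
    smaller dictionary.
    """
    dictionary = []
    for x in samples:
        x = np.asarray(x, dtype=float)
        if not dictionary:
            dictionary.append(x)  # the first sample always enters the dictionary
            continue
        # Kernel matrix of the current dictionary and kernel vector of the new sample.
        K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
        k_vec = np.array([kernel(d, x) for d in dictionary])
        # Best linear-combination coefficients; small jitter for numerical stability.
        c = np.linalg.solve(K + 1e-10 * np.eye(len(dictionary)), k_vec)
        # Squared residual of approximating phi(x) by the dictionary features.
        delta = kernel(x, x) - k_vec @ c
        if delta > mu:
            dictionary.append(x)  # not approximately dependent: keep the sample
    return np.array(dictionary)

# Duplicate samples are approximately linearly dependent and are dropped.
sparse = ald_sparsify([[0.0], [0.0], [5.0], [5.0]])
print(len(sparse))
```

This version recomputes the kernel matrix and solves a linear system for every sample, which keeps the sketch short. A practical implementation would more likely update the inverse kernel matrix incrementally as the dictionary grows.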

Manifold Regularization for Kernelized LSTD

Kernel-Based Least Squares Policy Iteration for Reinforcement Learning.

Manifold Regularization Based Approximate Value Iteration For Learning Control

Kernel-Based Decentralized Policy Evaluation for Reinforcement Learning

Intelligent Control Of A Sensor-Actuator System Via Kernelized Least-Squares Policy Iteration

Integrating symmetry of environment by designing special basis functions for value function approximation in reinforcement learning

IKLTSA: An Incremental Kernel LTSA Method

Approximate policy iteration with unsupervised feature learning based on manifold regularization

A novel feature sparsification method for kernel-based approximate policy iteration

Reordering Sparsification of Kernel Machines in Approximate Policy Iteration

Improving Classification Precision by Implicit Kernels Motivated by Manifold Learning

Efficient Policy Evaluation by Matrix Sketching

Manifold regularization Multiple Kernel Learning machine for classification

Kernel Least-Squares Temporal Difference Learning

Manifold-Based Reinforcement Learning via Locally Linear Reconstruction.

Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration

Manifold Regularized Reinforcement Learning.

Policy Optimization over General State and Action Spaces

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

Control Regularization for Reduced Variance Reinforcement Learning

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice