Abstract: In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. With KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation step of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the KLSTD-Q solutions sparse and improve their generalization ability, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing results. One is better convergence and a (near) optimality guarantee, obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection using the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task, a stochastic chain problem, demonstrate that KLSPI consistently achieves better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was evaluated on two nonlinear feedback control problems: a ship heading control problem and the swing-up control of a double-link underactuated pendulum called acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance.
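
The ALD-based kernel sparsification mentioned in the abstract can be illustrated with a minimal sketch. It uses the commonly stated form of the ALD test: a sample x joins the dictionary only if delta = k(x, x) - k_vec^T K^{-1} k_vec exceeds a threshold, where K is the kernel matrix of the current dictionary and k_vec holds the kernel values between the dictionary and x. The Gaussian kernel, its width, the threshold value, and the function names `gaussian_kernel` and `ald_dictionary` are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel; the kernel choice and width are assumptions."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))


def ald_dictionary(samples, kernel=gaussian_kernel, threshold=1e-3):
    """Build a sparse dictionary with an approximate linear dependency test.

    A sample is added only when its kernel feature vector cannot be
    approximated, within `threshold`, by a linear combination of the
    feature vectors of the samples already in the dictionary.
    """
    samples = [np.atleast_1d(np.asarray(s, dtype=float)) for s in samples]
    dictionary = [samples[0]]
    for x in samples[1:]:
        # Kernel matrix of the current dictionary and kernel vector to x.
        K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
        k_vec = np.array([kernel(a, x) for a in dictionary])
        # Least squares coefficients of the best approximation of x's
        # feature vector by the dictionary's feature vectors.
        coeffs = np.linalg.lstsq(K, k_vec, rcond=None)[0]
        # Squared approximation error in feature space.
        delta = kernel(x, x) - k_vec @ coeffs
        if delta > threshold:
            dictionary.append(x)
    return np.array(dictionary)


if __name__ == "__main__":
    states = [[0.0], [0.0], [5.0], [5.0], [0.0]]
    print(ald_dictionary(states))
```

In this sketch, repeated or nearly dependent samples are skipped, so the dictionary size, and with it the number of kernel features used in policy evaluation, grows only when a sample adds new information. A larger threshold gives a sparser dictionary at the cost of approximation accuracy.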
