Abstract: To better understand the mechanisms underlying various reinforcement learning (RL) algorithms, and to make better use of optimization theory to advance RL, many researchers have begun to revisit the linear-quadratic regulator (LQR) problem, whose setting is simple yet captures the characteristics of RL. Motivated by this, this work addresses the model-free design of a stochastic LQR controller for linear systems subject to Gaussian noise, from the perspectives of both RL and primal-dual optimization. From the RL perspective, we first develop a new model-free off-policy policy iteration (MF-OPPI) algorithm, in which the sampled data are reused across policy updates to alleviate the data-hungry problem to some extent. We then provide a rigorous convergence analysis by showing that the iterations involved are equivalent to those of the classical policy iteration (PI) algorithm. From the optimization perspective, we first reformulate the stochastic LQR problem as a constrained non-convex optimization problem, which is shown to have strong duality. Then, to solve this non-convex problem, we propose a model-based primal-dual (MB-PD) algorithm based on the properties of the resulting Karush-Kuhn-Tucker (KKT) conditions. We also give a model-free implementation of the MB-PD algorithm by solving a transformed dual feasibility condition. More importantly, we show that the dual and primal update steps in the MB-PD algorithm can be interpreted as the policy evaluation and policy improvement steps in the PI algorithm, respectively. Finally, we provide a simulation example to demonstrate the performance of the proposed algorithms.
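For orientation, the classical PI algorithm that the abstract uses as its reference point can be sketched as follows. This is a minimal model-based illustration, not the paper's MF-OPPI or MB-PD algorithm. It assumes a discrete-time system x_{k+1} = A x_k + B u_k with feedback u_k = -K x_k, no discount factor, and a stabilizing initial gain. The noise term is omitted on the assumption that additive zero-mean noise does not change the form of the gain update. The function names and the example matrices are hypothetical.

```python
import numpy as np


def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K'RK + (A-BK)' P (A-BK) for P."""
    A_cl = A - B @ K                      # closed-loop matrix under u = -K x
    n = A.shape[0]
    rhs = Q + K.T @ R @ K
    # Vectorize the Lyapunov equation: (I - A_cl' (x) A_cl') vec(P) = vec(rhs)
    M = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(M, rhs.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)                # symmetrize against round-off


def improve_policy(A, B, R, P):
    """Policy improvement: K = (R + B'PB)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def policy_iteration_lqr(A, B, Q, R, K0, iters=50):
    """Alternate evaluation and improvement, starting from a stabilizing K0."""
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, K)
        K = improve_policy(A, B, R, P)
    return P, K


# Hypothetical example: a discretized double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])               # places both closed-loop poles at 0.9

P, K = policy_iteration_lqr(A, B, Q, R, K0)
print("P =", P)
print("K =", K)
```

In this sketch the evaluation step needs A and B explicitly. A model-free variant, as described in the abstract, would instead estimate the same quantities from sampled data.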

A Reinforcement Learning Method for LQR Control Problem

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Policy Iteration Reinforcement Learning Method for Continuous-Time Linear-Quadratic Mean-Field Control Problems

Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application

Reinforcement Learning-Based Direct Adaptive Optimal Control of JLQ Model

Deep Reinforcement Learning with Embedded LQR Controllers

Value iteration for LQR control of unknown stochastic-parameter linear systems

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

A Tour of Reinforcement Learning: The View from Continuous Control

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

Policy Gradient Methods for the Cost-Constrained LQR: Strong Duality and Global Convergence

Open-Loop Motion Control of a Hydraulic Soft Robotic Arm Using Deep Reinforcement Learning

Model-Free Design of Stochastic LQR Controller from Reinforcement Learning and Primal-Dual Optimization Perspective

Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator

Robust policy iteration for continuous-time stochastic $H_\infty$ control problem with unknown dynamics

Optimal Tracking Control of Nonlinear Multiagent Systems Using Internal Reinforce Q-Learning

LQR with Tracking: A Zeroth-order Approach and Its Global Convergence

Optimal Control of Two-Dimensional Roesser Model: Solution Based on Reinforcement Learning

Observation-based Optimal Control Law Learning with LQR Reconstruction

Model-free design of stochastic LQR controller from a primal–dual optimization perspective