Abstract: In this paper, we address Linear Quadratic Regulator (LQR) problems through a novel iterative algorithm named EXtremum-seeking Policy iteration LQR (EXP-LQR). The peculiarity of EXP-LQR is that it only needs access to a truncated approximation of the infinite-horizon cost associated with a given policy. Hence, EXP-LQR does not need direct knowledge of the system matrices, the cost matrices, or the state measurements. In particular, at each iteration, EXP-LQR refines the maintained policy using a truncated LQR cost retrieved by performing finite-time virtual or real experiments in which a perturbed version of the current policy is employed. This perturbation is generated by an extremum-seeking mechanism, which makes the overall algorithm a time-varying nonlinear system. By using a Lyapunov-based approach exploiting averaging theory, we show that EXP-LQR exponentially converges to an arbitrarily small neighborhood of the optimal gain matrix. We corroborate the theoretical results with numerical simulations involving the control of an induction motor.
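To make the notion of a truncated cost retrieved from a finite-time experiment concrete, the following minimal Python sketch simulates a plant in closed loop and hands back only a scalar cost. The simulator `run_experiment`, the helper `truncated_cost`, the scalar plant values, and the choice of running one experiment per canonical initial state (so that the summed costs form a trace) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def run_experiment(A, B, Q, R, K, x0, T):
    """Hypothetical finite-time experiment on a simulated plant.

    Applies u_t = K x_t to x_{t+1} = A x_t + B u_t for T steps starting from x0
    and returns only the accumulated cost 0.5 * sum_t (x_t' Q x_t + u_t' R u_t).
    """
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for _ in range(T):
        u = K @ x
        cost += 0.5 * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost


def truncated_cost(experiment, K, n, T):
    """J_T(K) as the sum of n experiments, one per canonical initial state e_i."""
    return sum(experiment(K, np.eye(n)[i], T) for i in range(n))


# Illustrative scalar plant (assumed values): x+ = 0.9 x + u, Q = R = 1.
A, B = np.array([[0.9]]), np.array([[1.0]])
Q, R = np.array([[1.0]]), np.array([[1.0]])


def experiment(K, x0, T):
    # The learner only sees this black box: K, x0, T in, one scalar cost out.
    return run_experiment(A, B, Q, R, K, x0, T)


K = np.array([[-0.2]])
for T in (5, 20, 200):
    # Every stage cost is nonnegative, so J_T grows with T toward the
    # infinite-horizon cost 0.5 * 1.04 / (1 - 0.49) = 0.52 / 0.51 (about 1.0196).
    print(T, truncated_cost(experiment, K, n=1, T=T))
```

Because the closed-loop pole here is 0.7, the truncation error decays like 0.49^T, which is why a modest horizon already approximates the infinite-horizon cost well.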

What problem does this paper attempt to address?

The problem this paper addresses is solving the linear-quadratic regulator (LQR) problem without direct knowledge of the system matrices, the cost matrices, or the state measurements. To this end, it proposes a new iterative algorithm, EXtremum-seeking Policy iteration LQR (EXP-LQR), which only needs access to a truncated approximation of the infinite-horizon cost of a given policy and therefore does not depend on the internal parameters of the system.

### Problem Background

In the classic LQR problem, the goal is to find an optimal state-feedback gain matrix \(K^*\) that minimizes a quadratic performance index of the system. In practical applications, however, the system matrices \(A\) and \(B\), as well as the cost matrices \(Q\) and \(R\), may be unknown or difficult to obtain accurately, whereas traditional model-based methods, which compute \(K^*\) from the Riccati equation, require exactly this knowledge.

### Main Contributions of the Paper

1. **Data-Driven LQR Method**: A new method, EXP-LQR, is proposed that optimizes the control policy through finite-time experiments without direct knowledge of the system and cost matrices.
2. **Extremum-Seeking Mechanism**: An extremum-seeking mechanism is used to improve the current policy: the policy is perturbed, and the truncated LQR cost of the perturbed policy is obtained from a finite-time experiment.
3. **Stability Analysis of a Nonlinear Time-Varying System**: Using Lyapunov stability and averaging theory, the paper proves that EXP-LQR converges exponentially to an arbitrarily small neighborhood of the optimal gain matrix.
4. **Numerical Validation**: The theoretical results are corroborated by numerical simulations of an induction motor control problem.

### Specific Implementation Steps

- **Initialization**: Select the initial gain matrix \(K_0\) and the other algorithm parameters.
- **Experiment Phase**: At each iteration \(k\), run a finite-time experiment with the perturbed policy \(K_k+\delta D_k\) to obtain the truncated LQR cost \(J_T(K_k + \delta D_k)\).
- **Optimization Phase**: Update the gain matrix \(K_k\) and the auxiliary variable \(z_k\) using the measured cost.

### Mathematical Formulas

- Truncated LQR cost:
\[ J_T(K)=\frac{1}{2}\text{Tr}\left(\sum_{t = 0}^{T-1}\big((A + BK)^t\big)^\top(Q+K^\top RK)(A + BK)^t\right) \]
- Extremum-seeking update rules:
\[ z_{k + 1}=z_k+\gamma\big(J_T(K_k+\delta D_k)-z_k\big) \]
\[ K_{k+1}=K_k-\gamma^2\,\frac{J_T(K_k+\delta D_k)-z_k}{\delta}\,D_k \]

### Conclusion

The EXP-LQR method provides a data-driven way to solve the LQR problem that is particularly suited to cases where the system and cost matrices are unknown. The theoretical analysis proves exponential convergence to an arbitrarily small neighborhood of the optimal gain matrix, and numerical simulations on an induction motor corroborate these results.
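The experiment and optimization phases and the two update rules above can be illustrated with a short Python sketch. It is a minimal illustration, not the authors' implementation: \(J_T\) is evaluated here as half the trace of the sum of \(((A+BK)^t)^\top(Q+K^\top RK)(A+BK)^t\) on a simulated plant, standing in for a virtual experiment. The sinusoidal dither used for \(D_k\), the values \(\gamma=0.1\), \(\delta=0.05\), \(T=50\), the iteration count, the scalar plant, and the helper names `truncated_cost`, `exp_lqr_sketch`, and `cost` are all hypothetical choices.

```python
import numpy as np


def truncated_cost(A, B, Q, R, K, T):
    """J_T(K) = 0.5 * Tr(sum_{t<T} ((A+BK)^t)' (Q + K'RK) (A+BK)^t)."""
    M = A + B @ K
    S = Q + K.T @ R @ K
    Mt = np.eye(A.shape[0])  # holds (A + BK)^t, starting at t = 0
    total = np.zeros_like(S)
    for _ in range(T):
        total += Mt.T @ S @ Mt
        Mt = M @ Mt
    return 0.5 * np.trace(total)


def exp_lqr_sketch(cost, K0, gamma=0.1, delta=0.05, iters=2000):
    """Extremum-seeking policy iteration driven only by evaluations of cost(K).

    Assumption: D_k is a sinusoidal dither with one distinct frequency in
    (0, pi) per gain entry; the paper's actual choice of D_k may differ.
    """
    K = np.array(K0, dtype=float)
    omegas = np.pi * np.arange(1, K.size + 1) / (K.size + 1)
    z = cost(K)  # start the filter state at the unperturbed cost
    for k in range(iters):
        D = np.sin(omegas * k).reshape(K.shape)
        J = cost(K + delta * D)       # experiment with the perturbed policy
        step = (J - z) / delta        # uses z_k, before z is updated
        K = K - gamma**2 * step * D   # K_{k+1}
        z = z + gamma * (J - z)       # z_{k+1}
    return K


# Illustrative scalar plant (assumed values): x+ = 0.9 x + u, Q = R = 1.
A, B = np.array([[0.9]]), np.array([[1.0]])
Q, R = np.array([[1.0]]), np.array([[1.0]])


def cost(K):
    return truncated_cost(A, B, Q, R, K, T=50)


K = exp_lqr_sketch(cost, K0=[[-0.2]])
print(K)  # expected to settle close to the Riccati-optimal gain, approximately -0.54
```

Initializing \(z_0\) at the unperturbed cost avoids a large initial transient in \(J_T - z_k\) that could otherwise kick the gain out of the stabilizing region. For matrix gains, the evenly spaced frequencies used here are only a placeholder: in extremum seeking, dither frequencies are usually chosen so that sums and differences of frequencies do not coincide with other dither frequencies.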

Data-Driven LQR with Finite-Time Experiments via Extremum-Seeking Policy Iteration

Stability-Certified On-Policy Data-Driven LQR via Recursive Learning and Policy Gradient

Data-Driven LQR using Reinforcement Learning and Quadratic Neural Networks

Data-Enabled Policy Optimization for Direct Adaptive Learning of the LQR

Q-Learning Methods for LQR Control of Completely Unknown Discrete-Time Linear Systems

A Moreau Envelope Approach for LQR Meta-Policy Estimation

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient

Infinite-horizon Risk-constrained Linear Quadratic Regulator with Average Cost

Fast Policy Learning for Linear Quadratic Control with Entropy Regularization

i2LQR: Iterative LQR for Iterative Tasks in Dynamic Environments

Designing Experiments for Data-Driven Control of Nonlinear Systems

Structured Policy Iteration for Linear Quadratic Regulator

Dual-loop iterative optimal control for the finite horizon LQR problem with unknown dynamics

Accelerated Optimization Landscape of Linear-Quadratic Regulator

On the Certainty-Equivalence Approach to Direct Data-Driven LQR Design

Model-Free Design of Stochastic LQR Controller from Reinforcement Learning and Primal-Dual Optimization Perspective

Non-Episodic Learning for Online LQR of Unknown Linear Gaussian System

Direct Data-Driven Discounted Infinite Horizon Linear Quadratic Regulator with Robustness Guarantees

Data-enabled Policy Optimization for the Linear Quadratic Regulator