Data-Driven LQR with Finite-Time Experiments via Extremum-Seeking Policy Iteration

Guido Carnevale, Nicola Mimmo, Giuseppe Notarstefano
2024-12-04
Abstract: In this paper, we address Linear Quadratic Regulator (LQR) problems through a novel iterative algorithm named EXtremum-seeking Policy iteration LQR (EXP-LQR). The peculiarity of EXP-LQR is that it only needs access to a truncated approximation of the infinite-horizon cost associated with a given policy. Hence, EXP-LQR needs no direct knowledge of the system matrices, the cost matrices, or the state measurements. In particular, at each iteration, EXP-LQR refines the maintained policy using a truncated LQR cost retrieved by performing finite-time virtual or real experiments in which a perturbed version of the current policy is employed. Such a perturbation is done according to an extremum-seeking mechanism and makes the overall algorithm a time-varying nonlinear system. By using a Lyapunov-based approach exploiting averaging theory, we show that EXP-LQR exponentially converges to an arbitrarily small neighborhood of the optimal gain matrix. We corroborate the theoretical results with numerical simulations involving the control of an induction motor.
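To make the idea of a truncated cost "retrieved by finite-time experiments" concrete, the sketch below computes it in two ways for a state-feedback policy \(u_t = K x_t\): from the closed-form truncated sum \(J_T(K)=\frac{1}{2}\operatorname{Tr}\big(\sum_{t=0}^{T-1}((A+BK)^t)^\top(Q+K^\top RK)(A+BK)^t\big)\), and as the sum of the scalar costs returned by \(n\) simulated experiments started from the unit vectors \(e_1,\dots,e_n\). The plant, the gain, the choice of unit initial states, and the helper names `truncated_cost_closed_form`, `run_experiment`, and `truncated_cost_from_experiments` are hypothetical and chosen for illustration only; they are not taken from the paper. The point is that only the accumulated cost leaves each experiment, so the caller never sees \(A\), \(B\), or the state.

```python
import numpy as np


def truncated_cost_closed_form(A, B, Q, R, K, T):
    """Model-based J_T(K) = 1/2 Tr(sum_{t<T} ((A+BK)^t)^T (Q + K^T R K) (A+BK)^t)."""
    A_cl = A + B @ K
    S = Q + K.T @ R @ K            # stage-cost weight when u_t = K x_t
    Phi = np.eye(A.shape[0])       # holds (A+BK)^t, starting from t = 0
    total = np.zeros_like(S)
    for _ in range(T):
        total += Phi.T @ S @ Phi
        Phi = A_cl @ Phi
    return 0.5 * np.trace(total)


def run_experiment(A, B, Q, R, K, x0, T):
    """Hypothetical finite-time experiment: simulate x_{t+1} = A x_t + B u_t with
    u_t = K x_t for T steps and return only the accumulated cost
    1/2 * sum_t (x_t^T Q x_t + u_t^T R u_t)."""
    x = x0.copy()
    cost = 0.0
    for _ in range(T):
        u = K @ x
        cost += 0.5 * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost


def truncated_cost_from_experiments(A, B, Q, R, K, T):
    """Add up the scalar costs of n experiments started from e_1, ..., e_n
    (equivalent to an identity initial-state covariance, an assumption here)."""
    n = A.shape[0]
    return sum(run_experiment(A, B, Q, R, K, np.eye(n)[:, i], T) for i in range(n))


# Hypothetical 2-state, 1-input plant and gain, for illustration only.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
K = np.array([[-1.0, -2.0]])
T = 200

J_model = truncated_cost_closed_form(A, B, Q, R, K, T)
J_data = truncated_cost_from_experiments(A, B, Q, R, K, T)
print(J_model, J_data)  # the two values agree up to floating-point rounding
```

The two routes coincide because the trajectory from \(e_i\) is \(x_t = (A+BK)^t e_i\), so summing \(x_t^\top(Q+K^\top RK)x_t\) over \(i\) yields the trace in the closed form.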
Optimization and Control, Systems and Control
What problem does this paper attempt to address?
This paper addresses the linear-quadratic regulator (LQR) problem when the system matrices, the cost matrices, and the state measurements are not directly available. It proposes a new iterative algorithm, EXtremum-seeking Policy iteration LQR (EXP-LQR), which only needs access to a truncated approximation of the infinite-horizon cost of a given policy, thus avoiding any dependence on the internal parameters of the system.

### Problem Background

In the classic LQR problem, the goal is to find an optimal state-feedback gain matrix \(K^*\) that minimizes a quadratic performance index of the system. In practical applications, however, the system matrices \(A\) and \(B\), as well as the cost matrices \(Q\) and \(R\), may be unknown or difficult to obtain accurately. In addition, traditional model-based methods require substantial prior knowledge of the system and significant computational resources.

### Main Contributions of the Paper

1. **Data-Driven LQR Method**: EXP-LQR optimizes the control policy through finite-time experiments, without direct knowledge of the system and cost matrices.
2. **Extremum-Seeking Mechanism**: The current policy is improved through an extremum-seeking mechanism: the policy is perturbed, and a finite-time experiment returns the truncated LQR cost of the perturbed policy.
3. **Stability Analysis of a Nonlinear Time-Varying System**: Using Lyapunov-based arguments and averaging theory, the authors prove that EXP-LQR converges exponentially to an arbitrarily small neighborhood of the optimal gain matrix.
4. **Numerical Validation**: Numerical simulations on the control of an induction motor illustrate the effectiveness of the method.

### Specific Implementation Steps

- **Initialization**: Select the initial gain matrix \(K_0\), the initial auxiliary variable \(z_0\), and the parameters \(\gamma\), \(\delta\), and \(T\).
- **Experiment Phase**: At each iteration \(k\), run a finite-time experiment with the perturbed policy \(K_k+\delta D_k\), where \(\delta\) is the perturbation amplitude and \(D_k\) is the perturbation matrix prescribed by the extremum-seeking mechanism, and obtain the truncated LQR cost \(J_T(K_k + \delta D_k)\).
- **Optimization Phase**: Update the gain matrix \(K_k\) and the auxiliary variable \(z_k\) using the measured cost.

### Mathematical Formulas

- Truncated LQR cost of the policy \(u_t = K x_t\) for the system \(x_{t+1} = A x_t + B u_t\):
\[
J_T(K)=\frac{1}{2}\operatorname{Tr}\left(\sum_{t=0}^{T-1}\big((A+BK)^t\big)^\top\left(Q+K^\top RK\right)(A+BK)^t\right)
\]
- Extremum-seeking update rules:
\[
z_{k+1}=z_k+\gamma\left(J_T(K_k+\delta D_k)-z_k\right)
\]
\[
K_{k+1}=K_k-\gamma^2\,\frac{J_T(K_k+\delta D_k)-z_k}{\delta}\,D_k
\]

### Conclusion

EXP-LQR offers a data-driven way to solve the LQR problem that is particularly suited to cases where the system and cost matrices are unknown. The theoretical analysis establishes exponential convergence to an arbitrarily small neighborhood of the optimal gain, and numerical simulations on an induction motor illustrate the method's effectiveness.
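The two extremum-seeking update rules for \(z_k\) and \(K_k\) can be exercised on a simulated plant. The Python sketch below is not the authors' implementation. The plant, the sinusoidal perturbation \(D_k\) (each entry oscillates at its own assumed frequency), the initialization \(z_0 = J_T(K_0)\), and the values of \(\gamma\), \(\delta\), \(T\), and the iteration count are all assumptions made for illustration. The closed-form `truncated_cost` stands in for a finite-time experiment whose only output is a scalar cost. The hypothetical helper `riccati_gain` computes a model-based \(K^*\) solely to measure how close the iterates get; the loop itself never uses it.

```python
import numpy as np


def truncated_cost(A, B, Q, R, K, T):
    """J_T(K) = 1/2 Tr(sum_{t<T} ((A+BK)^t)^T (Q + K^T R K) (A+BK)^t).
    Stands in for a finite-time experiment: the loop only uses the returned value."""
    A_cl = A + B @ K
    S = Q + K.T @ R @ K
    Phi = np.eye(A.shape[0])
    total = np.zeros_like(S)
    for _ in range(T):
        total += Phi.T @ S @ Phi
        Phi = A_cl @ Phi
    return 0.5 * np.trace(total)


def riccati_gain(A, B, Q, R, iters=1000):
    """Model-based reference K* (for u = K x) via value iteration on the discrete Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def exp_lqr_sketch(cost, K0, gamma=0.1, delta=0.05, iters=3000):
    """Iterate z_{k+1} = z_k + gamma (J_T(K_k + delta D_k) - z_k) and
    K_{k+1} = K_k - gamma^2 (J_T(K_k + delta D_k) - z_k) / delta * D_k,
    with a hypothetical sinusoidal perturbation D_k."""
    m, n = K0.shape
    freqs = np.linspace(0.5, 1.3, m * n).reshape(m, n)  # assumed distinct frequencies (rad/iteration)
    K = K0.copy()
    z = cost(K0)                       # assumed initialization z_0 = J_T(K_0)
    for k in range(iters):
        D = np.sin(freqs * k)          # perturbation matrix D_k
        e = cost(K + delta * D) - z    # one experiment, minus the filtered cost z_k
        z = z + gamma * e              # z_{k+1}
        K = K - gamma**2 * e / delta * D  # K_{k+1}, built from the same z_k
    return K


# Hypothetical open-loop-unstable plant; K0 = -0.5 I stabilizes it.
A = np.array([[1.1, 0.3], [0.0, 0.9]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
T = 40
K0 = -0.5 * np.eye(2)


def cost(K):
    return truncated_cost(A, B, Q, R, K, T)


K_star = riccati_gain(A, B, Q, R)
K_final = exp_lqr_sketch(cost, K0)
print(np.linalg.norm(K0 - K_star), np.linalg.norm(K_final - K_star))
```

In this sketch the final gain is expected to lie near the Riccati gain rather than exactly on it, because the perturbation keeps moving the iterate. This mirrors the paper's claim of convergence to a neighborhood of \(K^*\) rather than to \(K^*\) itself.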