Abstract: In this paper, a continuous-time adaptive actor-critic reinforcement learning (RL) controller is developed for drift-free nonlinear systems. Practical examples of such systems are image-based visual servoing (IBVS) and wheeled mobile robots (WMR), where the system dynamics include a parametric uncertainty in the control effectiveness matrix and no drift term. The uncertainty in the input term poses a challenge for developing a continuous-time RL controller using existing methods. In this paper, an actor-critic or synchronous policy iteration (PI)-based RL controller is presented with a concurrent learning (CL)-based parameter update law for estimating the unknown parameters of the control effectiveness matrix. An infinite-horizon value function minimization objective is achieved by regulating the current states to the desired states with near-optimal control effort. The proposed controller guarantees closed-loop stability, and simulation results validate the proposed theory using IBVS and WMR examples.

What problem does this paper attempt to address?

This paper addresses the design of a continuous-time adaptive actor-critic (AAC) reinforcement learning controller for uncertain nonlinear systems. Specifically, the systems considered are drift-free and have parametric uncertainty in the control effectiveness matrix. Existing continuous-time reinforcement learning methods are difficult to apply directly to such systems because they usually assume that the system dynamics are known or only partially unknown. The proposed method overcomes this challenge by using the concurrent learning (CL) technique to estimate the unknown parameters in the control effectiveness matrix. The method achieves near-optimal control effort and builds the closed-loop stability guarantee into the design.

### Main contributions

1. **Propose a new adaptive actor-critic algorithm**: For a class of drift-free nonlinear systems with uncertainty in the control effectiveness matrix, a new continuous-time adaptive actor-critic reinforcement learning algorithm is designed.
2. **Use the concurrent learning technique**: An adaptive parameter update law based on concurrent learning is proposed to estimate the unknown parameters in the control effectiveness matrix.
3. **Guarantee the stability of the system**: A Lyapunov stability analysis proves that the signals of the closed-loop system are uniformly ultimately bounded (UUB).
4. **Verify the effectiveness of the algorithm**: The proposed controller is validated in two simulation examples: image-based visual servoing (IBVS) and a wheeled mobile robot (WMR).
### System model and control objectives

- **System dynamics**: Consider system dynamics of the form
  \[ \dot{x}=g(x, \theta)u \]
  where \(x(t)\in\mathbb{R}^n\) is the state, \(u(t)\in\mathbb{R}^m\) is the control input, \(\theta\in\mathbb{R}^p\) is the vector of unknown parameters, and the input gain matrix \(g(x, \theta)\in\mathbb{R}^{n\times m}\) is linear in the parameters: \(\text{vec}(g(x, \theta)) = Y(x)\theta\).
- **Control objective**: Regulate the current state \(x(t)\) to the desired state \(x_d\). The regulation error is defined as \(\bar{x}(t)=x(t)-x_d\) and the parameter estimation error as \(\tilde{\theta}(t)=\theta - \hat{\theta}(t)\).

### Optimal control design

- **Continuous-time reinforcement learning controller design**: The optimal control \(u^*\) is derived from the optimal value function \(V^*(\bar{x})\) and the local cost function \(r(\bar{x}, u)\).
- **Hamiltonian and Bellman error**: The weight update laws for the actor and critic networks are designed using the Hamiltonian \(H(\bar{x}, u, V_{\bar{x}})\) and the Bellman error \(\delta\).
- **Approximate optimal control**: Neural networks (NN) approximate the optimal value function and the optimal control, which yields the approximate value function and control law.

### Parameter update law

- **Concurrent learning technique**: A stack of recorded historical data \(H\) is collected, and a concurrent learning parameter update law that uses this stack is designed to estimate the unknown parameter \(\theta\).

### Stability analysis

- **Lyapunov stability analysis**: A positive definite, continuously differentiable Lyapunov function \(V\) is constructed to prove that the signals of the closed-loop system are uniformly ultimately bounded.
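The parametric form \(\text{vec}(g(x, \theta)) = Y(x)\theta\) makes the dynamics linear in \(\theta\): with column-stacking \(\text{vec}\), \(\dot{x} = g(x,\theta)u = (u^\top \otimes I_n)Y(x)\theta\). The sketch below uses that identity to illustrate the general idea of concurrent learning. It is not the update law from the paper, which this summary does not state. The assumptions are: a hypothetical unicycle-like model with two unknown input gains, exact measurements of \(\dot{x}\), a plain gradient update driven only by recorded data, and forward-Euler integration. All function names and gain values are illustrative.

```python
import numpy as np

def Y(x):
    """Hypothetical regressor with vec(g(x, theta)) = Y(x) @ theta (column-stacking vec).

    Assumed model: g = [[th1*cos(phi), 0], [th1*sin(phi), 0], [0, th2]].
    """
    phi = x[2]
    Ymat = np.zeros((6, 2))
    Ymat[0, 0] = np.cos(phi)
    Ymat[1, 0] = np.sin(phi)
    Ymat[5, 1] = 1.0
    return Ymat

def regressor(x, u):
    """W(x, u) such that xdot = W @ theta, using vec(g u) = (u^T kron I_n) vec(g)."""
    n = x.size
    return np.kron(u.reshape(1, -1), np.eye(n)) @ Y(x)

def collect_history(theta_true, num_samples=20, seed=0):
    """Record (W_j, xdot_j) pairs; xdot is assumed to be measured exactly."""
    rng = np.random.default_rng(seed)
    stack = []
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0, size=3)
        u = rng.uniform(-1.0, 1.0, size=2)
        W = regressor(x, u)
        stack.append((W, W @ theta_true))
    return stack

def history_is_rich(stack, p):
    """Rank condition commonly required of the recorded data: sum_j W_j^T W_j has rank p."""
    A = sum(W.T @ W for W, _ in stack)
    return np.linalg.matrix_rank(A) == p

def cl_estimate(stack, theta0, gain=0.5, dt=0.01, steps=5000):
    """Euler-integrate theta_hat_dot = gain * sum_j W_j^T (xdot_j - W_j theta_hat)."""
    theta_hat = np.array(theta0, dtype=float)
    for _ in range(steps):
        grad = sum(W.T @ (xdot - W @ theta_hat) for W, xdot in stack)
        theta_hat = theta_hat + dt * gain * grad
    return theta_hat

if __name__ == "__main__":
    theta_true = np.array([0.8, 1.5])  # illustrative "unknown" parameters
    stack = collect_history(theta_true)
    print("history rich enough:", history_is_rich(stack, theta_true.size))
    print("estimate:", cl_estimate(stack, np.zeros(2)))
```

Because the update uses stored data, the estimate can converge once the recorded stack satisfies the rank condition, without requiring the trajectory to stay persistently exciting. In this sketch that is what `history_is_rich` checks.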

Adaptive Actor-Critic Based Optimal Regulation for Drift-Free Uncertain Nonlinear Systems

A Learning-Based Optimal Tracking Controller for Continuous Linear Systems with Unknown Dynamics: Theory and Case Study

Realtime Brain-Inspired Adaptive Learning Control for Nonlinear Systems with Configuration Uncertainties (I)

Online Reinforcement Learning-based Neural Network Controller Design for Affine Nonlinear Discrete-time Systems.

Near Optimal Neural Network-based Output Feedback Control of Affine Nonlinear Discrete-Time Systems

Robust Adaptive Iterative Learning Control for Discrete‐time Nonlinear Systems with Both Parametric and Nonparametric Uncertainties

Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

Reinforcement Learning Controller Design for Affine Nonlinear Discrete-Time Systems Using Online Approximators

Robust Adaptive Iterative Learning Control for Discrete-Time Nonlinear Systems With Time-Iteration-Varying Parameters.

Adaptive Observation-Based Efficient Reinforcement Learning for Uncertain Systems

State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems

Adaptive Neural Dynamic Surface Control With Prespecified Tracking Accuracy of Uncertain Stochastic Nonstrict-Feedback Systems

Performance-Guaranteed Adaptive Optimized Control of Intelligent Surface Vehicle Using Reinforcement Learning

Model-based reinforcement learning for infinite-horizon approximate optimal tracking

Relaxed Actor-Critic with Convergence Guarantees for Continuous-Time Optimal Control of Nonlinear Systems.

Actor-Critic Reinforcement Learning for Control With Stability Guarantee

Event-triggered Receding Horizon Control Via Actor-Critic Design

Robust Safe Reinforcement Learning Control of Unknown Continuous-Time Nonlinear Systems with State Constraints and Disturbances

Robust Near-optimal Control for Constrained Nonlinear System via Integral Reinforcement Learning

Robust Actor-Critic Learning for Continuous-Time Nonlinear Systems with Unmodeled Dynamics

Off-Policy Risk-Sensitive Reinforcement Learning-Based Constrained Robust Optimal Control