Ashwin P. Dani, Shubhendu Bhasin
Abstract: In this paper, a continuous-time adaptive actor-critic reinforcement learning (RL) controller is developed for drift-free nonlinear systems. Practical examples of such systems are image-based visual servoing (IBVS) and wheeled mobile robots (WMR), where the system dynamics includes a parametric uncertainty in the control effectiveness matrix with no drift term. The uncertainty in the input term poses a challenge for developing a continuous-time RL controller using existing methods. In this paper, an actor-critic or synchronous policy iteration (PI)-based RL controller is presented with a concurrent learning (CL)-based parameter update law for estimating the unknown parameters of the control effectiveness matrix. An infinite-horizon value function minimization objective is achieved by regulating the current states to the desired with near-optimal control efforts. The proposed controller guarantees closed-loop stability and simulation results validate the proposed theory using IBVS and WMR examples.
What problem does this paper attempt to address?
The paper aims to design a continuous-time adaptive actor-critic (AAC) reinforcement learning controller for uncertain nonlinear systems. Specifically, the systems considered are drift-free and have parametric uncertainty in the control effectiveness matrix. Existing continuous-time reinforcement learning methods are hard to apply directly to control design for such systems, because they usually assume that the system dynamics are known or only partially unknown. The proposed method overcomes this challenge by using concurrent learning (CL) to estimate the unknown parameters in the control effectiveness matrix. It achieves near-optimal control effort and builds a closed-loop stability guarantee into the design.
### Main contributions
1. **A new adaptive actor-critic algorithm**: A continuous-time adaptive actor-critic reinforcement learning algorithm is designed for a class of drift-free nonlinear systems with uncertainty in the control effectiveness matrix.
2. **Concurrent learning for parameter estimation**: A concurrent-learning-based adaptive update law is proposed to estimate the unknown parameters in the control effectiveness matrix.
3. **Closed-loop stability guarantee**: A Lyapunov stability analysis proves that the signals of the closed-loop system are uniformly ultimately bounded (UUB).
4. **Simulation validation**: The proposed controller is validated on two simulation examples: image-based visual servoing (IBVS) and a wheeled mobile robot (WMR).
### System model and control objectives
- **System dynamics**: Consider the system dynamics in the following form:
\[
\dot{x}=g(x, \theta)u
\]
where \(x(t)\in\mathbb{R}^n\) is the state, \(u(t)\in\mathbb{R}^m\) is the control input, \(\theta\in\mathbb{R}^p\) is the vector of unknown parameters, and the input gain matrix \(g(x, \theta)\in\mathbb{R}^{n\times m}\) is expressed in parametric form \(\text{vec}(g(x, \theta)) = Y(x)\theta\).
- **Control objective**: Regulate the current state \(x(t)\) to the desired state \(x_d\). The design uses the regulation error \(\bar{x}(t)=x(t)-x_d\) and the parameter estimation error \(\tilde{\theta}(t)=\theta - \hat{\theta}(t)\), where \(\hat{\theta}(t)\) is the estimate of \(\theta\).
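
The parametric form \(\text{vec}(g(x, \theta)) = Y(x)\theta\) can be illustrated with a small sketch. The unicycle-type model below, with unknown gains `theta[0]` on the linear velocity and `theta[1]` on the angular velocity, is a hypothetical example chosen for illustration and is not taken from the paper; `regressor`, `input_gain`, and `dynamics` are assumed names, and `vec` is taken to stack columns.

```python
import numpy as np

def regressor(x):
    """Y(x) with vec(g(x, theta)) = Y(x) @ theta, using column-stacked vec.

    Hypothetical unicycle-type model: state x = [p_x, p_y, phi],
    g = [[theta1*cos(phi), 0], [theta1*sin(phi), 0], [0, theta2]].
    """
    phi = x[2]
    Y = np.zeros((6, 2))      # g is 3x2, so vec(g) has 6 entries
    Y[0, 0] = np.cos(phi)     # g[0, 0]
    Y[1, 0] = np.sin(phi)     # g[1, 0]
    Y[5, 1] = 1.0             # g[2, 1]
    return Y

def input_gain(x, theta):
    """Rebuild the 3x2 matrix g(x, theta) from its vectorised form."""
    return (regressor(x) @ theta).reshape((3, 2), order="F")

def dynamics(x, u, theta):
    """Drift-free dynamics x_dot = g(x, theta) u."""
    return input_gain(x, theta) @ u

theta = np.array([0.9, 1.2])           # illustrative "true" parameters
x = np.array([0.0, 0.0, np.pi / 2])    # heading along the +y axis
u = np.array([1.0, 0.5])               # [linear, angular] velocity command
x_dot = dynamics(x, u, theta)
```

With zero input the state derivative is zero, which is what "drift-free" means: the state moves only when a control is applied.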
### Optimal control design
- **Continuous-time reinforcement learning controller design**: The optimal control \(u^*\) is derived from the optimal value function \(V^*(\bar{x})\) and the local cost function \(r(\bar{x}, u)\).
- **Hamiltonian and Bellman error**: The weight update laws for the actor and critic networks are designed using the Hamiltonian \(H(\bar{x}, u, V_{\bar{x}})\) and the Bellman error \(\delta\).
- **Approximate optimal control**: Neural networks (NNs) approximate the optimal value function and the optimal control, which gives the approximate value function and control law.
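
As a worked sketch of how \(u^*\) follows from the Hamiltonian, assume a local cost that is quadratic in the control, \(r(\bar{x}, u) = Q(\bar{x}) + u^T R u\) with \(Q\) positive definite and \(R = R^T > 0\). This cost form, and the symbols \(Q\) and \(R\), are assumptions made here for illustration; the summary above does not state the paper's exact choice. Because \(x_d\) is constant, \(\dot{\bar{x}} = g(x, \theta)u\), and the Hamiltonian reads
\[
H(\bar{x}, u, V_{\bar{x}}) = Q(\bar{x}) + u^T R u + V_{\bar{x}}^T\, g(x, \theta)\, u .
\]
Setting \(\partial H / \partial u = 2Ru + g^T(x, \theta) V_{\bar{x}} = 0\) at \(V = V^*\) gives
\[
u^* = -\tfrac{1}{2} R^{-1} g^T(x, \theta)\, V^*_{\bar{x}} ,
\]
and substituting \(u^*\) back into \(H = 0\) yields the Hamilton-Jacobi-Bellman equation
\[
0 = Q(\bar{x}) - \tfrac{1}{4}\, V^{*T}_{\bar{x}}\, g(x, \theta)\, R^{-1} g^T(x, \theta)\, V^*_{\bar{x}} .
\]
Under this assumption \(u^*\) depends on \(g(x, \theta)\), so an implementable control law needs an estimate \(\hat{\theta}\) in place of the unknown \(\theta\). The Bellman error \(\delta\) is then the residual obtained when \(H\) is evaluated with the approximate value function, the approximate control, and \(\hat{\theta}\).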
### Parameter update law
- **Concurrent learning technique**: A history stack \(H\) of recorded data is collected, and a concurrent-learning parameter update law uses it to estimate the unknown parameter \(\theta\).
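
The idea behind a concurrent-learning update can be sketched as follows. The sketch uses the identity \(\dot{x} = g(x,\theta)u = (u^T \otimes I_n)\,Y(x)\,\theta\) and a textbook-style update \(\dot{\hat{\theta}} = \Gamma \sum_j W_j^T(\dot{x}_j - W_j\hat{\theta})\) with \(W_j = (u_j^T \otimes I_n)Y(x_j)\). This is an assumed simplification, not the paper's exact law, which may contain additional terms. The unicycle-type regressor, the gain `gamma`, the recorded data points, and the function names are all hypothetical, and the recorded state derivatives are generated from the true parameters here, whereas in practice they would have to be estimated from measurements.

```python
import numpy as np

def regressor(x):
    """Y(x) for a hypothetical unicycle-type model (column-stacked vec)."""
    Y = np.zeros((6, 2))
    Y[0, 0] = np.cos(x[2])
    Y[1, 0] = np.sin(x[2])
    Y[5, 1] = 1.0
    return Y

def data_regressor(x, u):
    """W(x, u) such that x_dot = W(x, u) @ theta, via vec(g u) = (u^T kron I) vec(g)."""
    return np.kron(u.reshape(1, -1), np.eye(x.size)) @ regressor(x)

theta_true = np.array([0.9, 1.2])   # unknown to the estimator

# History stack: recorded (W_j, x_dot_j) pairs at ten illustrative data points.
stack = []
for j in range(10):
    x_j = np.array([0.1 * j, -0.1 * j, 0.3 * j])
    u_j = np.array([1.0 if j % 2 == 0 else -1.0, 0.5])
    W_j = data_regressor(x_j, u_j)
    stack.append((W_j, W_j @ theta_true))   # x_dot_j simulated, not measured

# Rank condition: sum_j W_j^T W_j must be full rank for convergence.
info = sum(W.T @ W for W, _ in stack)

gamma, dt = 1.0, 0.01               # adaptation gain and Euler step (assumed)
theta_hat = np.zeros(2)
for _ in range(2000):
    theta_hat_dot = gamma * sum(W.T @ (xd - W @ theta_hat) for W, xd in stack)
    theta_hat = theta_hat + dt * theta_hat_dot

final_error = np.linalg.norm(theta_true - theta_hat)
```

Because the update uses stored data, the estimate can converge once the stack satisfies the rank condition, without requiring the current trajectory to remain persistently exciting.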
### Stability analysis
- **Lyapunov stability analysis**: A positive definite, continuously differentiable Lyapunov function \(V\) is constructed to prove that the signals of the closed-loop system are uniformly ultimately bounded.
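
For readers unfamiliar with the term, the generic shape of a UUB argument is sketched below. The stacked signal \(z\) and the constants \(\alpha_1, \alpha_2, c, \kappa > 0\) are placeholders introduced for illustration; the paper's actual Lyapunov function and bounds are not reproduced in this summary. Suppose
\[
\alpha_1 \|z\|^2 \le V(z) \le \alpha_2 \|z\|^2, \qquad \dot{V} \le -c\,\|z\|^2 + \kappa .
\]
Then \(\dot{V} \le -\tfrac{c}{\alpha_2} V + \kappa\), so
\[
V(t) \le V(0)\, e^{-\frac{c}{\alpha_2} t} + \frac{\alpha_2 \kappa}{c}\left(1 - e^{-\frac{c}{\alpha_2} t}\right),
\qquad
\limsup_{t \to \infty} \|z(t)\| \le \sqrt{\frac{\alpha_2\, \kappa}{\alpha_1\, c}} .
\]
The signals therefore enter and remain in a ball whose radius shrinks as the residual \(\kappa\) decreases. In actor-critic designs, \(\kappa\) typically collects terms such as neural-network approximation errors.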