USV Trajectory Tracking Control Based on Receding Horizon Reinforcement Learning

Yinghan Wen, Yuepeng Chen, Xuan Guo
DOI: https://doi.org/10.3390/s24092771
IF: 3.9
2024-04-27
Sensors
Abstract: We present a novel approach for achieving high-precision trajectory tracking control in an unmanned surface vehicle (USV) through utilization of receding horizon reinforcement learning (RHRL). The control architecture for the USV involves a composite of feedforward and feedback components. The feedforward control component is derived directly from the curvature of the reference path and the dynamic model. Feedback control is acquired through application of the RHRL algorithm, effectively addressing the problem of achieving optimal tracking control. The methodology introduced in this paper synergizes with the rolling time domain optimization mechanism, converting the perpetual time domain optimal control predicament into a succession of finite time domain control problems amenable to resolution. In contrast to Lyapunov model predictive control (LMPC) and sliding mode control (SMC), our proposed method employs the RHRL controller, which yields an explicit state feedback control law. This characteristic endows the controller with the dual capabilities of direct offline and online learning deployment. Within each prediction time domain, we employ a time-independent executive–evaluator network structure to glean insights into the optimal value function and control strategy. Furthermore, we substantiate the convergence of the RHRL algorithm in each prediction time domain through rigorous theoretical proof, with concurrent analysis to verify the stability of the closed-loop system. To conclude, USV trajectory control tests are carried out within a simulated environment.
engineering, electrical & electronic; chemistry, analytical; instruments & instrumentation
What problem does this paper attempt to address?
This paper addresses high-precision trajectory tracking control of unmanned surface vehicles (USVs) in complex marine environments. Specifically, the authors propose a control method based on receding horizon reinforcement learning (RHRL) that aims to improve the lateral control precision of the USV. The paper notes that existing methods such as PID control, fuzzy control, model predictive control (MPC), and sliding mode control (SMC) have limitations in achieving optimal trajectory tracking, especially for nonlinear systems subject to environmental disturbances. To overcome these shortcomings, it proposes a control architecture that combines feedforward and feedback components, with the feedback component obtained through the RHRL algorithm.

### Main problems

1. **High-precision trajectory tracking**: How can a USV achieve high-precision trajectory tracking in complex marine environments?
2. **Optimizing control performance**: How can a controller improve computational efficiency and learning efficiency while preserving control precision?
3. **Disturbance rejection**: How can the stability and robustness of the USV be improved when it is subject to environmental disturbances?

### Solutions

1. **Dynamic deviation model**: A dynamic deviation model of the USV is constructed first, with a feedforward control part and a feedback control part. The feedforward control is derived directly from the curvature of the reference path and the deviation model, while the feedback control is obtained by applying the RHRL algorithm.
2. **RHRL algorithm**: The RHRL algorithm builds on the receding horizon optimization mechanism, which transforms the infinite-horizon optimal control problem into a series of finite-horizon heuristic dynamic programming problems. The resulting controller can be learned online or trained offline and deployed directly.
3. **Convergence and stability analysis**: The paper proves the convergence of the RHRL algorithm within each prediction horizon and analyzes the stability of the closed-loop system.
4. **Simulation verification**: USV trajectory control tests are carried out in a simulation environment to verify the effectiveness of the proposed method. The simulation results show that, compared with conventional Lyapunov model predictive control (LMPC), the RHRL method has significant advantages in computational efficiency, sample complexity, and learning efficiency.

### Key formulas

- **Rotation matrix**:
  \[ R(\theta)=\begin{bmatrix} \cos\theta&-\sin\theta&0\\ \sin\theta&\cos\theta&0\\ 0&0&1 \end{bmatrix} \]
- **Dynamics equation**:
  \[ M\dot{v}+C(v)v+D(v)v+g(\xi)=\kappa \]
  where \(\kappa = [F_u, F_v, F_r]^T\) represents the thrust of the thruster, \(M\) is the mass matrix, \(C(v)\) is the Coriolis and centrifugal matrix, \(D(v)\) is the damping matrix, and \(g(\xi)\) is the restoring force.
- **State equation**:
  \[ \dot{e}=A_c e+B_{c1}u+B_{c2}\omega_d \]
  where \(e = [e_y,\dot{e}_y,e_\phi,\dot{e}_\phi]^T\) is the lateral error state, \(u=\delta_f\) is the control input, and \(\omega_d=\dot{\phi}_d\) is the desired heading angular velocity.
- **Performance index function**:
  \[ V(e(k))=\sum_{l=k}^{k+N-1}L(e(l),u_b(l))+V_f(e(k+N)) \]
  where \(L(e(l),u_b(l))=e^T(l)Qe(l)+Pu_b^2(l)\), \(Q\) is a positive definite matrix, and \(P\) is a preset positive real number.
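
As a rough illustration of two of the formulas above, the following Python sketch builds the rotation matrix \(R(\theta)\) and evaluates the finite-horizon performance index \(V(e(k))\) along a rollout of \(N\) steps. It is not the paper's implementation: the helper names (`rotation_matrix`, `stage_cost`, `finite_horizon_cost`), the toy error dynamics `step`, the zero feedback `policy`, and the quadratic terminal cost standing in for \(V_f\) are all assumptions chosen only to make the example runnable. In the paper, the feedback law and value function would instead come from the learned networks.

```python
import numpy as np


def rotation_matrix(theta):
    """Planar rotation matrix R(theta) for a yaw angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def stage_cost(e, u_b, Q, P):
    """Stage cost L(e, u_b) = e^T Q e + P * u_b^2."""
    e = np.asarray(e, dtype=float)
    return float(e @ Q @ e + P * u_b ** 2)


def finite_horizon_cost(e0, policy, step, Q, P, terminal_cost, N):
    """Sum the stage cost over N steps, then add the terminal cost V_f.

    policy(e) -> scalar feedback control u_b        (hypothetical placeholder)
    step(e, u_b) -> next error state                (hypothetical placeholder)
    terminal_cost(e) -> scalar V_f(e(k+N))          (hypothetical placeholder)
    """
    e = np.asarray(e0, dtype=float)
    total = 0.0
    for _ in range(N):
        u_b = policy(e)
        total += stage_cost(e, u_b, Q, P)
        e = step(e, u_b)
    return total + float(terminal_cost(e))


if __name__ == "__main__":
    # Toy setup (assumed, not from the paper): the error halves at each step
    # and the feedback control is identically zero.
    Q = np.eye(4)   # positive definite state weight
    P = 1.0         # positive control weight
    e0 = [1.0, 0.0, 0.0, 0.0]  # [e_y, de_y, e_phi, de_phi]

    cost = finite_horizon_cost(
        e0,
        policy=lambda e: 0.0,
        step=lambda e, u_b: 0.5 * e,
        Q=Q,
        P=P,
        terminal_cost=lambda e: e @ e,
        N=2,
    )
    print("finite-horizon cost:", cost)
    print("R(pi/2):\n", rotation_matrix(np.pi / 2))
```

In a receding horizon scheme, a cost of this form would be re-evaluated at every control step over the shifted window \([k, k+N]\), with only the first control of each window applied to the vehicle.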