High-Level Tracking of Autonomous Underwater Vehicles Based on Pseudo Averaged Q-Learning.

Wenjie Shi, Shiji Song, Cheng Wu
DOI: https://doi.org/10.1109/smc.2018.00701
2018-01-01
Abstract: In this paper, we investigate the trajectory tracking problem of underactuated autonomous underwater vehicles (AUVs) with input saturation. Our proposed model-free algorithm achieves high-level tracking control and stable learning by employing a novel actors-critics architecture, in which a critic and multiple actors are learned to estimate the action-value function and the deterministic policy, respectively. For the critic, Pseudo Averaged Q-learning, a simple extension of Q-learning, is proposed to calculate the target value: the action-value of the next state is obtained by maximizing, among all actors, the average of the most recent previously learned action-value estimates. For the actors, the deterministic policy gradient is applied to update the weights. The effectiveness and performance of the proposed Pseudo Averaged Q-learning based deterministic policy gradient (PAQ-DPG) algorithm are verified by applying it to an underactuated AUV. The results demonstrate the high-level tracking control accuracy and learning stability of the PAQ-DPG algorithm. Moreover, under the proposed actors-critics framework, increasing the number of actors further improves performance.
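The target-value rule described in the abstract can be illustrated with a minimal sketch. This is one reading of the abstract, not the authors' implementation: it assumes the target takes the form r + gamma * max_i mean_k Q_k(s', mu_i(s')), where mu_i are the actors and Q_k are the K most recent critic snapshots. The function name `paq_target`, the toy linear actors, the quadratic critics, and the clipping used to mimic input saturation are all hypothetical and chosen only for illustration.

```python
import numpy as np

def paq_target(reward, next_state, actors, critics, gamma=0.99):
    """Hypothetical Pseudo Averaged Q-learning target (assumed form):
    r + gamma * max over actors of the mean over critic snapshots
    of Q_k(s', mu_i(s'))."""
    best = -np.inf
    for actor in actors:
        action = actor(next_state)  # candidate action proposed by this actor
        # Average the previously learned action-value estimates for this action.
        avg_q = float(np.mean([q(next_state, action) for q in critics]))
        best = max(best, avg_q)     # keep the best averaged value among all actors
    return reward + gamma * best

# Toy setup (assumption): two saturated linear actors, two critic snapshots.
actors = [
    lambda s: np.clip(0.5 * s, -1.0, 1.0),
    lambda s: np.clip(-0.5 * s, -1.0, 1.0),
]
critics = [
    lambda s, a: -float(np.sum((a - 0.4 * s) ** 2)),
    lambda s, a: -float(np.sum((a - 0.6 * s) ** 2)),
]

s_next = np.array([1.0])
target = paq_target(reward=1.0, next_state=s_next,
                    actors=actors, critics=critics, gamma=0.9)
print(target)
```

In this toy case the first actor's action scores higher under the averaged critics, so its averaged value is the one used in the target. Averaging over several past estimates is what the abstract credits for stable learning; how many snapshots are kept is not stated there and is left as a free parameter here.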