Plume Tracing Via Model-Free Reinforcement Learning Method

Hangkai Hu, Shiji Song, C. L. Phillip Chen
DOI: https://doi.org/10.1109/tnnls.2018.2885374
IF: 14.255
2019-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: This paper studies the plume-tracing strategy for an autonomous underwater vehicle (AUV) in a deep-sea turbulent environment. Because the environment changes in both space and time, the tracing problem is modeled as a partially observable Markov decision process with continuous state and action spaces. A long short-term memory-based reinforcement learning framework that makes full use of history information is proposed to generate a smooth strategy while the AUV interacts with the environment. Continuous temporal difference and deterministic policy gradient methods are employed to improve the strategy. To improve the performance of the algorithm, a supervised strategy generated by dynamic programming methods is used as prior knowledge for the agent. The form of the historical search trajectory and the exploration technique are specially designed to fit the algorithm. Simulation environments are established based on Reynolds-averaged Navier-Stokes equations, and the effectiveness of the learned plume-tracing strategy is validated with simulation experiments.
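The two update rules named in the abstract, temporal difference learning and the deterministic policy gradient, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a linear policy a = theta * h in place of the LSTM network, treats h as a fixed-length feature vector summarizing the observation history, and takes the critic's action gradient dQ/da as given. The names td_error and dpg_step are hypothetical.

```python
def td_error(reward, gamma, q_current, q_next):
    """One-step temporal difference error: r + gamma * Q(h', a') - Q(h, a)."""
    return reward + gamma * q_next - q_current


def dpg_step(theta, history, dq_da, lr):
    """One deterministic policy gradient ascent step for a linear policy.

    Assumption (not from the paper): the policy is a = theta @ history,
    where theta has one row per action dimension. By the chain rule,
    dJ/dtheta[i][j] = dQ/da[i] * history[j].
    """
    return [
        [theta[i][j] + lr * dq_da[i] * history[j] for j in range(len(history))]
        for i in range(len(theta))
    ]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call pattern.
    delta = td_error(reward=1.0, gamma=0.9, q_current=2.0, q_next=3.0)
    theta = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # 2 action dims, 3 history features
    theta = dpg_step(theta, history=[1.0, 0.0, 2.0], dq_da=[1.0, -1.0], lr=0.1)
    print(delta, theta)
```

In an actor-critic loop, the TD error would drive the critic update and the critic's action gradient would feed dpg_step; a recurrent network such as an LSTM would replace the hand-built history vector.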