A Plume-Tracing Strategy Via Continuous State-Action Reinforcement Learning

Lvyin Niu, Shiji Song, Keyou You
DOI: https://doi.org/10.1109/cac.2017.8242868
2017-01-01
Abstract: This paper proposes a plume-tracing strategy for an autonomous underwater vehicle (AUV) in the deep-sea environment. To adapt dynamically to the complex environment and optimize the policy during interaction, reinforcement learning (RL) with continuous state and action domains is applied to this problem. Unlike traditional strategies, which use predesigned, stationary actions, this learning-based approach can smooth the search trajectory and reduce the risk of losing the plume. To achieve this, the paper models the tracing problem as a Markov decision process (MDP) with an unknown transition matrix. Continuous temporal difference learning and a deterministic policy gradient method are used to estimate and improve the policy. Moreover, supervised initialization, reward shaping, and a modified exploration technique are proposed to accelerate learning. The effectiveness and efficiency of the proposed strategy are validated in simulation.
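The abstract names the ingredients of the method but not their exact form, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. A hypothetical one-dimensional Gaussian plume stands in for the deep-sea simulator. The critic is a linear semi-gradient TD(0) learner and the actor follows the off-policy deterministic actor-critic with compatible function approximation (COPDAC-Q) from Silver et al. (2014), used here as a stand-in for the paper's continuous TD and deterministic policy gradient steps. Supervised initialization is assumed to be a least-squares fit to a hypothetical gradient-following demonstrator, reward shaping is assumed to be a potential-based term with concentration as the potential, and exploration is plain Gaussian noise rather than the paper's modified scheme. Every name (`concentration`, `step`, `demonstrator`, `supervised_init`, `CopdacQ`, `train`) and constant is illustrative.

```python
import math
import random

# Hypothetical 1-D toy plume (assumption, not the paper's simulator): the source
# sits at SOURCE and concentration decays as a Gaussian of width SIGMA.
SOURCE, SIGMA, WORLD, GAMMA = 3.0, 1.5, 6.0, 0.95


def concentration(x):
    """Plume concentration at position x; equals 1.0 at the source."""
    return math.exp(-((x - SOURCE) ** 2) / (2 * SIGMA ** 2))


def features(x):
    """Assumed linear features [bias, scaled position], shared by actor and critic."""
    return [1.0, x / WORLD]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def step(x, a):
    """Move by the action clipped to [-1, 1]; the vehicle stays inside [0, WORLD]."""
    x_next = min(max(x + max(-1.0, min(1.0, a)), 0.0), WORLD)
    done = abs(x_next - SOURCE) < 0.1
    # Assumed reward: concentration plus potential-based shaping
    # gamma * Phi(s') - Phi(s), with Phi = concentration and Phi(terminal) = 0.
    potential_next = 0.0 if done else concentration(x_next)
    reward = concentration(x_next) + GAMMA * potential_next - concentration(x)
    return x_next, reward, done


def demonstrator(x):
    """Hypothetical gradient-following heuristic, used only to seed the actor."""
    grad = -(x - SOURCE) / SIGMA ** 2 * concentration(x)
    return max(-1.0, min(1.0, 2.0 * grad))


def supervised_init(samples):
    """Least-squares fit of the linear actor theta . phi(x) to (x, action) pairs."""
    zs = [(features(x)[1], a) for x, a in samples]
    n = len(zs)
    sz, szz = sum(z for z, _ in zs), sum(z * z for z, _ in zs)
    sa, sza = sum(a for _, a in zs), sum(z * a for z, a in zs)
    det = n * szz - sz * sz  # 2x2 normal equations, solved in closed form
    return [(szz * sa - sz * sza) / det, (n * sza - sz * sa) / det]


class CopdacQ:
    """Off-policy deterministic actor-critic with a compatible linear critic."""

    def __init__(self, theta, a_theta=1e-3, a_w=0.05, a_v=0.05):
        self.theta = list(theta)  # actor: mu(s) = theta . phi(s)
        self.w = [0.0, 0.0]       # advantage weights on compatible features
        self.v = [0.0, 0.0]       # baseline: V(s) = v . phi(s)
        self.a_theta, self.a_w, self.a_v = a_theta, a_w, a_v

    def mu(self, x):
        return dot(self.theta, features(x))

    def update(self, x, a, r, x_next, done):
        phi = features(x)
        mu = dot(self.theta, phi)
        # Compatible features: grad_theta mu(s) * (a - mu(s)); the gradient is phi(s).
        psi = [p * (a - mu) for p in phi]
        q = dot(psi, self.w) + dot(self.v, phi)
        # Q(s', mu(s')) reduces to V(s') because the advantage vanishes at a = mu(s').
        target = r if done else r + GAMMA * dot(self.v, features(x_next))
        delta = target - q
        # Actor step uses the pre-update w: theta += a_theta * phi * (phi . w).
        g = dot(phi, self.w)
        self.theta = [t + self.a_theta * p * g for t, p in zip(self.theta, phi)]
        self.w = [wi + self.a_w * delta * s for wi, s in zip(self.w, psi)]
        self.v = [vi + self.a_v * delta * p for vi, p in zip(self.v, phi)]
        return delta


def train(episodes=200, noise=0.3, seed=0):
    """Supervised initialization followed by off-policy COPDAC-Q updates."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, WORLD) for _ in range(200)]
    agent = CopdacQ(supervised_init([(x, demonstrator(x)) for x in xs]))
    for _ in range(episodes):
        x = rng.uniform(0.0, WORLD)
        for _ in range(30):
            a = agent.mu(x) + rng.gauss(0.0, noise)  # Gaussian exploration
            x_next, r, done = step(x, a)
            agent.update(x, a, r, x_next, done)
            x = x_next
            if done:
                break
    return agent
```

For instance, `train()` returns an agent whose `mu(x)` is the learned deterministic step at position `x`. Clipping is placed inside the environment rather than the policy so that the actor stays linear and its gradient is simply `features(x)`, which keeps the compatible critic exact.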