A Hybrid Intelligent Path Planning Method Based on Improved Dyna-H Architecture for Unmanned Surface Vessel

Lizheng Wang, Chunhui Zhou, Man Zhu, Yuanqiao Wen, Changshi Xiao, Wuqiang Sun
DOI: https://doi.org/10.1109/cac53003.2021.9728637
2021-01-01
Abstract: Path planning is one of the key technologies supporting the autonomous and safe navigation of unmanned surface vessels (USVs). When traditional reinforcement learning algorithms are applied to USV path planning, they suffer from slow convergence, a large number of iterations, and unstable convergence results. To address these problems, this paper proposes an improved hybrid path planning method based on the Dyna architecture (IDH). The Dyna architecture, which combines model learning with direct reinforcement learning, exploits the advantages of both and is robust when dealing with uncertainty. The path search method, which combines Q-learning with a heuristic algorithm, can learn quickly under different environmental conditions and find safe paths that meet the planning objectives (such as the shortest path or the lowest energy consumption). The reward and punishment strategy and the action selection rules of Q-learning are constructed by jointly considering the target distance, the distance to closest point of approach (DCPA), the time to closest point of approach (TCPA), and the steering angle, which together reflect the vessel's kinematic characteristics. A post-smoother is then developed to improve the quality of the generated route by removing unnecessary 'jags'. The proposed method is simulated and verified on actual environment maps, with the Q-learning, Dyna-Q, and Rapidly-exploring Random Tree (RRT) algorithms chosen as comparison methods. Experiments show that the proposed method has the best adaptability and robustness on the actual environment maps, outperforms the other algorithms in terms of total path distance, and searches faster than the other two reinforcement learning algorithms.
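The abstract states that DCPA and TCPA feed into the reward and punishment strategy but gives no formulas. The sketch below computes both quantities with the standard constant-velocity closest-point-of-approach formulas and folds them, together with progress toward the target and the steering angle, into an illustrative reward. The function names, the weights `w_goal`, `w_risk`, `w_steer`, and the thresholds `safe_dist` and `horizon` are hypothetical, not the paper's values.

```python
import math


def cpa(own_pos, own_vel, obs_pos, obs_vel):
    """Return (DCPA, TCPA) for two vessels moving at constant velocity."""
    # Relative position and velocity of the obstacle with respect to the USV
    rx, ry = obs_pos[0] - own_pos[0], obs_pos[1] - own_pos[1]
    vx, vy = obs_vel[0] - own_vel[0], obs_vel[1] - own_vel[1]
    v2 = vx * vx + vy * vy
    if v2 == 0:
        # No relative motion: the current separation never changes
        return math.hypot(rx, ry), 0.0
    tcpa = -(rx * vx + ry * vy) / v2              # time at which separation is smallest
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)  # separation at that time
    return dcpa, tcpa


def shaped_reward(prev_goal_dist, goal_dist, dcpa, tcpa, steer_deg,
                  safe_dist=2.0, horizon=10.0,
                  w_goal=1.0, w_risk=5.0, w_steer=0.01):
    """Hypothetical reward: weights and thresholds are illustrative assumptions."""
    r = w_goal * (prev_goal_dist - goal_dist)  # reward progress toward the target
    if 0.0 <= tcpa <= horizon and dcpa < safe_dist:
        # Penalize a close encounter that lies ahead within the look-ahead horizon
        r -= w_risk * (safe_dist - dcpa) / safe_dist
    r -= w_steer * abs(steer_deg)              # discourage sharp turns
    return r
```

A negative TCPA means the vessels are already moving apart, so the sketch applies the collision penalty only when the closest approach lies ahead within `horizon`.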
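The abstract describes the Dyna loop (direct reinforcement learning from real experience, model learning, and planning from simulated experience) combined with a heuristic, but does not specify the IDH rules. The following is a minimal tabular Dyna-Q sketch on a hypothetical 8-connected occupancy grid, in which ties among greedy actions are broken by the straight-line distance of the successor cell to the goal. The grid environment, reward values, hyperparameters, and this tie-breaking heuristic are all assumptions for illustration, not the paper's IDH algorithm.

```python
import math
import random

# 8-connected moves as (row offset, column offset)
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def step(grid, state, action, goal):
    """Hypothetical grid environment: returns (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(grid) and 0 <= c < len(grid[0])) or grid[r][c] == 1:
        return state, -10.0, False  # collision or leaving the map: stay in place
    if (r, c) == goal:
        return (r, c), 100.0, True
    return (r, c), -math.hypot(*action), False  # cost equal to distance moved


def dyna_q_heuristic(grid, start, goal, episodes=200, n_planning=20, alpha=0.5,
                     gamma=0.95, epsilon=0.1, max_steps=500, seed=0):
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    Q = {}      # (state, action index) -> value; missing entries count as 0
    model = {}  # (state, action index) -> (reward, next_state, done)

    def q(s, a):
        return Q.get((s, a), 0.0)

    def h(s):
        # Heuristic: straight-line distance to the goal
        return math.hypot(goal[0] - s[0], goal[1] - s[1])

    def choose(s):
        if rng.random() < epsilon:
            return rng.randrange(n_actions)  # explore
        # Greedy on Q; ties broken in favour of the successor closest to the goal
        return max(range(n_actions),
                   key=lambda a: (q(s, a), -h((s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]))))

    def update(s, a, r, s2, done):
        # One-step Q-learning update
        target = r if done else r + gamma * max(q(s2, b) for b in range(n_actions))
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            a = choose(s)
            s2, r, done = step(grid, s, ACTIONS[a], goal)
            update(s, a, r, s2, done)      # direct RL from real experience
            model[(s, a)] = (r, s2, done)  # model learning (deterministic world)
            seen = list(model)
            for _ in range(n_planning):    # planning from simulated experience
                ps, pa = rng.choice(seen)
                update(ps, pa, *model[(ps, pa)])
            s = s2
            if done:
                break
    return Q


def greedy_path(grid, Q, start, goal, max_len=200):
    """Follow the learned greedy policy; stop at the goal or after max_len cells."""
    path, s = [start], start
    while s != goal and len(path) < max_len:
        a = max(range(len(ACTIONS)), key=lambda b: Q.get((s, b), 0.0))
        s, _, _ = step(grid, s, ACTIONS[a], goal)
        path.append(s)
    return path


if __name__ == "__main__":
    grid = [  # 0 = navigable water, 1 = obstacle (hypothetical map)
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 1, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    Q = dyna_q_heuristic(grid, (0, 0), (4, 4))
    print(greedy_path(grid, Q, (0, 0), (4, 4)))
```

Raising `n_planning` trades computation per real step for fewer real interactions, which is the basic reason a Dyna-style learner can converge in fewer episodes than plain Q-learning.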
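The abstract mentions a post-smoother that removes unnecessary 'jags' from the generated route but does not describe how it works. A common approach, assumed here rather than taken from the paper, is greedy line-of-sight shortcutting: from each kept waypoint, jump to the farthest later waypoint that can be reached in a straight line without crossing an obstacle cell. The function names are hypothetical, and the line test is approximate (dense sampling between cell centres).

```python
import math


def line_free(grid, a, b, step=0.05):
    """Approximate line-of-sight test between cell centres a and b by dense sampling."""
    n = max(1, int(math.hypot(b[0] - a[0], b[1] - a[1]) / step))
    for k in range(n + 1):
        t = k / n
        r = a[0] + (b[0] - a[0]) * t
        c = a[1] + (b[1] - a[1]) * t
        # Map the sample point to the grid cell whose centre is nearest
        if grid[math.floor(r + 0.5)][math.floor(c + 0.5)] == 1:
            return False
    return True


def smooth_path(grid, path):
    """Greedy shortcutting: from each kept waypoint, jump to the farthest visible one."""
    if len(path) < 3:
        return list(path)
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        # Fall back toward the next waypoint until a collision-free segment is found
        while j > i + 1 and not line_free(grid, path[i], path[j]):
            j -= 1
        out.append(path[j])
        i = j
    return out
```

Sampling can miss an obstacle that a segment clips only at a corner; an exact grid-traversal test (for example, a supercover line algorithm) is safer when clearance matters.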
What problem does this paper attempt to address?