Multiple Suboptimal Policies Integrated Reinforcement Learning Algorithm for Path Planning

Xinyuan Hu, Aiguo Chen, Sijie Zhang, Zhen Yang
DOI: https://doi.org/10.1109/iccece54139.2022.9712751
2022-01-14
Abstract: Reinforcement learning performs well in path planning tasks with unknown environments, where the agent relies on its own exploration to gather information about the environment and find the optimal path. However, in sparse reward tasks, the difficulty of obtaining external rewards and collecting usable state-action pairs rises exponentially, and the agent's policy may fail to converge under random exploration alone. In this paper, we propose a multiple suboptimal policies integrated reinforcement learning algorithm for path planning, which uses a reward shaping algorithm to integrate the best parts of different suboptimal policies into an intrinsic reward that helps the agent find a near-optimal path. On a sparse reward path planning task with a discrete action space, our method outperforms both an imitation learning method and deep Q-learning from demonstrations in convergence speed and policy stability.
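The abstract does not spell out how the suboptimal policies are combined, so the following is only a minimal sketch of the general idea of shaping a sparse reward with several suboptimal policies, not the authors' algorithm. It assumes tabular policies on a grid, a hypothetical per-state quality score for each policy, and a fixed hypothetical `bonus`: the agent receives an intrinsic bonus when its action matches the action of whichever policy scores best in the current state.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]   # grid cell (row, col)
Action = int              # assumed encoding: 0=up, 1=down, 2=left, 3=right


def shaped_reward(
    state: State,
    action: Action,
    extrinsic_reward: float,
    policies: List[Dict[State, Action]],
    scores: List[Dict[State, float]],
    bonus: float = 0.1,
) -> float:
    """Add an intrinsic bonus to the extrinsic reward.

    Hypothetical rule: among the suboptimal policies that cover `state`,
    pick the one with the highest per-state score and reward the agent
    if it took the same action. States no policy covers get no bonus.
    """
    candidates = [
        (scores[i].get(state, float("-inf")), i)
        for i, policy in enumerate(policies)
        if state in policy
    ]
    if not candidates:
        return extrinsic_reward
    _, best = max(candidates)  # policy judged most reliable in this state
    intrinsic = bonus if policies[best][state] == action else 0.0
    return extrinsic_reward + intrinsic


# Two hypothetical suboptimal policies, each reliable in a different cell.
policies = [{(0, 0): 3, (0, 1): 1}, {(0, 0): 1, (0, 1): 3}]
scores = [{(0, 0): 0.9, (0, 1): 0.2}, {(0, 0): 0.4, (0, 1): 0.8}]

r_match = shaped_reward((0, 0), 3, 0.0, policies, scores)     # follows policy 0
r_miss = shaped_reward((0, 0), 1, 0.0, policies, scores)      # disagrees with it
r_unknown = shaped_reward((5, 5), 2, 0.0, policies, scores)   # uncovered state
```

In this sketch the shaped value would replace the raw environment reward in the agent's update (for example a Q-learning step), so the agent still gets a learning signal in the many states where the extrinsic reward is zero.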