Improved M-DQN with $\epsilon$-UCB Action Selection Policy and Multi-Goal Fusion Reward Function for Mobile Robot Path Planning

Sheng Chunyang, An Hao, Nie Jun, Wang Haixia, Lu Xiao
DOI: https://doi.org/10.1109/tvt.2024.3503552
IF: 6.8
2024-01-01
IEEE Transactions on Vehicular Technology
Abstract: Both DQN and M-DQN are commonly used for mobile robot path planning, but their action selection policies and reward functions are relatively inefficient in complex environments. To address this, an improved $\epsilon$-UCB action selection method is proposed that uses a neural network to fit the uncertainty of state-action pairs, which effectively improves exploration efficiency. To make the reward function more effective, a multi-goal fusion reward function is proposed to improve its guidance ability. The proposed $\epsilon$-UCB action selection policy and multi-goal fusion reward function are then introduced into the M-DQN framework to complete the path planning task of the mobile robot. Finally, comparative experiments are carried out to verify the effectiveness of the proposed method. The experimental results show that, for mobile robot path planning in complex environments, the proposed method converges faster and learns more efficiently than other existing methods.
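The abstract does not give the exact form of the $\epsilon$-UCB rule, so the following is only a minimal sketch of one plausible reading: with probability `epsilon` the agent explores by maximizing a UCB-style score, and otherwise it acts greedily on the Q-values. The paper fits the uncertainty of state-action pairs with a neural network; this sketch substitutes a simple visit-count bonus for that component. The function name `epsilon_ucb_action` and the parameters `counts`, `step`, and `c` are hypothetical and do not come from the paper.

```python
import math
import random


def epsilon_ucb_action(q_values, counts, step, epsilon=0.1, c=1.0, rng=random):
    """Return an action index (illustrative sketch, not the paper's exact rule).

    q_values: estimated Q-value of each action in the current state.
    counts:   how often each action has been tried in this state (assumed
              count-based stand-in for the paper's learned uncertainty).
    step:     global step counter used in the logarithmic term.
    """
    if rng.random() < epsilon:
        # Exploration branch: add a bonus that is larger for rarely tried actions.
        log_t = math.log(step + 1)
        scores = [
            q + c * math.sqrt(log_t / (n + 1))
            for q, n in zip(q_values, counts)
        ]
    else:
        # Exploitation branch: plain greedy choice on the Q-values.
        scores = list(q_values)
    # Index of the highest score; ties resolve to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)
```

Compared with uniform random exploration, directing the exploratory step toward under-visited actions is one way such a rule could improve exploration efficiency, which is the benefit the abstract claims for its method.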