Abstract: Path planning is one of the research hotspots for outdoor mobile robots. In deep reinforcement learning, the Double Deep Q Network (DDQN) method converges slowly and with low accuracy in environments with many obstacles. To address these issues, this paper proposes a new algorithm, Improve Double Deep Q Network (IDDQN), which improves DDQN with second-order temporal difference methods and a binary tree data structure. The improved method evaluates the robot's current actions using second-order temporal difference methods and stores the results in a binary tree, which replaces the traditional experience pool. The environment is constructed with the grid method and programmed in Python, with two two-dimensional grid maps representing a simple and a complex environment. In simulation experiments, IDDQN is compared with DDQN and four related deep reinforcement learning methods, such as Multi-step updates and Experience Classification Double Deep Q Network (ECMS-DDQN). The simulation results indicate that IDDQN improves the path planning metrics relative to DDQN and the other methods. In the simple environment, compared to the original DDQN algorithm, IDDQN exhibits a 26.89% improvement in step convergence time, a 22.58% improvement in reward convergence time, and a 10.30% improvement in average reward value after convergence. It also outperforms the other simulated methods in the simple environment, although the difference is not significant. In the complex environment, IDDQN avoids falling into local optima, unlike the other methods, which demonstrates the accuracy of its strategy in complex environments. The other methods converge to local optima and therefore show artificially high average reward values after convergence, which have no reference value for comparison. In the complex environment, compared to the original DDQN algorithm, IDDQN exhibits a 33.22% improvement in step convergence time and a 25.47% improvement in reward convergence time, clearly surpassing the other simulated methods. These data indicate that IDDQN improves both convergence speed and accuracy compared to DDQN and the related improved methods simulated in this paper. The improvement is most evident in environments with many obstacles, where IDDQN still plans paths effectively.
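The two ingredients named in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation and rests on two assumptions that the abstract does not confirm: that "second-order temporal difference" means a two-step return, and that the binary tree is a sum tree of the kind used in prioritized experience replay. The names `two_step_ddqn_target` and `SumTree`, and the discount factor `gamma`, are hypothetical.

```python
def two_step_ddqn_target(r1, r2, q_online_next, q_target_next, gamma=0.9, done=False):
    """Two-step Double DQN target (assumed reading of 'second-order TD').

    r1, r2         rewards of the two consecutive steps
    q_online_next  online-network Q-values of the state reached after two steps
    q_target_next  target-network Q-values of the same state
    """
    if done:  # the episode ended within the two steps, so nothing to bootstrap
        return r1 + gamma * r2
    # Double DQN: the online network selects the action, the target network rates it.
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return r1 + gamma * r2 + gamma ** 2 * q_target_next[best]


class SumTree:
    """Array-backed binary tree: leaves hold priorities, parents hold child sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next slot to overwrite

    def total(self):
        return self.tree[0]

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                        # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def add(self, priority, transition):
        self.data[self.write] = transition
        self.update(self.write + self.capacity - 1, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, value):
        """Return (leaf index, priority, transition) for a value in [0, total()]."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):     # descend until a leaf is reached
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


# Usage: store transitions keyed by the absolute two-step TD error.
tree = SumTree(capacity=4)
target = two_step_ddqn_target(1.0, 2.0, [0.1, 0.9], [5.0, 3.0], gamma=0.5)
td_error = abs(target - 2.0)                    # 2.0 stands in for the current Q(s, a)
tree.add(td_error, ("s", "a", 1.0, 2.0, "s2"))
```

Under these assumptions, sampling a transition in proportion to its priority takes a number of steps logarithmic in the capacity, whereas drawing by priority from a flat experience pool would need a linear scan.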

D3-TD3: Deep Dense Dueling Architectures in TD3 Algorithm for Robot Path Planning Based on 3D Point Cloud

TD3 Based Collision Free Motion Planning for Robot Navigation

Path planning of mobile robot based on improved TD3 algorithm in dynamic environment

Multi-objective Path Planning Based on Deep Reinforcement Learning

Deep Reinforcement Learning with Multi-Critic TD3 for Decentralized Multi-Robot Path Planning

Multi-Agent Path Planning Method Based on Improved Deep Q-Network in Dynamic Environments

Path Planning of Unmanned Aerial Vehicle in Complex Environments Based on State-Detection Twin Delayed Deep Deterministic Policy Gradient

Novel task decomposed multi-agent twin delayed deep deterministic policy gradient algorithm for multi-UAV autonomous path planning

DM-DQN: Dueling Munchausen deep Q network for robot path planning

Dual-layer Multi-Robot Path Planning in Narrow-Lane Environments under Specific Traffic Policies

Path Planning in Complex Environments Using Attention-Based Deep Deterministic Policy Gradient

Path planning for outdoor mobile robots based on IDDQN (October 2023)

Path Planning for Autonomous Vehicles in Unknown Dynamic Environment Based on Deep Reinforcement Learning

Deep Reinforcement Learning for Indoor Mobile Robot Path Planning

Twin-Delayed DDPG: A Deep Reinforcement Learning Technique to Model a Continuous Movement of an Intelligent Robot Agent

Adaptive Deep Ant Colony Optimization–Asymmetric Strategy Network Twin Delayed Deep Deterministic Policy Gradient Algorithm: Path Planning for Mobile Robots in Dynamic Environments

Mapless Path Planning for Mobile Robot Based on Improved Deep Deterministic Policy Gradient Algorithm

A Mapless Local Path Planning Approach Using Deep Reinforcement Learning Framework

Path Planning Method for Manipulators Based on Improved Twin Delayed Deep Deterministic Policy Gradient and RRT*

UAV Path Planning Based on the Average TD3 Algorithm With Prioritized Experience Replay

Multiple UAVs Path Planning Based on Deep Reinforcement Learning in Communication Denial Environment