Zonal RL-RRT: Integrated RL-RRT Path Planning with Collision Probability and Zone Connectivity

AmirMohammad Tahmasbi, MohammadSaleh Faghfoorian, Saeed Khodaygan, Aniket Bera
2024-11-01
Abstract: Path planning in high-dimensional spaces poses significant challenges, particularly in achieving both time efficiency and a fair success rate. To address these issues, we introduce a novel path-planning algorithm, Zonal RL-RRT, that leverages kd-tree partitioning to segment the map into zones while addressing zone connectivity, ensuring seamless transitions between zones. By breaking down the complex environment into multiple zones and using Q-learning as the high-level decision-maker, our algorithm achieves a 3x improvement in time efficiency compared to basic sampling methods such as RRT and RRT* in forest-like maps. Our approach outperforms heuristic-guided methods like BIT* and Informed RRT* by 1.5x in terms of runtime while maintaining robust and reliable success rates across 2D to 6D environments. Compared to learning-based methods like NeuralRRT* and MPNetSMP, as well as the heuristic RRT*J, our algorithm demonstrates, on average, 1.5x better performance in the same environments. We also evaluate the effectiveness of our approach through simulations of the UR10e arm manipulator in the MuJoCo environment. A key strength of our approach lies in its use of zone partitioning and Reinforcement Learning (RL) for adaptive high-level planning, which allows the algorithm to accommodate flexible policies across diverse environments, making it a versatile tool for advanced path planning.
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the challenge of path planning in high-dimensional spaces, in particular how to achieve both time efficiency and a reasonably high success rate. The authors propose a new path-planning algorithm, Zonal RL-RRT. It divides the map into multiple zones through kd-tree partitioning and handles the connectivity between zones so that transitions between them are seamless. By decomposing the complex environment into zones and using Q-learning as the high-level decision-maker, the algorithm achieves a 3x improvement in time efficiency over basic sampling methods such as RRT and RRT* in forest-like maps. Compared with heuristic-guided methods such as BIT* and Informed RRT*, it improves runtime by 1.5x while maintaining a robust success rate. Compared with learning-based methods such as NeuralRRT* and MPNetSMP, and the heuristic method RRT*J, it performs on average 1.5x better in the same environments.

### Main Contributions

1. **Reducing Complexity**: By partitioning the map according to the spatial distribution of obstacles, especially in dense and cluttered environments, the algorithm improves the average running time by 64% on dense forest maps in 2D environments compared to the baseline methods (RRT, RRT*, BIT*, and Informed RRT*). In 3D environments, the average running time improves by 38% while the success rate reaches a similar level. In the 6D environment using the UR10e robotic arm, the algorithm runs faster than BIT*.
2. **High Adaptability**: Unlike existing methods that rely on training with specific datasets, Zonal RL-RRT does not depend on a specific environment type or agent, which makes it more versatile and adaptable across scenarios.
3. **Policy Flexibility**: By using zone partitioning and reinforcement learning for high-level planning, the algorithm can adapt to different path-planning strategies. It can either steer clear of obstacles in conservative scenarios or take a more direct route in greedy scenarios. This flexibility lets it accommodate varied planning goals and environmental challenges in future applications.

### Method Overview

1. **Reward Function**:
   - **Collision Probability \( R_\rho \)**: Represents the density of obstacles in the zone, defined as \( R_\rho \propto -\frac{A_{\text{obstacles}}}{A_{\text{zone}}} \). A higher obstacle ratio indicates a greater collision risk and therefore yields a more negative reward.
   - **Distance to the Goal \( R_d \)**: Reflects the distance from the center of the zone to the goal, defined as \( R_d \propto -\| Z_{\text{center}} - A_{\text{goal}} \| \), which encourages the selection of shorter and more direct paths.
   - **Final Reward Function**: Combining the terms above, the reward is defined as \( R = \omega_1 \cdot R_d + \omega_2 \cdot R_\rho + \omega_3 \cdot I[\text{goal reached}] \), where \( \omega_1, \omega_2, \omega_3 \) are weighting factors and \( I[\text{goal reached}] \) is an indicator function that equals 1 when the goal is reached and 0 otherwise.
2. **kd-tree Algorithm**: Dynamically partitions the environment space \( E \) according to the distribution and density of obstacles, mathematically represented as:
   \[
   \text{Partition}(E, \text{depth}) =
   \begin{cases}
   \text{Split}(E, \text{median}(C_{\text{axis}}), \text{axis}) & \text{if } \text{depth} \leq \text{MaxDepth} \\
   E & \text{otherwise}
   \end{cases}
   \]
   where \(\text{Split}(E, \text{medi}
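
The reward terms and the kd-tree partition rule from the Method Overview can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes axis-aligned 2D zones, takes both proportionality constants as 1, represents obstacles by their center points, and expects the caller to supply the obstacle area inside a zone. The names `Zone`, `partition`, and `reward`, and the default weights, are hypothetical and chosen for illustration. Stopping early when a zone holds no obstacle centers, or when the median falls on the zone boundary, is also an added assumption, since the summary does not describe those cases.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Axis-aligned 2D zone (hypothetical helper, not from the paper)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    @property
    def center(self):
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def partition(zone, centers, depth=0, max_depth=1):
    """Recursively split a zone at the median obstacle-center coordinate.

    Mirrors Partition(E, depth): split while depth <= max_depth,
    otherwise return the zone unchanged. Axes alternate with depth.
    """
    inside = [c for c in centers
              if zone.xmin <= c[0] <= zone.xmax and zone.ymin <= c[1] <= zone.ymax]
    # Assumption: an obstacle-free zone has no median, so it stays a leaf.
    if depth > max_depth or not inside:
        return [zone]
    axis = depth % 2
    median = statistics.median(c[axis] for c in inside)
    lo, hi = (zone.xmin, zone.xmax) if axis == 0 else (zone.ymin, zone.ymax)
    # Assumption: skip splits that would create a zero-area zone.
    if not lo < median < hi:
        return [zone]
    if axis == 0:
        first = Zone(zone.xmin, zone.ymin, median, zone.ymax)
        second = Zone(median, zone.ymin, zone.xmax, zone.ymax)
    else:
        first = Zone(zone.xmin, zone.ymin, zone.xmax, median)
        second = Zone(zone.xmin, median, zone.xmax, zone.ymax)
    return (partition(first, inside, depth + 1, max_depth)
            + partition(second, inside, depth + 1, max_depth))


def reward(zone, goal, obstacle_area, goal_reached, w1=1.0, w2=1.0, w3=10.0):
    """R = w1 * R_d + w2 * R_rho + w3 * I[goal reached]."""
    r_d = -math.dist(zone.center, goal)      # distance-to-goal term
    r_rho = -obstacle_area / zone.area       # collision-probability term
    return w1 * r_d + w2 * r_rho + w3 * (1.0 if goal_reached else 0.0)


if __name__ == "__main__":
    world = Zone(0.0, 0.0, 10.0, 10.0)
    obstacle_centers = [(2, 2), (4, 8), (6, 3), (8, 7)]
    for z in partition(world, obstacle_centers):
        print(z, reward(z, goal=(9.0, 9.0), obstacle_area=1.0, goal_reached=False))
```

In this sketch a high-level learner such as Q-learning would treat each returned zone as a state and use `reward` to score a move into it. Raising `w2` relative to `w1` penalizes cluttered zones more heavily (the conservative behavior described above), while raising `w1` favors zones closer to the goal (the greedy behavior).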