APPA-3D: an autonomous 3D path planning algorithm for UAVs in unknown complex environments

Jintao Wang, Zuyi Zhao, Jiayi Qu, Xingguo Chen
DOI: https://doi.org/10.1038/s41598-024-51286-2
IF: 4.6
2024-01-13
Scientific Reports
Abstract: Due to their high flexibility, low cost, and ease of handling, Unmanned Aerial Vehicles (UAVs) are often used to perform difficult tasks in complex environments. Stable and reliable path planning capability is the fundamental demand for UAVs to accomplish their flight tasks. Most research on UAV path planning is carried out under the premise of known environmental information, and it is difficult to safely reach the target position in the face of an unknown environment. Thus, an autonomous collision-free path planning algorithm for UAVs in unknown complex environments (APPA-3D) is proposed. An anti-collision control strategy is designed using the UAV collision safety envelope, which relies on the UAV's environmental awareness capability to continuously interact with external environmental information. A dynamic reward function of reinforcement learning combined with the actual flight environment is designed, and an optimized reinforcement learning action exploration strategy based on the action selection probability is proposed. Then, an improved RL algorithm is used to simulate the UAV flight process in an unknown environment, and the algorithm is trained by interacting with the environment, which finally realizes autonomous collision-free path planning for UAVs. The comparative experimental results in the same environment show that APPA-3D can effectively guide the UAV to plan a safe and collision-free path from the starting point to the target point in an unknown complex 3D environment.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses autonomous path planning for unmanned aerial vehicles (UAVs) in unknown, complex environments. Most existing UAV path-planning research assumes that environmental information is known in advance, so UAVs struggle to reach the target location safely when the environment is unknown. The paper therefore proposes APPA-3D, an algorithm for autonomous, collision-free path planning in unknown complex environments.

### The main contributions of the paper include:

1. **An anti-collision control strategy for UAVs**:
   - Uses the UAV's environmental perception to build an anti-collision safety envelope, which triggers different anti-collision strategies depending on the distance between the UAV and an obstacle.
   - Draws on the near mid-air collision (NMAC) rules for civil aircraft and the International Regulations for Preventing Collisions at Sea (COLREGS) to define four anti-collision strategies for different types of dynamic obstacles.
2. **An optimized reward-generation mechanism for reinforcement learning (RL)**:
   - Combines RL with the artificial potential field (APF) method to build a dynamic reward function that generates rewards in real time from the UAV's actual flight-environment information, addressing the difficulty traditional RL algorithms have in converging in high-dimensional spaces.
3. **An RL exploration strategy based on action selection probability**:
   - Targets the exploration-exploitation dilemma that RL faces during path planning. The strategy adjusts action selection dynamically according to the magnitude of the value function in each state, which improves path-search efficiency.
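The distance-triggered zone logic of contribution 1 can be sketched as a simple classifier. This is a minimal illustration, not the authors' implementation: it assumes the three thresholds described under the technical details (sensor range \( D_{\text{max}} \) and zone thresholds \( D_{\text{cz}} \), \( D_{\text{mz}} \)) are nested as \( D_{\text{mz}} < D_{\text{cz}} \le D_{\text{max}} \), that the safety zone spans distances between \( D_{\text{cz}} \) and \( D_{\text{max}} \), and it adds a hypothetical `OUT_OF_RANGE` label for undetected obstacles. The names `Zone` and `classify_obstacle` are illustrative.

```python
from enum import Enum


class Zone(Enum):
    """Envelope regions from the summary; OUT_OF_RANGE is an added, hypothetical label."""
    OUT_OF_RANGE = "out_of_range"  # beyond D_max: the sensor cannot detect the obstacle
    SZ = "safety_zone"
    CZ = "collision_avoidance_zone"
    MZ = "mandatory_collision_avoidance_zone"


def classify_obstacle(distance: float, d_max: float, d_cz: float, d_mz: float) -> Zone:
    """Map a UAV-obstacle distance to an envelope zone.

    ASSUMPTION: thresholds are nested as 0 < d_mz < d_cz <= d_max; the paper's exact
    boundary conventions (strict vs. non-strict inequalities) are not given here.
    """
    if not 0 < d_mz < d_cz <= d_max:
        raise ValueError("expected 0 < d_mz < d_cz <= d_max")
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance > d_max:
        return Zone.OUT_OF_RANGE
    if distance > d_cz:
        return Zone.SZ  # detected, but no avoidance action required yet
    if distance > d_mz:
        return Zone.CZ  # avoidance manoeuvre should be considered
    return Zone.MZ      # avoidance manoeuvre is mandatory
```

For example, `classify_obstacle(7.0, d_max=20.0, d_cz=10.0, d_mz=4.0)` returns `Zone.CZ`. The four NMAC/COLREGS-inspired responses are not specified in this summary, so the sketch stops at zone classification rather than guessing at them.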
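For contribution 2, APF potentials can be turned into a per-step reward. The sketch below implements the attractive and repulsive potentials in the standard APF form also given under the technical details, and then, as an assumption not stated in this summary, scores a move by how much it lowers the total potential, \( r = U(X_t) - U(X_{t+1}) \). The paper's actual dynamic reward may combine these terms differently; all function names and default gains here are hypothetical.

```python
import math
from typing import Iterable, Sequence

Point = Sequence[float]  # e.g. an (x, y, z) position


def u_att(x: Point, goal: Point, k_att: float) -> float:
    """Attractive potential: 0.5 * k_att * ||x - goal||^2."""
    return 0.5 * k_att * math.dist(x, goal) ** 2


def u_rep(x: Point, obstacle: Point, k_rep: float, d_safe: float) -> float:
    """Repulsive potential of one obstacle; zero at or beyond d_safe."""
    d = math.dist(x, obstacle)
    if d >= d_safe:
        return 0.0
    if d == 0.0:
        return math.inf  # UAV at the obstacle position: potential is unbounded
    return 0.5 * k_rep * (1.0 / d - 1.0 / d_safe) ** 2


def total_potential(x: Point, goal: Point, obstacles: Iterable[Point],
                    k_att: float, k_rep: float, d_safe: float) -> float:
    """U(X) = U_att(X) + sum over obstacles of U_rep(X, X_o_i)."""
    return u_att(x, goal, k_att) + sum(u_rep(x, o, k_rep, d_safe) for o in obstacles)


def shaped_reward(x_prev: Point, x_next: Point, goal: Point, obstacles: Iterable[Point],
                  k_att: float = 1.0, k_rep: float = 1.0, d_safe: float = 2.0) -> float:
    """HYPOTHETICAL reward: positive when a step lowers the total potential."""
    obstacles = list(obstacles)  # evaluated twice, once per state
    return (total_potential(x_prev, goal, obstacles, k_att, k_rep, d_safe)
            - total_potential(x_next, goal, obstacles, k_att, k_rep, d_safe))
```

With no obstacles, `shaped_reward((0, 0, 0), (1, 0, 0), (3, 0, 0), [])` returns `2.5`: the attractive potential drops from 4.5 to 2.0. Because the reward depends on the obstacle positions currently sensed, it changes as the UAV discovers new obstacles, which is one way a reward can be "dynamic" in the sense the summary describes.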
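Contribution 3 can be illustrated with a policy that mixes uniform exploration with a softmax (Boltzmann) distribution over Q-values, which is the form of the first case of the exploration equation under the technical details. That equation is truncated in this summary, so the sketch makes an explicit assumption: it applies the visible expression \( \epsilon/|A| + (1-\epsilon)\cdot\mathrm{softmax}(Q/T) \) to every action, which yields a valid distribution because the two parts sum to \( \epsilon + (1-\epsilon) = 1 \). The names `action_probabilities` and `select_action` are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, Optional


def action_probabilities(q_values: Dict[Hashable, float], epsilon: float,
                         temperature: float) -> Dict[Hashable, float]:
    """Mix a uniform distribution (weight epsilon) with a softmax over Q-values.

    ASSUMPTION: the summary's equation is truncated after its first case, so this
    applies eps/|A| + (1 - eps) * softmax(Q/T) to every action.
    """
    if not q_values:
        raise ValueError("need at least one action")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    n = len(q_values)
    q_max = max(q_values.values())  # subtracting the max avoids overflow in exp
    weights = {a: math.exp((q - q_max) / temperature) for a, q in q_values.items()}
    z = sum(weights.values())
    return {a: epsilon / n + (1.0 - epsilon) * w / z for a, w in weights.items()}


def select_action(q_values: Dict[Hashable, float], epsilon: float, temperature: float,
                  rng: Optional[random.Random] = None) -> Hashable:
    """Sample one action from the mixed distribution."""
    probs = action_probabilities(q_values, epsilon, temperature)
    actions = list(probs)
    chooser = rng if rng is not None else random  # module functions share Random's API
    return chooser.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

Subtracting the maximum Q-value leaves the softmax unchanged while keeping `math.exp` in range. The \( \epsilon/|A| \) floor guarantees every action a nonzero selection probability, while a lower temperature \( T \) concentrates the remaining mass on high-value actions, which is how the mixture trades exploration against exploitation.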
### Specific technical details:

- **Anti-collision safety envelope**:
  - Defines three regions: the safety zone (SZ), the collision avoidance zone (CZ), and the mandatory collision avoidance zone (MZ). When an obstacle enters one of these regions, the UAV takes the corresponding anti-collision measures.
  - The regions are bounded by three distances:
    - \( D_{\text{max}} \): the maximum detection distance of the sensor.
    - \( D_{\text{cz}} \): the threshold of the collision avoidance zone.
    - \( D_{\text{mz}} \): the threshold of the mandatory collision avoidance zone.
- **Dynamic reward function**:
  - Uses the APF method to define an attractive potential field function and a repulsive potential field function, representing the attraction of the target point and the repulsion of obstacles, respectively.
  - The potential field functions are:
    \[ U_{\text{att}}(X)=\frac{1}{2}k_{\text{att}}\|X - X_g\|^2 \]
    \[ U_{\text{rep}}(X, X_o)=\begin{cases} \frac{1}{2}k_{\text{rep}}\left(\frac{1}{\|X - X_o\|}-\frac{1}{D_{\text{safe}}}\right)^2 & \text{if }\|X - X_o\|<D_{\text{safe}}\\ 0 & \text{otherwise} \end{cases} \]
    Here \( X \) is the UAV position, \( X_g \) the target point, \( X_o \) an obstacle position, \( k_{\text{att}} \) and \( k_{\text{rep}} \) are gain coefficients, and \( D_{\text{safe}} \) is the distance beyond which an obstacle exerts no repulsion.
  - The total potential field function is:
    \[ U(X)=U_{\text{att}}(X)+\sum_{i}U_{\text{rep}}(X, X_{o_i}) \]
- **RL exploration strategy**:
  - Proposes an exploration strategy based on action selection probability, which improves the convergence speed and path-search efficiency of the algorithm by dynamically adjusting the balance between exploration and exploitation.
  - The mathematical expression is: \[ \pi(a|s)=\begin{cases} \epsilon/|A|+(1 - \epsilon)\cdot\frac{e^{Q(s,a)/T}}{\sum_{a'\in A}e^{Q(s,a')/T}} & \text{if }\tex