End-to-end UAV obstacle avoidance decision based on deep reinforcement learning

DOI: https://doi.org/10.1051/jnwpu/20224051055
2022-11-28
Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University
Abstract: Traditional UAV obstacle avoidance algorithms require an offline three-dimensional map, produce discontinuous speed control, and restrict the choice of velocity direction. To address these problems, we study an end-to-end obstacle avoidance decision method with continuous action output for UAVs, based on the deep deterministic policy gradient (DDPG) deep reinforcement learning algorithm. First, an end-to-end decision and control model based on DDPG is established; from the continuous state information it perceives, the model outputs continuous control variables, namely the UAV's obstacle avoidance actions. Second, the model is trained and validated on the UE4 + AirSim platform, and the results show that it can realize end-to-end UAV obstacle avoidance decisions. Finally, the model is compared with a three-dimensional vector field histogram (3DVFH) obstacle avoidance algorithm that uses the same data source. The experiments show that the DDPG algorithm optimizes the UAV's obstacle avoidance trajectory more effectively.
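To make the idea of continuous action output concrete, the following is a minimal sketch of a DDPG-style deterministic actor that maps a continuous state vector to a bounded velocity command. It is not the paper's model: the abstract does not give the network architecture, state representation, or speed limits, so the dimensions, the `MAX_SPEED` value, and the `Actor` class are assumptions chosen for illustration. The critic, replay buffer, and target networks that DDPG also needs are omitted, and plain Gaussian noise stands in for whatever exploration process the authors used.

```python
import math
import random

# Hypothetical dimensions and limits, chosen only for illustration.
STATE_DIM = 8      # e.g. a few depth readings plus the relative goal position
ACTION_DIM = 3     # velocity command (vx, vy, vz)
HIDDEN_DIM = 16
MAX_SPEED = 2.0    # m/s, an assumed actuator limit


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply one fully connected layer followed by an activation."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(v):
    return max(0.0, v)


class Actor:
    """Deterministic policy mu(s): continuous state -> bounded continuous action."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(STATE_DIM, HIDDEN_DIM, rng)
        self.out = init_layer(HIDDEN_DIM, ACTION_DIM, rng)

    def act(self, state, noise_std=0.0, rng=None):
        h = dense(state, self.hidden, relu)
        # tanh bounds each component to [-1, 1]; scaling maps it to the speed limit.
        action = [MAX_SPEED * a for a in dense(h, self.out, math.tanh)]
        if noise_std > 0.0:
            rng = rng or random
            # Gaussian exploration noise, clipped back into the valid range.
            action = [min(MAX_SPEED, max(-MAX_SPEED, a + rng.gauss(0.0, noise_std)))
                      for a in action]
        return action


actor = Actor(seed=0)
state = [0.5] * STATE_DIM
command = actor.act(state)  # three continuous velocity components
```

Because the output is a real-valued vector rather than an index into a fixed set of headings, the policy can in principle command any velocity within the limit, which is the property the abstract contrasts with the limited direction selection of traditional methods.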