Abstract: Collision-free motion is essential for mobile robots. Most approaches to collision-free and efficient navigation with wheeled robots require parameter tuning by experts to obtain good navigation behavior. This study investigates the application of deep reinforcement learning to train a mobile robot for autonomous navigation in a complex environment. The robot uses LiDAR sensor data and a deep neural network to generate control signals that guide it toward a specified target while avoiding obstacles. We employ two reinforcement learning algorithms in the Gazebo simulation environment: Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO). The study introduces an enhanced neural network structure in the PPO algorithm to boost performance, accompanied by a well-designed reward function to improve algorithm efficacy. Experiments conducted in environments both with and without obstacles underscore the effectiveness of the proposed approach. This research contributes to the advancement of autonomous robotics in complex environments through the application of deep reinforcement learning.
What problem does this paper attempt to address?
The paper addresses the problem of safe and efficient autonomous navigation for mobile robots in complex environments. Specifically, the research team adopted a deep reinforcement learning approach, enhancing the Proximal Policy Optimization (PPO) algorithm in particular, and applied it to train a mobile robot for collision-free autonomous navigation.
The key contributions of the paper can be summarized as follows:
1. **Comprehensive Observation Setup**: The study introduced a comprehensive observation setup consisting of LiDAR readings (30 dimensions), the robot's previous linear and angular velocities, its position relative to the target (expressed in polar coordinates), its yaw angle, and the orientation required to face the target. This multi-dimensional observation helps the robot perceive its surroundings and therefore make better decisions.
2. **Enhanced Neural Network Structure**: To improve the performance of the PPO algorithm, the researchers designed a customized neural network architecture for it. According to the paper, this change improved overall performance and enabled PPO to learn navigation strategies more effectively.
3. **Reward Function Design**: The study proposed a reward function that encourages the robot to approach the target and avoid collisions, with the reward scaled by how much the distance to the target decreases. Additionally, a new reward function was introduced that penalizes proximity to obstacles and increases the reward exponentially as the robot approaches the target.
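The observation setup and reward design listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the ordering of the observation vector, the resulting 36-dimensional size, and every threshold and coefficient (`goal_radius`, `collision_dist`, `safe_dist`, `k_progress`, `k_obstacle`, `k_goal`, and the terminal rewards of ±100) are assumptions chosen only for illustration. The summary states which terms exist, not their exact form.

```python
import math

N_LIDAR = 30  # number of LiDAR readings stated in the summary


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def build_observation(lidar, prev_v, prev_w, robot_xy, yaw, target_xy):
    """Assemble a flat observation vector (hypothetical layout).

    Layout assumed here: 30 LiDAR ranges, previous linear and angular
    velocity, distance and bearing to the target (polar coordinates in
    the robot frame), yaw, and the world-frame heading toward the target.
    """
    if len(lidar) != N_LIDAR:
        raise ValueError(f"expected {N_LIDAR} LiDAR readings, got {len(lidar)}")
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    distance = math.hypot(dx, dy)
    goal_heading = math.atan2(dy, dx)         # orientation needed to face the target
    bearing = wrap_angle(goal_heading - yaw)  # target angle relative to the robot
    return list(lidar) + [prev_v, prev_w, distance, bearing, yaw, goal_heading]


def shaped_reward(prev_dist, dist, min_lidar, *,
                  goal_radius=0.3, collision_dist=0.2, safe_dist=0.5,
                  k_progress=10.0, k_obstacle=5.0, k_goal=1.0):
    """Return (reward, done) for one step; all constants are assumed values."""
    if min_lidar < collision_dist:
        return -100.0, True   # collision: large penalty, episode ends
    if dist < goal_radius:
        return 100.0, True    # target reached: large bonus, episode ends
    # Reward proportional to how much the distance to the target decreased.
    progress = k_progress * (prev_dist - dist)
    # Linear penalty once the nearest obstacle is closer than safe_dist.
    proximity = -k_obstacle * (safe_dist - min_lidar) if min_lidar < safe_dist else 0.0
    # Bonus that grows exponentially as the remaining distance shrinks.
    approach = k_goal * math.exp(-dist)
    return progress + proximity + approach, False
```

In a training loop, `build_observation` would be called once per simulation step with the latest sensor data, and `shaped_reward` would use the target distance from the previous and current steps together with the smallest LiDAR range.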
Through these contributions, the research team demonstrated that the PPO algorithm and its enhanced version can navigate effectively in simulated environments without pre-built maps. Experimental results showed that, across different environmental complexities and reward function designs, the enhanced PPO algorithm learned faster and achieved higher success rates than other methods (such as DDPG), especially in simpler environments. In more complex environments, however, although the enhanced PPO's performance improved, DDPG still slightly outperformed it on certain metrics.
In summary, this research offers an approach to safe autonomous navigation of mobile robots in complex environments based on deep reinforcement learning, primarily through improvements to the PPO algorithm.