Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL

Osama Ahmad, Zawar Hussain, Hammad Naeem
2024-03-25
Abstract: This study implements a reinforcement learning algorithm for the trajectory planning of manipulators. A 7-DOF robotic arm must pick up a randomly placed block and place it at a random target point in an unknown environment. A randomly moving obstacle makes picking the object harder. The objective of the robot is to avoid the obstacle and pick the block within a fixed time limit. We apply the deep deterministic policy gradient (DDPG) algorithm and compare the model's efficiency with dense and sparse rewards.
Robotics, Systems and Control
What problem does this paper attempt to address?
This paper applies deep reinforcement learning, specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, to robot trajectory planning in dynamic environments. The researchers used a 7-degree-of-freedom robotic arm to perform pick-and-place tasks in an unknown, dynamically changing environment. The main challenges are:

1. **Environmental Uncertainty**: Obstacles move randomly, so the robotic arm operates in an uncertain environment.
2. **Collision Avoidance**: The robotic arm must avoid obstacles and complete the pick-and-place task within a limited time.

To verify the effectiveness of their method, the authors set up two experimental scenarios:

- **S1**: The block and the target position are placed randomly, and there are no obstacles.
- **S2**: The block and the target position are placed randomly, and there is a randomly moving obstacle.

By comparing model performance under sparse and dense rewards, the study found that the sparse reward can better facilitate learning in some cases, especially in environments with moving obstacles. The paper also discusses possible future research directions, such as combining Graph Neural Networks (GNN) and Model Predictive Control (MPC) to further optimize the trajectory planning algorithm.
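The sparse and dense reward settings compared above can be illustrated with a minimal sketch. The summary does not give the paper's exact reward definitions, so the functions below follow a common convention for goal-reaching manipulation tasks and are an assumption: the names `sparse_reward` and `dense_reward`, the 0.05 m success threshold, and the 0 / -1 reward values are all hypothetical.

```python
import math

def sparse_reward(gripper_pos, target_pos, threshold=0.05):
    """Sparse signal: 0.0 if the gripper is within `threshold` of the target, else -1.0.

    The threshold (in metres) and the 0 / -1 values are assumed, not taken from the paper.
    """
    distance = math.dist(gripper_pos, target_pos)
    return 0.0 if distance < threshold else -1.0

def dense_reward(gripper_pos, target_pos):
    """Dense signal: negative Euclidean distance, so every step closer scores higher."""
    return -math.dist(gripper_pos, target_pos)

if __name__ == "__main__":
    gripper = (0.0, 0.0, 0.0)
    target = (0.3, 0.4, 0.0)
    print(sparse_reward(gripper, target))  # far from the target, so the failure value
    print(dense_reward(gripper, target))   # shaped value proportional to distance
```

Under this convention the sparse reward only tells the agent whether it has succeeded, while the dense reward provides a gradient toward the target at every step. A dense reward built purely on distance to the target does not account for the moving obstacle in S2, which is one plausible reason a shaped reward could mislead the policy there; the summary itself does not state the cause.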