Robotic Arm Manipulation with Inverse Reinforcement Learning & TD-MPC

Md Shoyib Hassan, Sabir Md Sanaullah
2024-08-07
Abstract: Scaling model-based inverse reinforcement learning (IRL) to real robotic manipulation tasks with unpredictable dynamics remains an unresolved problem. The main obstacles are learning from both visual and proprioceptive demonstrations, designing algorithms that scale to high-dimensional state spaces, and learning robust dynamics models. In this work, we present a gradient-based IRL framework that learns cost functions purely from visual human demonstrations. The demonstrated behavior is then reproduced by optimizing trajectories with TD visual model predictive control (MPC) and the learned cost functions. We evaluate our system on basic object manipulation tasks on hardware.
Robotics
What problem does this paper attempt to address?
This paper addresses how to extend model-based inverse reinforcement learning (IRL) to real robotic manipulation tasks, especially when the dynamics are unpredictable. The main challenges are:

1. **Learning from visual and proprioceptive examples**: How to extract useful information from human demonstrations and transform it into a form that a robot can use, especially for tasks with high-dimensional state spaces.
2. **Powerful dynamics models**: How to construct a dynamics model that accurately predicts the results of robot actions, especially in low-dimensional feature-representation spaces.
3. **Complexity of the optimization problem**: Model-based IRL contains two nested optimization problems. The inner problem optimizes the policy given the cost function and the transition model. The outer problem adjusts the cost function so that the resulting policy matches the observed demonstrations. The outer step is difficult because it requires measuring how changes in the cost function parameters affect the final policy parameters.

To solve these problems, the authors propose a gradient-based IRL framework that learns the cost function from visual human demonstrations and uses TD visual model predictive control (TD-MPC) to optimize the trajectory. The method includes the following key steps:

- **Key-point detector**: Train a key-point detector to extract low-dimensional visual features (key-points) from RGB image inputs. These key-points represent important positions and regions in the image.
- **Dynamics model**: Use a pre-trained dynamics model to predict the next key-points and joint state of the robot after it performs a specific action.
- **Gradient optimization**: Compute the gradient of the demonstration-matching objective with respect to the cost function parameters by differentiating through the inner optimization process, which makes the optimization more stable and efficient.

In the experiments, the authors tested this method on a Franka Panda robotic arm and verified its effectiveness and robustness in basic object manipulation tasks.
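
The nested structure described above can be illustrated with a deliberately small sketch. It is not the authors' implementation: it assumes a 1-D "key-point" with known toy dynamics `s_{t+1} = s_t + a_t`, a quadratic cost whose only learnable parameter is a goal position, and plain gradient descent in place of TD-MPC. All names and constants (`rollout`, `inner_optimize`, `learn_goal`, `INNER_LR`, and so on) are hypothetical. The point is only to show how the outer gradient can be obtained by propagating derivatives through every step of the unrolled inner optimizer.

```python
import numpy as np

T = 5                              # planning horizon (assumed)
L = np.tril(np.ones((T, T)))       # maps actions to cumulative displacement
S0 = 0.0                           # initial 1-D "key-point" position (assumed)
INNER_LR, INNER_STEPS = 0.05, 20   # inner-loop settings (assumed)


def rollout(actions):
    """Toy known dynamics s_{t+1} = s_t + a_t; returns states s_1..s_T."""
    return S0 + L @ actions


def inner_optimize(goal):
    """Inner loop: gradient descent on actions for cost sum_t (s_t - goal)^2.

    Alongside the actions, d(actions)/d(goal) is carried through every
    update, i.e. the unrolled optimizer is differentiated in forward mode.
    Returns the optimized states and d(states)/d(goal).
    """
    actions = np.zeros(T)
    d_actions = np.zeros(T)                     # d(actions)/d(goal)
    for _ in range(INNER_STEPS):
        states = rollout(actions)
        d_states = L @ d_actions                # d(states)/d(goal)
        grad = 2.0 * L.T @ (states - goal)      # d(cost)/d(actions)
        d_grad = 2.0 * L.T @ (d_states - 1.0)   # d(grad)/d(goal)
        actions = actions - INNER_LR * grad
        d_actions = d_actions - INNER_LR * d_grad
    return rollout(actions), L @ d_actions


def learn_goal(demo, goal=0.0, outer_lr=0.05, outer_steps=100):
    """Outer loop: fit the cost parameter so the plan matches the demo."""
    for _ in range(outer_steps):
        states, d_states = inner_optimize(goal)
        # Matching loss ||states - demo||^2, chained through the inner loop.
        goal -= outer_lr * 2.0 * d_states @ (states - demo)
    return goal


TRUE_GOAL = 1.0
demo, _ = inner_optimize(TRUE_GOAL)   # synthetic demonstration, not real data
learned_goal = learn_goal(demo)
print(f"learned goal: {learned_goal:.4f}")
```

In this toy setting the demonstration is generated by the same inner optimizer, so the outer loop can recover the goal that produced it. In a realistic setting the states would be detected key-points plus joint states, the dynamics would come from a learned model, and an automatic-differentiation library would replace the hand-derived derivatives.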