A Grasp Pose is All You Need: Learning Multi-fingered Grasping with Deep Reinforcement Learning from Vision and Touch

Federico Ceola, Elisa Maiettini, Lorenzo Rosasco, Lorenzo Natale

DOI: https://doi.org/10.1109/IROS55552.2023.10341776

2023-07-28

Abstract: Multi-fingered robotic hands have the potential to enable robots to perform sophisticated manipulation tasks. However, teaching a robot to grasp objects with an anthropomorphic hand is an arduous problem due to the high dimensionality of the state and action spaces. Deep Reinforcement Learning (DRL) offers techniques to design control policies for this kind of problem without explicit environment or hand modeling. However, state-of-the-art model-free algorithms have proven inefficient for learning such policies. The main problem is that exploration of the environment is infeasible for such high-dimensional problems, which hampers the initial phases of policy optimization. One possibility to address this is to rely on off-line task demonstrations, but this is often too demanding in terms of time and computational resources. To address these problems, we propose the A Grasp Pose is All You Need (G-PAYN) method for the anthropomorphic hand of the iCub humanoid. We develop an approach to automatically collect task demonstrations to initialize the training of the policy. The proposed grasping pipeline starts from a grasp pose generated by an external algorithm, used to initiate the movement. Then a control policy (previously trained with the proposed G-PAYN) is used to reach and grab the object. We deploy the iCub in the MuJoCo simulator and use it to test our approach with objects from the YCB-Video dataset. Results show that G-PAYN outperforms the current DRL baselines in the considered setting in terms of success rate and execution time. The code to reproduce the experiments is released together with the paper with an open source license.

Robotics

What problem does this paper attempt to address?

This paper addresses the difficulty of learning to grasp objects with a multi-fingered robotic hand, which stems from the high dimensionality of the state and action spaces. Existing Deep Reinforcement Learning (DRL) methods perform poorly on multi-fingered grasping tasks, mainly because exploring the environment is hard and policy optimization stalls in its initial stages. Current methods also often rely on offline task demonstrations, which are time-consuming and computationally intensive to collect, and they often require joint and object state information that is difficult to obtain or prone to errors.

To overcome these issues, the authors propose a method called "G-PAYN" (A Grasp Pose is All You Need), which uses automatically collected task demonstrations to initialize policy training. The grasping pipeline has three steps:

1. **Generate Initial Grasp Pose**: Use an external algorithm to generate an initial grasp pose as prior information.
2. **Approach the Target Object**: Move the end-effector of the robotic hand to a position close to the initial grasp pose.
3. **Execute the Grasp**: Use a DRL policy previously trained with G-PAYN to complete the grasping action by predicting Cartesian offsets and finger joint offsets.

The authors conducted experiments with the iCub humanoid robot in the MuJoCo simulator, testing grasping performance on 5 different objects and comparing it with existing DRL baseline methods. The experimental results show that G-PAYN outperforms existing DRL methods in terms of success rate and execution time and, in some cases, significantly exceeds the performance of offline demonstration pipelines.

In summary, this paper aims to improve the grasping ability of multi-fingered robotic hands in complex tasks by combining automatically collected demonstration data with initial grasp poses.
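The three pipeline steps described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `HandState` container, the function names, the 5 cm standoff, the per-step Cartesian limit, and the joint limits are all hypothetical values chosen for illustration. It only shows how an action expressed as Cartesian and finger joint offsets might be applied on top of a pre-grasp position derived from an externally supplied grasp pose.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class HandState:
    """Hypothetical container: end-effector position (m) and finger joint angles (rad)."""
    ee_position: List[float]
    finger_joints: List[float]


def clip(value: float, low: float, high: float) -> float:
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))


def pre_grasp_position(grasp_position: Sequence[float],
                       approach_direction: Sequence[float],
                       standoff: float = 0.05) -> List[float]:
    """Step 2 (assumed form): back off from the grasp position along the
    unit approach direction by an illustrative standoff distance."""
    return [p - standoff * d for p, d in zip(grasp_position, approach_direction)]


def apply_action(state: HandState,
                 cartesian_offset: Sequence[float],
                 finger_offsets: Sequence[float],
                 max_step: float = 0.01,
                 joint_limits: Tuple[float, float] = (0.0, 1.57)) -> HandState:
    """Step 3 (assumed form): apply one policy action given as offsets.

    Cartesian offsets are clipped per axis to +/- max_step; finger joint
    targets are clipped to joint_limits. Both limits are assumptions.
    """
    new_position = [p + clip(d, -max_step, max_step)
                    for p, d in zip(state.ee_position, cartesian_offset)]
    low, high = joint_limits
    new_joints = [clip(q + dq, low, high)
                  for q, dq in zip(state.finger_joints, finger_offsets)]
    return HandState(new_position, new_joints)


# Step 1 is assumed to come from an external grasp pose generator;
# here the grasp position and approach direction are hard-coded.
start = HandState(pre_grasp_position([0.3, 0.0, 0.1], [0.0, 0.0, -1.0]),
                  [0.0, 0.5, 1.5])
# A stand-in for one policy output: move down and close the fingers slightly.
after = apply_action(start, [0.0, 0.0, -0.05], [0.1, 0.1, 0.1])
```

In this sketch the requested 5 cm downward move is limited to the assumed 1 cm per-step bound, and the third finger joint saturates at the assumed upper limit. Bounding each offset in this way is one common approach to keeping early exploration close to the initial grasp pose.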
