A Grasp Pose is All You Need: Learning Multi-fingered Grasping with Deep Reinforcement Learning from Vision and Touch

Federico Ceola, Elisa Maiettini, Lorenzo Rosasco, Lorenzo Natale
DOI: https://doi.org/10.1109/IROS55552.2023.10341776
2023-07-28
Abstract: Multi-fingered robotic hands have the potential to enable robots to perform sophisticated manipulation tasks. However, teaching a robot to grasp objects with an anthropomorphic hand is an arduous problem due to the high dimensionality of the state and action spaces. Deep Reinforcement Learning (DRL) offers techniques to design control policies for this kind of problem without explicit environment or hand modeling. However, state-of-the-art model-free algorithms have proven inefficient for learning such policies. The main problem is that exploring the environment is infeasible for such high-dimensional problems, which hampers the initial phases of policy optimization. One possibility to address this is to rely on off-line task demonstrations, but this is often too demanding in terms of time and computational resources. To address these problems, we propose the A Grasp Pose is All You Need (G-PAYN) method for the anthropomorphic hand of the iCub humanoid. We develop an approach to automatically collect task demonstrations to initialize the training of the policy. The proposed grasping pipeline starts from a grasp pose generated by an external algorithm, which is used to initiate the movement. Then a control policy (previously trained with the proposed G-PAYN) is used to reach and grab the object. We deploy the iCub in the MuJoCo simulator and use it to test our approach with objects from the YCB-Video dataset. Results show that, in the considered setting, G-PAYN outperforms the current DRL baselines in terms of success rate and execution time. The code to reproduce the experiments is released together with the paper under an open-source license.
Robotics
What problem does this paper attempt to address?
This paper addresses the difficulty of learning to grasp objects with a multi-fingered robotic hand, which stems from the high dimensionality of the state and action spaces. Specifically, existing Deep Reinforcement Learning (DRL) methods perform poorly on multi-fingered grasping tasks, mainly because exploring the environment is difficult and this hampers the initial stages of policy optimization. Additionally, current methods often rely on offline task demonstrations, which are time-consuming and computationally intensive to collect, and they often require joint and object state information that is difficult to obtain or prone to errors.

To overcome these issues, the authors propose a method called "G-PAYN" (A Grasp Pose is All You Need), which uses automatically collected task demonstrations to initialize policy training. The grasping pipeline consists of the following steps:

1. **Generate Initial Grasp Pose**: Use an external algorithm to generate an initial grasp pose as prior information.
2. **Approach the Target Object**: Move the end-effector of the robotic hand to a position close to the initial grasp pose.
3. **Execute the Grasp**: Use a pre-trained DRL policy to complete the grasping action by predicting Cartesian offsets and finger joint offsets.

The authors conducted experiments with the iCub humanoid robot in the MuJoCo simulator, testing grasping performance on 5 different objects and comparing it with existing DRL baseline methods. The experimental results show that G-PAYN outperforms the DRL baselines in terms of success rate and execution time and, in some cases, significantly exceeds the performance of the offline demonstration pipelines.

In summary, this paper aims to improve the grasping ability of multi-fingered robotic hands in complex tasks by combining automatically collected demonstration data with initial grasp poses.
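
The three-step pipeline described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `HandState`, `pre_grasp_position`, `apply_action`, `run_grasp`, and `toy_policy`, the offset limits, the pre-grasp distance, and the number of finger joints are all hypothetical choices made for illustration. The sketch only shows the control flow: start from an externally supplied grasp position, move to a nearby pre-grasp position, then repeatedly apply a policy that outputs Cartesian offsets and finger joint offsets.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class HandState:
    position: Vec3              # end-effector Cartesian position (metres)
    finger_joints: List[float]  # finger joint angles (radians)


# A policy maps the current state to (Cartesian offset, finger joint offsets).
Policy = Callable[[HandState], Tuple[Vec3, List[float]]]


def pre_grasp_position(grasp_position: Vec3, approach_dir: Vec3,
                       distance: float) -> Vec3:
    """Step 2: back off from the grasp position along the approach direction."""
    return tuple(g - distance * a for g, a in zip(grasp_position, approach_dir))


def apply_action(state: HandState, cartesian_offset: Vec3,
                 joint_offsets: List[float],
                 max_cart: float = 0.01, max_joint: float = 0.05) -> HandState:
    """Apply clipped offsets to the hand state (limits are assumed values)."""
    def clip(value: float, limit: float) -> float:
        return max(-limit, min(limit, value))

    new_pos = tuple(p + clip(d, max_cart)
                    for p, d in zip(state.position, cartesian_offset))
    new_joints = [q + clip(d, max_joint)
                  for q, d in zip(state.finger_joints, joint_offsets)]
    return HandState(new_pos, new_joints)


def run_grasp(grasp_position: Vec3, approach_dir: Vec3, policy: Policy,
              n_joints: int = 9, steps: int = 50,
              distance: float = 0.05) -> HandState:
    """Run the pipeline; the grasp pose (step 1) is supplied by the caller."""
    # Step 2: start close to the suggested grasp pose, with fingers open.
    start = pre_grasp_position(grasp_position, approach_dir, distance)
    state = HandState(start, [0.0] * n_joints)
    # Step 3: the policy refines the hand pose and closes the fingers.
    for _ in range(steps):
        cart_offset, joint_offsets = policy(state)
        state = apply_action(state, cart_offset, joint_offsets)
    return state


def toy_policy(target: Vec3) -> Policy:
    """Hand-written stand-in for a trained DRL policy (illustration only)."""
    def policy(state: HandState) -> Tuple[Vec3, List[float]]:
        cart = tuple(t - p for t, p in zip(target, state.position))
        joints = [0.02] * len(state.finger_joints)  # slowly close the fingers
        return cart, joints
    return policy
```

In this sketch, a call such as `run_grasp((0.0, 0.0, 0.1), (0.0, 0.0, -1.0), toy_policy((0.0, 0.0, 0.1)))` moves the hand from the pre-grasp position toward the grasp position while closing the fingers. In the paper's setting, the stand-in policy would be replaced by the learned DRL policy, and the state update would be carried out by the simulator rather than by `apply_action`.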