H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation

Yanjie Ze, Yuyao Liu, Ruizhe Shi, Jiaxin Qin, Zhecheng Yuan, Jiashun Wang, Huazhe Xu
2023-10-13
Abstract: Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. In this work, we propose a human $\textbf{H}$and$\textbf{-In}$formed visual representation learning framework to solve difficult $\textbf{Dex}$terous manipulation tasks ($\textbf{H-InDex}$) with reinforcement learning. Our framework consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adapting representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. The last two stages only modify $0.36\%$ parameters of the pre-trained representation in total, ensuring the knowledge from pre-training is maintained to the full extent. We empirically study 12 challenging dexterous manipulation tasks and find that H-InDex largely surpasses strong baseline methods and the recent visual foundation models for motor control. Code is available at https://yanjieze.com/H-InDex.
Machine Learning, Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses the sample efficiency of visual reinforcement learning for multi-fingered robotic hands on complex dexterous manipulation tasks. It proposes H-InDex, a human hand-informed visual representation learning framework that improves robotic dexterous manipulation by reusing visual representations learned from human hands. The main objectives of the paper are:

1. **Improve Sample Efficiency**: Enable robots to learn complex dexterous manipulation tasks with a limited number of environment interactions.
2. **Leverage Human Hand Priors**: Transfer knowledge about human hand dexterity to robotic hands through a pre-trained 3D hand pose estimation model.
3. **Adapt to Different Tasks**: Validate the method across a range of dexterous manipulation tasks, including hammering, door handling, writing, pouring water, and placing objects.

### Method Overview

The H-InDex framework consists of three stages:

1. **Representation Pre-training**: Pre-train visual representations on a 3D human hand pose estimation task so that the model captures hand-related structure.
2. **Representation Offline Adaptation**: Fine-tune only the affine transformation parameters of the BatchNorm layers (approximately 0.18% of the total parameters) in the pre-trained model with a self-supervised keypoint detection task, adapting the representation to the morphological and structural differences of robotic hands.
3. **Reinforcement Learning**: During reinforcement learning, keep the learned weights of the visual representation frozen and update only the running mean and variance of the BatchNorm layers with an Exponential Moving Average (EMA), so that the representation tracks the changing observation distribution.

According to the abstract, stages 2 and 3 together modify only 0.36% of the parameters of the pre-trained representation.

### Experimental Results

The paper reports experiments on 12 challenging dexterous manipulation tasks and compares H-InDex with several strong baselines (such as VC-1, MVP, R3M, and RRL). The results show that H-InDex outperforms these baselines on most tasks, with the clearest gains in sample efficiency.

### Main Contributions

1. **Proposed a New Visual Reinforcement Learning Framework**: H-InDex improves robotic dexterous manipulation by exploiting hand-specific visual information.
2. **Validated Effectiveness Across Multiple Challenging Tasks**: Demonstrated the performance of H-InDex on 12 dexterous manipulation tasks.
3. **Provided Valuable Insights**: Explored how pre-trained models, in particular 3D hand pose estimation models, can be applied directly to dexterous manipulation tasks.

Through these contributions, the paper offers new ideas and methods for research on robotic dexterous manipulation.
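
The EMA BatchNorm idea from stage 3 of the method overview can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the class name `EmaBatchNorm1d`, the `momentum` value, and the use of the biased batch variance are assumptions made for illustration. It only shows the general mechanism in which the affine parameters stay fixed while the running mean and variance follow the incoming observations.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EmaBatchNorm1d:
    """Per-feature BatchNorm whose statistics follow an exponential moving average.

    Hypothetical illustration: gamma and beta are never modified here; only
    running_mean and running_var move toward the statistics of new batches.
    """

    gamma: List[float]
    beta: List[float]
    running_mean: List[float]
    running_var: List[float]
    momentum: float = 0.01  # assumed value, not taken from the paper
    eps: float = 1e-5

    def update_stats(self, batch: Sequence[Sequence[float]]) -> None:
        """Blend the batch mean/variance into the running statistics."""
        n = len(batch)
        m = self.momentum
        for j in range(len(self.running_mean)):
            column = [row[j] for row in batch]
            mean = sum(column) / n
            # Biased (population) variance, a simplification for this sketch.
            var = sum((x - mean) ** 2 for x in column) / n
            self.running_mean[j] = (1 - m) * self.running_mean[j] + m * mean
            self.running_var[j] = (1 - m) * self.running_var[j] + m * var

    def normalize(self, row: Sequence[float]) -> List[float]:
        """Normalize one feature vector with the current running statistics."""
        return [
            g * (x - mu) / (var + self.eps) ** 0.5 + b
            for x, g, b, mu, var in zip(
                row, self.gamma, self.beta, self.running_mean, self.running_var
            )
        ]


if __name__ == "__main__":
    bn = EmaBatchNorm1d(
        gamma=[1.0], beta=[0.0], running_mean=[0.0], running_var=[1.0], momentum=0.5
    )
    bn.update_stats([[2.0], [4.0]])
    print(bn.running_mean, bn.running_var)
```

In a setup like this, a small `momentum` makes the statistics drift slowly, which is one way to follow a shifting observation distribution without touching any learned weights.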