Leveraging Pretrained Latent Representations for Few-Shot Imitation Learning on a Dexterous Robotic Hand

Davide Liconti, Yasunori Toshimitsu, Robert Katzschmann
2024-04-25
Abstract: In the context of imitation learning applied to dexterous robotic hands, the high complexity of the systems makes learning complex manipulation tasks challenging. However, the numerous datasets depicting human hands in various tasks could provide us with better knowledge regarding human hand motion. We propose a method to leverage multiple large-scale task-agnostic datasets to obtain latent representations that effectively encode motion subtrajectories, which we incorporate into a transformer-based behavior cloning method. Our results demonstrate that employing latent representations yields enhanced performance compared to conventional behavior cloning methods, particularly regarding resilience to errors and noise in perception and proprioception. Furthermore, the proposed approach relies solely on human demonstrations, eliminating the need for teleoperation and, therefore, accelerating the data acquisition process. Accurate inverse kinematics for fingertip retargeting ensures precise transfer from human hand data to the robot, facilitating effective learning and deployment of manipulation policies. Finally, the trained policies have been successfully transferred to a real-world 23-DoF robotic system.
Robotics
What problem does this paper attempt to address?
This paper addresses the challenges of applying imitation learning to dexterous robotic hands, in particular how to learn complex manipulation tasks from a small amount of data. The authors propose a method that leverages multiple large-scale, task-agnostic datasets of human hand motion to obtain latent representations. These representations encode motion subtrajectories and are incorporated into a Transformer-based behavior cloning method. The approach aims to improve performance over conventional behavior cloning, especially robustness to errors and noise in perception and proprioception. It relies solely on human demonstrations, which eliminates the need for teleoperation and accelerates data acquisition. Accurate inverse kinematics for fingertip retargeting ensures precise transfer from human hand data to the robot, facilitating effective learning and deployment of manipulation policies. Finally, the trained policies are successfully transferred to a real-world 23-DoF robotic system.
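To make the idea of fingertip retargeting through inverse kinematics concrete, the following is a minimal sketch on a toy planar finger with two joints. It is not the authors' implementation and says nothing about how the 23-DoF hand is actually solved: the link lengths, the uniform length-ratio scaling of the human fingertip position, the damped least-squares solver, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical link lengths (metres) of a planar two-joint robot finger.
LINKS = np.array([0.05, 0.04])


def fingertip_position(q, links=LINKS):
    """Forward kinematics: joint angles (rad) -> fingertip (x, y)."""
    a1, a12 = q[0], q[0] + q[1]
    return np.array([
        links[0] * np.cos(a1) + links[1] * np.cos(a12),
        links[0] * np.sin(a1) + links[1] * np.sin(a12),
    ])


def jacobian(q, links=LINKS):
    """Partial derivatives of the fingertip position w.r.t. both joints."""
    a1, a12 = q[0], q[0] + q[1]
    return np.array([
        [-links[0] * np.sin(a1) - links[1] * np.sin(a12), -links[1] * np.sin(a12)],
        [links[0] * np.cos(a1) + links[1] * np.cos(a12), links[1] * np.cos(a12)],
    ])


def retarget_fingertip(human_tip, human_length, q_init,
                       damping=1e-2, iters=200, tol=1e-6):
    """Map a human fingertip position to robot joint angles.

    Assumption: the human fingertip (expressed relative to the finger base)
    is scaled by the ratio of robot to human finger length, then matched
    with damped least-squares inverse kinematics.
    """
    scale = LINKS.sum() / human_length
    target = scale * np.asarray(human_tip, dtype=float)
    q = np.array(q_init, dtype=float)
    for _ in range(iters):
        error = target - fingertip_position(q)
        if np.linalg.norm(error) < tol:
            break
        J = jacobian(q)
        # Damped least-squares step: dq = J^T (J J^T + lambda^2 I)^-1 e
        q += J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), error)
    return q, target


# Usage: a made-up human fingertip position for a 10 cm human finger.
q, target = retarget_fingertip([0.06, 0.05], human_length=0.10, q_init=[0.3, 0.3])
residual = np.linalg.norm(fingertip_position(q) - target)
```

The damping term keeps the update bounded when the finger is close to fully extended, where the Jacobian becomes nearly singular. In a pipeline like the one summarized above, joint angles obtained this way from human demonstrations would serve as training targets for the policy, though the exact formulation used in the paper is not given here.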