Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

Irmak Guzey, Ben Evans, Soumith Chintala, Lerrel Pinto
DOI: https://doi.org/10.48550/arXiv.2303.12076
2023-03-22
Abstract: Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that either operate on visual observations or state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the hand itself. In this work, we present T-Dex, a new approach for tactile-based dexterity, that operates in two phases. In the first phase, we collect 2.5 hours of play data, which is used to train self-supervised tactile encoders. This is necessary to bring high-dimensional tactile readings to a lower-dimensional embedding. In the second phase, given a handful of demonstrations for a dexterous task, we learn non-parametric policies that combine the tactile observations with visual ones. Across five challenging dexterous tasks, we show that our tactile-based dexterity models outperform purely vision and torque-based models by an average of 1.7X. Finally, we provide a detailed analysis on factors critical to T-Dex including the importance of play data, architectures, and representation learning.
Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the challenge of dexterous manipulation with multi-fingered robotic hands. It proposes a new method, T-Dex (tactile-based dexterity), which improves a robot's dexterity by using tactile sensors. Most existing research focuses on visual observations or vision-based state estimates, which perform poorly on tasks requiring fine manipulation, especially when the fingers occlude the object being manipulated. T-Dex addresses this issue in two stages:

1. **Pre-training Stage**: Collect 2.5 hours of robot play data and use it to train a self-supervised tactile encoder. The goal of this stage is to map high-dimensional tactile readings to low-dimensional embeddings.
2. **Downstream Learning Stage**: Given a small number of task demonstrations (6 demonstrations per task, equivalent to less than 10 minutes of demonstration time), learn a non-parametric policy that combines tactile and visual observations to complete the task.

### Main Contributions

- **Importance of Tactile Data**: The paper emphasizes the importance of tactile data in dexterous manipulation, particularly in tasks that require reasoning about contact forces or in which the fingers occlude the object.
- **Self-supervised Learning**: The paper trains the tactile encoder with self-supervised learning on a large amount of goal-free play data, thereby reducing the need for precise force calibration.
- **Non-parametric Policy**: Using nearest-neighbor retrieval, the paper learns dexterous manipulation policies from a small number of demonstrations by combining tactile and visual information.
- **Experimental Validation**: Experiments on five challenging dexterous tasks show that T-Dex outperforms purely vision-based and torque-based baselines by an average of 1.7 times.
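The nearest-neighbor retrieval behind the non-parametric policy can be illustrated with a minimal sketch. The summary only states that the policy retrieves from demonstrations using tactile and visual information; the `DemoFrame` container, the `nearest_neighbor_action` function, the Euclidean distance, and the weighted sum of the two distances are all assumptions made for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DemoFrame:
    """One demonstration timestep (hypothetical container, not from the paper)."""
    tactile: Sequence[float]  # low-dimensional tactile embedding
    visual: Sequence[float]   # visual embedding
    action: Sequence[float]   # action recorded at this timestep


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions do not match")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_action(
    tactile: Sequence[float],
    visual: Sequence[float],
    demos: List[DemoFrame],
    tactile_weight: float = 1.0,  # assumed weighting, not from the paper
    visual_weight: float = 1.0,   # assumed weighting, not from the paper
) -> List[float]:
    """Return the action of the demonstration frame closest to the query."""
    if not demos:
        raise ValueError("at least one demonstration frame is required")

    def cost(frame: DemoFrame) -> float:
        # Assumed combination rule: weighted sum of per-modality distances.
        return (tactile_weight * _euclidean(tactile, frame.tactile)
                + visual_weight * _euclidean(visual, frame.visual))

    best = min(demos, key=cost)
    return list(best.action)


if __name__ == "__main__":
    demos = [
        DemoFrame(tactile=[0.0, 0.0], visual=[1.0], action=[0.1]),
        DemoFrame(tactile=[1.0, 1.0], visual=[0.0], action=[0.9]),
    ]
    # The query is closest to the second frame in both modalities.
    print(nearest_neighbor_action([0.9, 1.1], [0.1], demos))
```

Because the policy has no trainable parameters at this stage, adding a demonstration only means appending frames to `demos`, which is one reason retrieval suits a setting with only a handful of demonstrations.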
### Conclusion

By combining tactile and visual information, T-Dex excels in various complex dexterous tasks, particularly those requiring fine manipulation and contact force reasoning. This method not only improves task success rates but also reduces the reliance on a large amount of demonstration data, thereby enhancing learning efficiency.