Abstract: Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that either operate on visual observations or state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the hand itself. In this work, we present T-Dex, a new approach for tactile-based dexterity, that operates in two phases. In the first phase, we collect 2.5 hours of play data, which is used to train self-supervised tactile encoders. This is necessary to bring high-dimensional tactile readings to a lower-dimensional embedding. In the second phase, given a handful of demonstrations for a dexterous task, we learn non-parametric policies that combine the tactile observations with visual ones. Across five challenging dexterous tasks, we show that our tactile-based dexterity models outperform purely vision and torque-based models by an average of 1.7X. Finally, we provide a detailed analysis on factors critical to T-Dex including the importance of play data, architectures, and representation learning.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the challenge of dexterous manipulation with multi-fingered robotic hands. It proposes T-Dex (Tactile-based Dexterity), a method that improves a robot's dexterity using tactile sensing. Most existing work relies on visual observations or vision-based state estimates, which perform poorly on tasks that require fine manipulation, especially when the fingers occlude the object being manipulated. T-Dex addresses this in two stages:

1. **Pre-training Stage**: Collect 2.5 hours of robot play data and use it to train a self-supervised tactile encoder. The goal of this stage is to map high-dimensional tactile readings to low-dimensional embeddings.
2. **Downstream Learning Stage**: Given a small number of task demonstrations (6 per task, less than 10 minutes of demonstration time), learn a non-parametric policy that combines tactile and visual observations to complete the task.

### Main Contributions

- **Importance of Tactile Data**: The paper emphasizes the importance of tactile data in dexterous manipulation, particularly in tasks that require reasoning about contact forces or in which the fingers occlude the object.
- **Self-supervised Learning**: The tactile encoder is trained with self-supervised learning on a large amount of goal-free play data, which reduces the need for precise force calibration.
- **Non-parametric Policy**: Using nearest-neighbor retrieval over combined tactile and visual information, the method learns dexterous manipulation policies from a small number of demonstrations.
- **Experimental Validation**: On five challenging dexterous tasks, T-Dex outperforms purely vision-based and torque-based baselines by an average of 1.7x.
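The nearest-neighbor retrieval idea behind the non-parametric policy can be illustrated with a minimal sketch. It is not the authors' implementation: the summary only states that tactile and visual observations are combined and that retrieval is used, so the `DemoFrame` layout, the L2 normalization, the `tactile_weight` parameter, and the Euclidean distance are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DemoFrame:
    """One demonstration timestep (hypothetical layout, not from the paper)."""
    tactile: Sequence[float]  # low-dimensional tactile embedding
    visual: Sequence[float]   # visual embedding
    action: Sequence[float]   # action recorded at this timestep


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(tactile: Sequence[float], visual: Sequence[float],
            tactile_weight: float = 1.0) -> List[float]:
    """Concatenate normalized embeddings (the weighting is an assumption)."""
    weighted_tactile = [tactile_weight * x for x in l2_normalize(tactile)]
    return weighted_tactile + l2_normalize(visual)


def nearest_neighbor_action(query_tactile: Sequence[float],
                            query_visual: Sequence[float],
                            demos: Sequence[DemoFrame],
                            tactile_weight: float = 1.0) -> List[float]:
    """Return the action of the demonstration frame closest to the query."""
    query = combine(query_tactile, query_visual, tactile_weight)
    best = min(
        demos,
        key=lambda f: math.dist(
            query, combine(f.tactile, f.visual, tactile_weight)
        ),
    )
    return list(best.action)


# Toy demonstration set with two frames and 2-D embeddings.
demos = [
    DemoFrame(tactile=[1.0, 0.0], visual=[0.0, 1.0], action=[0.1, 0.0]),
    DemoFrame(tactile=[0.0, 1.0], visual=[1.0, 0.0], action=[0.0, -0.2]),
]

# The query is closest to the first frame, so that frame's action is returned.
action = nearest_neighbor_action([0.9, 0.1], [0.1, 0.9], demos)
print(action)  # [0.1, 0.0]
```

Because the policy has no trainable parameters of its own, adding a demonstration only means appending frames to `demos`, which is one reason a retrieval approach can work with very few demonstrations.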
### Conclusion

By combining tactile and visual information, T-Dex performs well on a range of complex dexterous tasks, particularly those requiring fine manipulation and reasoning about contact forces. The method both improves task success rates and reduces the reliance on large amounts of demonstration data, which improves learning efficiency.

Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

Masked Visual-Tactile Pre-training for Robot Manipulation

DexRepNet: Learning Dexterous Robotic Grasping Network with Geometric and Spatial Hand-Object Representations

DexTouch: Learning to Seek and Manipulate Objects with Tactile Dexterity

See to Touch: Learning Tactile Dexterity through Visual Incentives

Rotating without Seeing: Towards In-hand Dexterity through Touch

Dexterous In-Hand Manipulation of Slender Cylindrical Objects through Deep Reinforcement Learning with Tactile Sensing

Using Tactile Sensing to Improve the Sample Efficiency and Performance of Deep Deterministic Policy Gradients for Simulated In-Hand Manipulation Tasks

Canonical Representation and Force-Based Pretraining of 3D Tactile for Dexterous Visuo-Tactile Policy Learning

H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation

Bi-Touch: Bimanual Tactile Manipulation With Sim-to-Real Deep Reinforcement Learning

DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics

DexSkills: Skill Segmentation Using Haptic Data for Learning Autonomous Long-Horizon Robotic Manipulation Tasks

DexTransfer: Real World Multi-fingered Dexterous Grasping with Minimal Human Demonstrations

Holo-Dex: Teaching Dexterity with Immersive Mixed Reality

Learning Deep Visuomotor Policies for Dexterous Hand Manipulation

Enhancing Dexterity in Robotic Manipulation via Hierarchical Contact Exploration

Learning Diverse Bimanual Dexterous Manipulation Skills from Human Demonstrations

DEFT: Dexterous Fine-Tuning for Real-World Hand Policies

Tactile Dexterity: Manipulation Primitives with Tactile Feedback