Abstract: While there have been significant strides in dexterous manipulation, most of this work is limited to benchmark tasks such as in-hand reorientation, which are of limited utility in the real world. The main benefit of dexterous hands over two-fingered ones is their ability to pick up tools and other objects (including thin ones) and grasp them firmly to apply force. However, this task requires both a complex understanding of functional affordances and precise low-level control. While prior work obtains affordances from human data, this approach doesn't scale to low-level control. Similarly, simulation training cannot give the robot an understanding of real-world semantics. In this paper, we aim to combine the best of both worlds to accomplish functional grasping for in-the-wild objects. We use a modular approach: first, affordances are obtained by matching corresponding regions of different objects, and then a low-level policy trained in simulation is run to grasp the object. We propose a novel application of eigengrasps to reduce the search space of RL using a small amount of human data and find that it leads to more stable and physically realistic motion. We find that the eigengrasp action space beats baselines in simulation, outperforms hardcoded grasping in the real world, and matches or outperforms a trained human teleoperator. Result visualizations and videos are available at https://dexfunc.github.io/.
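The abstract uses eigengrasps to shrink the RL search space with a small amount of human data. In the grasp-planning literature, eigengrasps are usually computed as the principal components of a set of hand joint configurations, so a policy can output a few coefficients instead of every joint angle. The sketch below illustrates that idea with PCA (via SVD) from 16 joint angles down to 9 coefficients, the dimensions quoted in the summary of this paper. It is not the authors' implementation: the function names, the assumption that human grasps have already been retargeted to robot joint angles, and the joint limits are all hypothetical.

```python
import numpy as np

N_JOINTS = 16  # full joint-space action dimension quoted for the hand
N_EIGEN = 9    # reduced eigengrasp dimension quoted for the hand


def fit_eigengrasps(grasp_poses: np.ndarray, k: int = N_EIGEN):
    """Fit k eigengrasps with PCA (hypothetical helper).

    grasp_poses: (num_samples, N_JOINTS) joint-angle vectors, assumed to be
    human grasps already retargeted to the robot hand.
    Returns the mean pose (N_JOINTS,) and a (k, N_JOINTS) basis whose rows
    are orthonormal principal directions, ordered by explained variance.
    """
    mean = grasp_poses.mean(axis=0)
    centered = grasp_poses - mean
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def eigen_action_to_joints(action, mean, basis, lower, upper):
    """Map a k-D policy action (eigengrasp coefficients) to joint targets.

    The joint targets are the mean pose plus a linear combination of
    eigengrasps, clipped to the (assumed) joint limits [lower, upper].
    """
    q = mean + np.asarray(action) @ basis
    return np.clip(q, lower, upper)
```

A policy trained on this action space can only command joint configurations in the span of the fitted eigengrasps, which keeps its motion close to the postures seen in the human data. That is one plausible reading of why the abstract reports more stable and physically realistic motion.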

What problem does this paper attempt to address?

The paper addresses how to achieve functional grasping of complex everyday objects, in particular with low-cost dexterous hands such as the LEAP hand. Most existing robot learning research relies on two-finger grippers or suction cups, which are limited when grasping tools and other objects that require fine manipulation. Functional grasping requires the robot not only to recognize and locate an object but also to understand its functional regions and to grasp it stably enough to complete a subsequent task, such as hammering or drilling.

The main contribution of the paper is a modular approach that combines the advantages of internet data and large-scale simulation training. The method is divided into three stages:

1. **Pre-grasp stage**: A one-shot affordance model predicts the functional grasp points of an object. It uses DINOv2 feature matching to find corresponding regions between different objects and thereby infers the correct grasp location.
2. **Grasp stage**: A policy trained in simulation executes the grasp. To cope with the high-dimensional action space, the paper uses eigengrasps to reduce the action space from 16 to 9 dimensions, which makes training more stable and the resulting motion more physically plausible.
3. **Post-grasp stage**: Once the object is stably grasped, a 6-DOF robotic arm can move it to any position in space to complete the specific task.

With this method, the paper demonstrates functional grasping of a variety of complex real-world objects, including hammers, electric drills, frying pans, staplers, and screwdrivers, even when these objects did not appear during training. The authors present this as significant progress in the functional grasping capabilities of dexterous hands.
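The pre-grasp stage transfers a grasp location from a reference object to a new object by matching DINOv2 features. A minimal way to picture this is nearest-neighbour matching under cosine similarity: take the descriptor at the annotated grasp point on the reference object and find the most similar descriptor on the new object. The sketch below assumes dense per-patch descriptors have already been extracted (for example, DINOv2 patch features; extraction is not shown). The function names are hypothetical, and the paper's actual procedure (region-level matching, feature upsampling, lifting the 2D point to a 3D grasp) may differ.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def transfer_keypoint(src_feats: np.ndarray, src_rc: tuple, tgt_feats: np.ndarray):
    """Transfer an annotated grasp point to a new object (hypothetical helper).

    src_feats, tgt_feats: (H, W, D) dense per-patch descriptors, assumed to
        come from a backbone such as DINOv2.
    src_rc: (row, col) patch index of the annotated functional grasp point
        on the reference object.
    Returns the (row, col) of the target patch with the highest cosine
    similarity to the source descriptor, plus the full (H, W) similarity map.
    """
    query = l2_normalize(src_feats[src_rc])  # (D,) descriptor at the grasp point
    tgt = l2_normalize(tgt_feats)            # (H, W, D) unit-length descriptors
    sim = tgt @ query                        # (H, W) cosine similarity map
    best = np.unravel_index(np.argmax(sim), sim.shape)
    return (int(best[0]), int(best[1])), sim
```

Returning the similarity map as well as the best match leaves room for region-level variants, for example thresholding the map to select a whole functional region instead of a single patch.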

Dexterous Functional Grasping

DexRepNet: Learning Dexterous Robotic Grasping Network with Geometric and Spatial Hand-Object Representations

Dext-Gen: Dexterous Grasping in Sparse Reward Environments with Full Orientation Control

DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Grasping with Geometric Fabrics

DexTransfer: Real World Multi-fingered Dexterous Grasping with Minimal Human Demonstrations

FunGrasp: Functional Grasping for Diverse Dexterous Hands

Generalized Anthropomorphic Functional Grasping with Minimal Demonstrations

DexDiff: Towards Extrinsic Dexterity Manipulation of Ungraspable Objects in Unrestricted Environments

Learning Human-Like Functional Grasping for Multifinger Hands From Few Demonstrations

Learning Robust Real-World Dexterous Grasping Policies via Implicit Shape Augmentation

Learning Diverse and Physically Feasible Dexterous Grasps with Generative Model and Bilevel Optimization

Dexterous Manipulation Based on Prior Dexterous Grasp Pose Knowledge

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video

Multi-fingered Dynamic Grasping for Unknown Objects

Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation

D(R, O) Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping

GenDexGrasp: Generalizable Dexterous Grasping

Deep Reinforcement Learning of Dexterous Pre-grasp Manipulation for Human-like Functional Categorical Grasping

DextrAH-RGB: Visuomotor Policies to Grasp Anything with Dexterous Hands