Abstract: Recent hand-object interaction datasets show limited real object variability and rely on fitting the MANO parametric model to obtain groundtruth hand shapes. To go beyond these limitations and spur further research, we introduce the SHOWMe dataset, which consists of 96 videos annotated with real and detailed hand-object 3D textured meshes. Following recent work, we consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence. This assumption allows us to register sub-millimetre-precise groundtruth 3D scans to the image sequences in SHOWMe. Although simpler, this hypothesis makes sense for applications where accuracy and level of detail are important, e.g., object hand-over in human-robot collaboration, object scanning, or manipulation and contact point analysis. Importantly, the rigidity of the hand-object systems allows us to tackle video-based 3D reconstruction of unknown hand-held objects using a 2-stage pipeline consisting of a rigid registration step followed by a multi-view reconstruction (MVR) step. We carefully evaluate a set of non-trivial baselines for these two stages and show that it is possible to achieve promising object-agnostic 3D hand-object reconstructions by employing an SfM toolbox or a hand pose estimator to recover the rigid transforms, together with off-the-shelf MVR algorithms. However, these methods remain sensitive to the initial camera pose estimates, which might be imprecise due to a lack of texture on the objects or heavy occlusion by the hands, leaving room for improvement in the reconstruction. Code and dataset are available at https://europe.naverlabs.com/research/showme

HandO: a Hybrid 3D Hand–object Reconstruction Model for Unknown Objects

In-Hand 3D Object Reconstruction from a Monocular RGB Video

CAMInterHand: Cooperative Attention for Multi-View Interactive Hand Pose and Mesh Reconstruction

Personalized Hand Modeling from Multiple Postures with Multi‐View Color Images

Reconstructing Hand-Held Objects from Monocular Video

HandGCAT: Occlusion-Robust 3D Hand Mesh Reconstruction from Monocular Images

Reconstructing Hand-Held Objects in 3D from Images and Videos

HandOS: 3D Hand Reconstruction in One Stage

HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions

RealisticHands: A Hybrid Model for 3D Hand Reconstruction

EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild

3D Hand Reconstruction via Aggregating Intra and Inter Graphs Guided by Prior Knowledge for Hand-Object Interaction Scenario

SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation

DDF-HO: Hand-Held Object Reconstruction via Conditional Directed Distance Field

HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Learning Explicit Contact for Implicit Reconstruction of Hand-held Objects from Monocular Images

HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images

Model-based 3D Hand Reconstruction via Self-Supervised Learning

SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction

Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from a Single RGB Image