Abstract: Hand manipulating objects is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera by a novel deep reinforcement learning method to leverage physics. Firstly, we propose object compensation control which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving the reconstruction accuracy and physical correctness. Experiments indicate that without involving any heuristic physical rules, this work still successfully involves physics in the reconstruction of hand-object interactions which are complex motions hard to imitate with deep reinforcement learning. Our code and data are available at

What problem does this paper attempt to address?

The paper attempts to address the problem of accurately reconstructing hand-object interaction movements. Specifically, the authors propose a deep reinforcement learning method called HOIC (Hand-Object Interaction Controller), which aims to reconstruct the interaction actions between the hand and the object in real-time using a single RGBD camera, ensuring that these actions are physically reasonable and accurate.

### The main contributions of the paper include:

1. **Introduction of an object compensation control mechanism**: To overcome the differences between contact representation in the physical simulator and the hand-object contact mechanism in the real world, the HOIC framework generates not only hand control signals but also additional forces and torques that act directly on the object, significantly improving the system's stability. These compensatory forces and torques can be interpreted as part of the surface contact model, thus avoiding the complex modeling of soft tissue surface contact.
2. **Improved accuracy and physical correctness of interaction action reconstruction**: By applying compensatory forces and torques, HOIC seamlessly upgrades the simple point contact model to a surface contact model that better conforms to physical principles, further enhancing the accuracy and physical reasonableness of the reconstructed actions.
3. **No need for heuristic physical rules**: Experiments show that even without involving any heuristic physical rules, HOIC can successfully introduce physical principles into the complex action reconstruction of hand-object interactions, which are usually difficult to mimic using deep reinforcement learning.

### Method Overview:

- **State Definition**: The state includes the hand pose of the current frame, hand velocity, object pose, object velocity, and the kinematic hand pose and object pose of the next few frames.
- **Action Definition**: The action includes hand joint torques, compensatory forces, and compensatory torques, which are input into the physical simulator to generate physically reasonable actions.
- **Reward Definition**: The reward consists of imitation rewards and physical rewards. Imitation rewards encourage the actions generated by the policy to be similar to the reference actions; physical rewards ensure that the use of compensatory forces and torques complies with physical laws.

### Experimental Results:

- **Comparison with existing methods**: Experimental results show that HOIC outperforms purely vision-based methods and other methods that introduce heuristic physical rules in terms of training speed and imitation quality.
- **System design evaluation**: The authors also evaluated key design aspects of the system, such as the impact of object compensation control on the training process and the effect of different numbers of future frames on policy performance.

Overall, HOIC effectively addresses the real-time, high-precision reconstruction of hand-object interaction actions by introducing an object compensation control mechanism, providing new solutions for fields such as virtual reality, human-computer interaction, and robotic learning.
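To make the state, action, and reward definitions from the method overview more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation. The dimensions (`N_HAND_DOF`, `N_FUTURE`, `POSE_DIM`), the function names, the reward weights, and the exponential reward shapes are all assumptions chosen for illustration. In particular, the summary only says that the physical reward keeps the compensation force and torque consistent with physical laws; here that is replaced by a simple stand-in that penalizes the magnitude of the compensation wrench.

```python
import numpy as np

# All constants below are hypothetical; the summary does not give dimensions.
N_HAND_DOF = 20  # assumed number of actuated hand joints
N_FUTURE = 3     # assumed number of future reference frames
POSE_DIM = 7     # object pose as position (3) + unit quaternion (4)


def build_state(hand_pose, hand_vel, obj_pose, obj_vel, ref_hand, ref_obj):
    """Concatenate the current simulated hand/object state with the
    kinematic reference poses of the next few frames."""
    return np.concatenate([
        hand_pose, hand_vel, obj_pose, obj_vel,
        ref_hand.ravel(), ref_obj.ravel(),
    ])


def split_action(action):
    """Split a flat policy output into hand joint torques and the
    compensation force and torque applied directly to the object."""
    joint_torques = action[:N_HAND_DOF]
    comp_force = action[N_HAND_DOF:N_HAND_DOF + 3]
    comp_torque = action[N_HAND_DOF + 3:N_HAND_DOF + 6]
    return joint_torques, comp_force, comp_torque


def reward(sim_hand, ref_hand, sim_obj_pos, ref_obj_pos,
           comp_force, comp_torque, w_imit=0.7, w_phys=0.3):
    """Weighted sum of an imitation term and a physics term.

    The physics term is an assumed stand-in: it only discourages large
    compensation forces and torques, so the policy cannot move the
    object mainly through the compensation wrench.
    """
    imit_err = (np.sum((sim_hand - ref_hand) ** 2)
                + np.sum((sim_obj_pos - ref_obj_pos) ** 2))
    r_imit = np.exp(-imit_err)
    r_phys = np.exp(-(np.sum(comp_force ** 2) + np.sum(comp_torque ** 2)))
    return w_imit * r_imit + w_phys * r_phys
```

Under these assumed dimensions the action has 26 entries (20 joint torques plus a 3D force and a 3D torque). With perfect tracking and zero compensation the sketched reward reaches its maximum, and any nonzero compensation force or torque lowers it, which mirrors the role the summary assigns to the physical reward.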

Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics

In-Hand 3D Object Reconstruction from a Monocular RGB Video

Physical Interaction: Reconstructing Hand-object Interactions with Physics

Kinematics-based 3D Human-Object Interaction Reconstruction from Single View

Single Depth View Based Real-Time Reconstruction of Hand-Object Interactions

EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild

PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction

Physics-aware Hand-object Interaction Denoising

Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning

CPF: Learning a Contact Potential Field to Model the Hand-Object Interaction

HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video

Learning Explicit Contact for Implicit Reconstruction of Hand-held Objects from Monocular Images

Learning Hierarchical Control for Robust In-Hand Manipulation

InteractionFusion

Learning Human-to-Robot Handovers from Point Clouds

Resolving hand‐object occlusion for mixed reality with joint deep learning and model optimization

Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives

VR-HandNet: A Visually and Physically Plausible Hand Manipulation System in Virtual Reality

Dynamics Learning with Object-Centric Interaction Networks for Robot Manipulation

HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction

Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling