Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics

Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, Feng Xu
2024-05-04
Abstract: Hand manipulating objects is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera by a novel deep reinforcement learning method to leverage physics. Firstly, we propose object compensation control which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving the reconstruction accuracy and physical correctness. Experiments indicate that without involving any heuristic physical rules, this work still successfully involves physics in the reconstruction of hand-object interactions which are complex motions hard to imitate with deep reinforcement learning. Our code and data are available at
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The paper attempts to address the problem of accurately reconstructing hand-object interaction movements. Specifically, the authors propose a deep reinforcement learning method called HOIC (Hand-Object Interaction Controller), which aims to reconstruct the interaction actions between the hand and the object in real-time using a single RGBD camera, ensuring that these actions are physically reasonable and accurate.

### The main contributions of the paper include:

1. **Introduction of an object compensation control mechanism**: To overcome the differences between contact representation in the physical simulator and the hand-object contact mechanism in the real world, the HOIC framework generates not only hand control signals but also additional forces and torques that act directly on the object, significantly improving the system's stability. These compensatory forces and torques can be interpreted as part of the surface contact model, thus avoiding the complex modeling of soft tissue surface contact.
2. **Improved accuracy and physical correctness of interaction action reconstruction**: By applying compensatory forces and torques, HOIC seamlessly upgrades the simple point contact model to a surface contact model that better conforms to physical principles, further enhancing the accuracy and physical reasonableness of the reconstructed actions.
3. **No need for heuristic physical rules**: Experiments show that even without involving any heuristic physical rules, HOIC can successfully introduce physical principles into the complex action reconstruction of hand-object interactions, which are usually difficult to mimic using deep reinforcement learning.

### Method Overview:

- **State Definition**: The state includes the hand pose of the current frame, hand velocity, object pose, object velocity, and the kinematic hand pose and object pose of the next few frames.
- **Action Definition**: The action includes hand joint torques, compensatory forces, and compensatory torques, which are input into the physical simulator to generate physically reasonable actions.
- **Reward Definition**: The reward consists of imitation rewards and physical rewards. Imitation rewards encourage the actions generated by the policy to be similar to the reference actions; physical rewards ensure that the use of compensatory forces and torques complies with physical laws.

### Experimental Results:

- **Comparison with existing methods**: Experimental results show that HOIC outperforms purely vision-based methods and other methods that introduce heuristic physical rules in terms of training speed and imitation quality.
- **System design evaluation**: The authors also evaluated key design aspects of the system, such as the impact of object compensation control on the training process and the effect of different numbers of future frames on policy performance.

Overall, HOIC effectively addresses the real-time, high-precision reconstruction of hand-object interaction actions by introducing an object compensation control mechanism, providing new solutions for fields such as virtual reality, human-computer interaction, and robotic learning.
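
As a rough illustration of the state, action, and reward definitions in the Method Overview, the following Python sketch shows one way the pieces could fit together. It is not the authors' implementation: every function name, the array dimensions, the exponential error kernel, and the reward weights are assumptions made for illustration. In particular, the summary does not specify the form of the physical reward, so the placeholder here simply penalizes the magnitude of the compensation force and torque.

```python
import numpy as np


def build_state(hand_pose, hand_vel, obj_pose, obj_vel,
                ref_hand_poses, ref_obj_poses):
    """Flatten the current simulated hand/object state together with the
    kinematic reference poses of the next few frames into one vector."""
    parts = [hand_pose, hand_vel, obj_pose, obj_vel,
             ref_hand_poses, ref_obj_poses]
    return np.concatenate(
        [np.asarray(p, dtype=np.float64).ravel() for p in parts]
    )


def split_action(action, n_hand_dof):
    """Split a flat policy output into hand joint torques, a 3D compensation
    force, and a 3D compensation torque applied directly to the object."""
    action = np.asarray(action, dtype=np.float64)
    if action.shape != (n_hand_dof + 6,):
        raise ValueError("action must have n_hand_dof + 6 entries")
    joint_torques = action[:n_hand_dof]
    comp_force = action[n_hand_dof:n_hand_dof + 3]
    comp_torque = action[n_hand_dof + 3:]
    return joint_torques, comp_force, comp_torque


def imitation_reward(sim_pose, ref_pose, scale=1.0):
    """Assumed kernel: 1.0 when the simulated pose matches the reference,
    decaying toward 0 as the squared error grows."""
    diff = np.asarray(sim_pose, dtype=np.float64) - np.asarray(ref_pose, dtype=np.float64)
    return float(np.exp(-scale * np.sum(diff ** 2)))


def physical_reward(comp_force, comp_torque, scale=1.0):
    """Placeholder (assumption): discourage large compensation force/torque.
    The paper's actual physical reward is not given in this summary."""
    magnitude = np.sum(np.square(comp_force)) + np.sum(np.square(comp_torque))
    return float(np.exp(-scale * magnitude))


def total_reward(r_imitation, r_physical, w_imitation=0.7, w_physical=0.3):
    """Weighted sum of the two reward terms; the weights are hypothetical."""
    return w_imitation * r_imitation + w_physical * r_physical
```

For example, with a hypothetical 4-DoF hand, `split_action(np.arange(10.0), 4)` yields four joint torques followed by a 3-vector force and a 3-vector torque, and a policy that tracks the reference exactly with zero compensation receives the maximum value from both reward terms.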