HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes

Zhiming Hu, Zheming Yin, Daniel Haeufle, Syn Schmitt, Andreas Bulling
2024-07-03
Abstract: We present HOIMotion, a novel approach for human motion forecasting during human-object interactions that integrates information about past body poses and egocentric 3D object bounding boxes. Human motion forecasting is important in many augmented reality applications, but most existing methods have only used past body poses to predict future motion. HOIMotion first uses an encoder-residual graph convolutional network (GCN) and multi-layer perceptrons to extract features from body poses and egocentric 3D object bounding boxes, respectively. Our method then fuses pose and object features into a novel pose-object graph and uses a residual-decoder GCN to forecast future body motion. We extensively evaluate our method on the Aria Digital Twin (ADT) and MoGaze datasets and show that HOIMotion consistently outperforms state-of-the-art methods by a large margin of up to 8.7% on ADT and 7.2% on MoGaze in terms of mean per joint position error. Complementing these evaluations, we report a human study (N=20) that shows that the improvements achieved by our method result in forecasted poses being perceived as both more precise and more realistic than those of existing methods. Taken together, these results reveal the significant information content available in egocentric 3D object bounding boxes for human motion forecasting and the effectiveness of our method in exploiting this information.
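The evaluation metric named in the abstract, mean per joint position error (MPJPE), is the Euclidean distance between predicted and ground-truth joint positions, averaged over joints and frames. A minimal NumPy sketch follows. The array layout `(frames, joints, 3)` and the helper `relative_improvement` are assumptions for illustration; the abstract does not spell out how its percentages are computed, so treating them as relative reductions in MPJPE against a baseline is also an assumption.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per joint position error.

    pred, target: arrays of shape (frames, joints, 3) holding 3D joint
    positions (assumed layout, not taken from the paper).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Euclidean distance per joint, then the mean over all joints and frames.
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def relative_improvement(baseline_error, method_error):
    """Percentage reduction in error relative to a baseline (hypothetical helper)."""
    return 100.0 * (baseline_error - method_error) / baseline_error
```

With made-up errors of 100.0 for a baseline and 91.3 for a new method, `relative_improvement` returns about 8.7, which is how a figure such as "up to 8.7%" would arise under this reading.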
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses human motion forecasting during human-object interactions. Traditional forecasting methods rely mainly on past body poses to predict future body poses and ignore the influence of objects in the environment on human motion. In daily life, however, human actions are usually closely tied to the surrounding objects, especially while a person is interacting with them. The paper therefore proposes HOIMotion, which considers not only past body poses but also egocentric 3D object bounding boxes in order to predict future human motion more accurately.

The main contributions of HOIMotion are as follows:

1. **Demonstrates the effectiveness of egocentric 3D object bounding boxes for motion forecasting**, providing a new perspective on this challenging task.
2. **Proposes an encoder-residual-decoder architecture based on graph convolutional networks (GCNs)**, which combines past body poses and egocentric 3D object bounding boxes to predict future human motion.
3. **Verifies the effectiveness of the method through extensive experiments**: motion forecasting experiments over different future time horizons on two public datasets, and a user study showing that its predictions are perceived as more precise and more realistic than those of existing methods.

In conclusion, the paper aims to improve the accuracy of human motion forecasting by integrating information about human poses and environmental objects. This improvement is particularly relevant to virtual reality (VR) and augmented reality (AR) applications, where it can enhance the user experience.
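The encoder-residual-decoder idea in contribution 2 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the fully connected, row-normalized adjacency, the `tanh` activations, the per-joint flattening of the pose history, and every name (`forecast`, `init_params`, the parameter keys) are assumptions made for the example.

```python
import numpy as np


def normalized_adjacency(n):
    """Fully connected graph with self-loops, row-normalized (an assumption)."""
    return np.full((n, n), 1.0 / n)


def gcn(h, adj, w):
    """One graph convolution: mix node features over the graph, then project."""
    return np.tanh(adj @ h @ w)


def forecast(pose_hist, boxes, p):
    """pose_hist: (J, T_in*3) per-joint history; boxes: (K, D) per-object box features."""
    num_joints = pose_hist.shape[0]
    # Encoder GCN over the body joints.
    pose_feat = gcn(pose_hist, normalized_adjacency(num_joints), p["enc"])
    # Two-layer MLP applied to each object's bounding box.
    obj_feat = np.tanh(boxes @ p["mlp1"]) @ p["mlp2"]
    # Pose-object graph: joints and objects become nodes of one graph.
    nodes = np.concatenate([pose_feat, obj_feat], axis=0)
    adj = normalized_adjacency(nodes.shape[0])
    # Residual GCN blocks: each block adds its output to its input.
    for w in p["res"]:
        nodes = nodes + gcn(nodes, adj, w)
    # Decoder: a linear graph convolution; keep only the joint nodes.
    out = adj @ nodes @ p["dec"]
    return out[:num_joints]


def init_params(rng, in_dim, box_dim, feat, out_dim, blocks=2):
    """Small random weights standing in for trained parameters."""
    s = 0.1
    return {
        "enc": s * rng.standard_normal((in_dim, feat)),
        "mlp1": s * rng.standard_normal((box_dim, feat)),
        "mlp2": s * rng.standard_normal((feat, feat)),
        "res": [s * rng.standard_normal((feat, feat)) for _ in range(blocks)],
        "dec": s * rng.standard_normal((feat, out_dim)),
    }
```

For example, with 21 joints, 10 past frames, 3 objects described by 24 numbers each (say, 8 box corners in 3D), and 5 future frames, `forecast` takes inputs of shape `(21, 30)` and `(3, 24)` and returns an array of shape `(21, 15)`. Because the object nodes share a graph with the joint nodes, changing the bounding boxes changes the forecast pose, which is the mechanism the summary describes.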