Abstract: We present HOIMotion, a novel approach for human motion forecasting during human-object interactions that integrates information about past body poses and egocentric 3D object bounding boxes. Human motion forecasting is important in many augmented reality applications, but most existing methods have only used past body poses to predict future motion. HOIMotion first uses an encoder-residual graph convolutional network (GCN) and multi-layer perceptrons to extract features from body poses and egocentric 3D object bounding boxes, respectively. Our method then fuses pose and object features into a novel pose-object graph and uses a residual-decoder GCN to forecast future body motion. We extensively evaluate our method on the Aria Digital Twin (ADT) and MoGaze datasets and show that HOIMotion consistently outperforms state-of-the-art methods by a large margin of up to 8.7% on ADT and 7.2% on MoGaze in terms of mean per joint position error. Complementing these evaluations, we report a human study (N=20) that shows that the improvements achieved by our method result in forecasted poses being perceived as both more precise and more realistic than those of existing methods. Taken together, these results reveal the significant information content available in egocentric 3D object bounding boxes for human motion forecasting and the effectiveness of our method in exploiting this information.
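The pipeline described in the abstract (an encoder GCN over past poses, MLPs over object bounding boxes, fusion into a pose-object graph, then residual and decoder GCN layers) can be illustrated with a shape-level sketch in plain Python. Every concrete choice here is an assumption for illustration, not the authors' implementation: each joint node holds its flattened past trajectory, each box is given as 8 corners x 3 coordinates, object features are appended to the graph as extra nodes, activations are tanh/ReLU, adjacency matrices are dense, and all weights are random and untrained. The function names `graph_conv`, `mlp` and `forecast_sketch` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Random weights standing in for learned parameters (untrained)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def graph_conv(h: Matrix, adj: Matrix, w: Matrix, activate: bool = True) -> Matrix:
    """One graph-convolution layer, H' = tanh(A H W); linear if activate=False."""
    out = matmul(matmul(adj, h), w)
    return [[math.tanh(v) for v in row] for row in out] if activate else out


def mlp(x: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer perceptron with a ReLU hidden layer, applied to one vector."""
    hidden = [max(0.0, v) for v in matmul([x], w1)[0]]
    return matmul([hidden], w2)[0]


def forecast_sketch(past_joints: Matrix, boxes: Matrix, feat_dim: int,
                    out_dim: int, seed: int = 0) -> Matrix:
    """Hypothetical data flow. past_joints: [J][past_frames*3];
    boxes: [O][24] (8 corners x 3). Returns [J][out_dim]."""
    rng = random.Random(seed)
    n_j, n_o = len(past_joints), len(boxes)
    n = n_j + n_o

    # 1) Encoder GCN over the body-joint graph -> per-joint features.
    joint_adj = rand_matrix(n_j, n_j, rng)
    w_enc = rand_matrix(len(past_joints[0]), feat_dim, rng)
    joint_feats = graph_conv(past_joints, joint_adj, w_enc)

    # 2) MLP per bounding box -> per-object features of the same width.
    w1 = rand_matrix(len(boxes[0]), feat_dim, rng)
    w2 = rand_matrix(feat_dim, feat_dim, rng)
    obj_feats = [mlp(box, w1, w2) for box in boxes]

    # 3) Assumed fusion: joint and object nodes form one pose-object graph.
    h = joint_feats + obj_feats
    fused_adj = rand_matrix(n, n, rng)

    # 4) Residual GCN block: H <- H + GC(H).
    delta = graph_conv(h, fused_adj, rand_matrix(feat_dim, feat_dim, rng))
    h = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(h, delta)]

    # 5) Linear decoder GCN; only the joint nodes are read out as future motion.
    out = graph_conv(h, fused_adj, rand_matrix(feat_dim, out_dim, rng), activate=False)
    return out[:n_j]


# Hypothetical sizes: 4 joints with 10 past frames, 2 objects,
# 8-dim features, 5 future frames (out_dim = 5 * 3).
rng = random.Random(1)
past = [[rng.uniform(-1, 1) for _ in range(10 * 3)] for _ in range(4)]
boxes = [[rng.uniform(-1, 1) for _ in range(24)] for _ in range(2)]
pred = forecast_sketch(past, boxes, feat_dim=8, out_dim=5 * 3)
print(len(pred), len(pred[0]))  # 4 15
```

The point of the sketch is the data flow: object nodes take part in message passing through the fused adjacency, so they can influence joint features, but only joint rows are decoded into future poses.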

What problem does this paper attempt to address?

This paper addresses human motion forecasting during human-object interactions. Traditional motion forecasting methods rely mainly on past body poses to predict future poses and ignore how objects in the environment shape human motion. In daily life, however, human actions are closely tied to surrounding objects, especially during human-object interactions. The paper therefore proposes HOIMotion, which uses not only past body poses but also egocentric 3D object bounding boxes to forecast future human motion more accurately.

The main contributions of HOIMotion are:

1. **Demonstrating that egocentric 3D object bounding boxes are informative for motion forecasting**, which offers a new perspective on this challenging task.
2. **Proposing an encoder-residual-decoder architecture based on graph convolutional networks (GCNs)** that combines past body poses with egocentric 3D object bounding boxes to forecast future human motion.
3. **Validating the method through extensive experiments**: motion forecasting over several future time horizons on two public datasets (ADT and MoGaze), plus a user study (N=20) showing that its forecasts are perceived as more precise and more realistic than those of existing methods.

In summary, the paper aims to improve the accuracy of human motion forecasting by integrating information about human poses and surrounding objects. This improvement is particularly relevant to virtual reality (VR) and augmented reality (AR) applications, where it can enhance the user experience.
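The accuracy comparisons rest on mean per joint position error (MPJPE), commonly defined as the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and forecast frames; a percentage gain such as "up to 8.7%" is a relative reduction of that error against a baseline. The minimal sketch below assumes poses are nested lists shaped [frames][joints][3]. The helpers `mpjpe` and `relative_improvement` and all numbers are hypothetical illustrations, not the paper's evaluation code or results.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[float]]  # one frame: [joints][3]


def mpjpe(pred: Sequence[Frame], gt: Sequence[Frame]) -> float:
    """Average Euclidean distance between predicted and ground-truth
    joints over all frames and joints."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("frames must have the same number of joints")
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # 3D Euclidean distance for one joint
            count += 1
    return total / count


def relative_improvement(baseline_err: float, new_err: float) -> float:
    """Percentage reduction of an error relative to a baseline."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Hypothetical toy data: 1 frame, 2 joints, each prediction off by 0.03 on one axis.
gt = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred = [[(0.03, 0.0, 0.0), (1.0, 0.03, 0.0)]]
print(round(mpjpe(pred, gt), 4))                    # 0.03
# Hypothetical errors: a baseline at 100.0 vs. a method at 91.3.
print(round(relative_improvement(100.0, 91.3), 1))  # 8.7
```

Because the gain is relative, the same 8.7% corresponds to different absolute error reductions depending on the baseline's MPJPE and units.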

HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes

GazeMotion: Gaze-guided Human Motion Forecasting

Forecasting Distillation: Enhancing 3D Human Motion Prediction with Guidance Regularization

Motion Forecasting Network (mofcnet): IMU-Based Human Motion Forecasting for Hip Assistive Exoskeleton

GIMO: Gaze-Informed Human Motion Prediction in Context

Expressive Forecasting of 3D Whole-body Human Motions

Forecasting of 3D Whole-body Human Poses with Grasping Objects

Contact-aware Human Motion Forecasting

Scene-aware Human Motion Forecasting via Mutual Distance Prediction

Multimodal Sense-Informed Prediction of 3D Human Motions

Towards Accurate 3D Human Motion Prediction from Incomplete Observations

Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction

Action-guided 3D Human Motion Prediction

4D Human Body Capture from Egocentric Video via 3D Scene Grounding

Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction

EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos

GGTr: An Innovative Framework for Accurate and Realistic Human Motion Prediction

Motion Prediction Using Trajectory Cues

Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses

Simple Baseline for Single Human Motion Forecasting

3D Human Motion Prediction: A Survey