Abstract: Estimating full-body motion using the tracking signals of head and hands from VR devices holds great potential for various applications. However, the sparsity and unique distribution of observations present a significant challenge, resulting in an ill-posed problem with multiple feasible solutions (i.e., hypotheses). This amplifies uncertainty and ambiguity in full-body motion estimation, especially for the lower-body joints. Therefore, we propose a new method, EnvPoser, that employs a two-stage framework to perform full-body motion estimation using sparse tracking signals and a pre-scanned environment from VR devices. EnvPoser models the multi-hypothesis nature of human motion through an uncertainty-aware estimation module in the first stage. In the second stage, we refine these multi-hypothesis estimates by integrating semantic and geometric environmental constraints, ensuring that the final motion estimation aligns realistically with both the environmental context and physical interactions. Qualitative and quantitative experiments on two public datasets demonstrate that our method achieves state-of-the-art performance, highlighting significant improvements in human motion estimation within motion-environment interaction scenarios.

What problem does this paper attempt to address?

The paper addresses the ambiguity and uncertainty that arise when estimating whole-body motion from sparse tracking signals (such as the head and hand signals provided by VR devices), especially for the lower-limb joints. Specifically:

1. **Challenges Brought by Sparse Observation Data**:
   - VR devices (such as PICO and Quest) usually provide only sparse tracking signals of the head and hands.
   - The same sparse input can correspond to multiple plausible motion hypotheses, which makes motion estimation uncertain and ambiguous.
2. **Utilization of Environmental Information**:
   - Human motion is highly correlated with the surrounding environment. Existing methods often simplify human-environment interaction and ignore its complex details.
   - The paper proposes to reduce estimation uncertainty by incorporating pre-scanned environmental information, guiding the motion estimates to be more consistent with the actual scene.
3. **Multi-Hypothesis Motion Estimation**:
   - The ambiguity caused by sparse observation data requires a method that can explicitly model multiple hypotheses.
   - Introducing an uncertainty estimation module handles this ambiguity better, thereby improving the accuracy of motion estimation.

To solve these problems, the paper proposes a new framework named EnvPoser, which consists of two stages:

- **First Stage**: Use an uncertainty-aware initial motion estimation module to explicitly model multi-hypothesis motion estimation.
- **Second Stage**: Refine the multi-hypothesis estimates by combining semantic and geometric environmental constraints, so that the final motion estimate is consistent with both the environmental context and physical interaction.

With this method, EnvPoser can more accurately estimate whole-body motion from sparse observation data and performs well in scenarios involving environmental interaction.

### Formula Summary

1. **Loss Function**:
   - Loss function in the initial stage:
     \[ L_{\text{S-I}} = \lambda_M L_M + \lambda_\delta L_\delta \]
     where:
     \[ L_M = \|\hat{\theta}-\theta\|_2^2 \]
     \[ L_\delta = \|\hat{\theta}-\theta\|_{\delta}^2 + \log(\|\delta\|_2) \]
2. **Loss Function of the Environment-Aware Refinement Module**:
   - The final second-stage loss function:
     \[ L_{\text{S-II}} = L_{\text{S-I}} + L_M' + \lambda_1 L_{posi} + \lambda_2 L_{hAL} + \lambda_3 L_{fc} + \lambda_4 L_{contact} + \lambda_5 L_{gfh} + \lambda_6 L_{gp} + \lambda_7 L_{coap} \]
     where:
     \[ L_M' = \|\hat{\theta}_{RM}-\theta\|_2^2 \]
     \[ L_{posi} = \|\hat{P}_{RM}-P\|_2^2 \]
     \[ L_{hAL} = \|\hat{P}_{hand, RM}-P_{hand}\|_1 \]
     \[ L_{fc} = \|(\hat{P}_{feet, RM}-P_{feet})\cdot C\|_1 \]
     \[ L_{gfh} = \|\hat{z}_{feet, PRM}-z_{ground}\|_1 \]
     \[ L_{gp}=\|(\hat{
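To make the fully stated second-stage terms concrete, the following is a minimal NumPy sketch of L_posi, L_hAL, L_fc, and L_gfh as written in the formula summary above. It is an illustration under stated assumptions, not the authors' implementation: the function names, array shapes, and the choice to sum each norm over all entries (rather than average over frames or joints) are assumptions, and the weights passed to `weighted_sum` are hypothetical because the summary gives no values for the λ coefficients. The uncertainty-weighted term L_δ and the truncated L_gp term are left out because their exact form is not recoverable from the text.

```python
import numpy as np


def _diff(pred, target):
    """Element-wise difference as a float array."""
    return np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)


def position_loss(joints_pred, joints_gt):
    """L_posi: squared L2 norm of the joint-position error (summed, an assumption)."""
    return float(np.sum(_diff(joints_pred, joints_gt) ** 2))


def hand_alignment_loss(hands_pred, hands_gt):
    """L_hAL: L1 norm of the hand-position error."""
    return float(np.sum(np.abs(_diff(hands_pred, hands_gt))))


def foot_contact_loss(feet_pred, feet_gt, contact):
    """L_fc: L1 norm of the foot-position error, masked by contact labels C.

    `contact` is assumed to hold one 0/1 label per foot; it is broadcast
    over the xyz axis so that non-contact feet contribute nothing.
    """
    mask = np.asarray(contact, dtype=float)[..., None]
    return float(np.sum(np.abs(_diff(feet_pred, feet_gt) * mask)))


def ground_foot_height_loss(feet_z_pred, ground_z):
    """L_gfh: L1 distance between predicted foot heights and the ground height."""
    return float(np.sum(np.abs(_diff(feet_z_pred, ground_z))))


def weighted_sum(terms, weights):
    """Combine named loss terms with hypothetical lambda weights."""
    return sum(weights[name] * value for name, value in terms.items())


if __name__ == "__main__":
    # Two feet with xyz positions; only the first foot is labelled as in contact.
    feet_pred = np.array([[0.1, 0.0, 0.05], [0.3, 0.0, 0.2]])
    feet_gt = np.zeros((2, 3))
    terms = {
        "fc": foot_contact_loss(feet_pred, feet_gt, contact=[1, 0]),
        "gfh": ground_foot_height_loss(feet_pred[:, 2], ground_z=0.0),
    }
    # Placeholder weights, not values from the paper.
    print(terms, weighted_sum(terms, {"fc": 1.0, "gfh": 1.0}))
```

In this toy input, the contact mask removes the second foot from L_fc entirely, while L_gfh still penalizes both feet for floating above the ground plane. This shows why the summary lists the two terms separately.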

EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling
