Abstract: Human motion capture from monocular videos has made significant progress in recent years. However, modern approaches often produce temporal artifacts, e.g. in the form of jittery motion, and struggle to achieve smooth and physically plausible motions. Explicitly integrating physics, in the form of internal forces and exterior torques, helps alleviate these artifacts. Current state-of-the-art approaches use an automatic PD controller to predict torques and reaction forces in order to re-simulate the input kinematics, i.e. the joint angles of a predefined skeleton. However, due to imperfect physical models, these methods often require simplifying assumptions and extensive preprocessing of the input kinematics to achieve good performance. To address this, we propose a novel method that selectively combines physics models with kinematics observations in an online setting, inspired by a neural Kalman-filtering approach. We develop a control loop as a meta-PD controller to predict internal joint torques and external reaction forces, followed by a physics-based motion simulation. A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematics input and the simulated motion, resulting in an optimal-state dynamics prediction. We show that this filtering step is crucial: it provides online supervision that helps balance the shortcomings of the respective input motions, and is thus important not only for capturing accurate global motion trajectories but also for producing physically plausible human poses. The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predicted dynamics compared to the state of the art. The code is available at https://github.com/cuongle1206/OSDCap
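The loop described in the abstract (PD control, physics simulation, then a Kalman-style blend of simulated and observed motion) can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single joint with unit inertia, fixed PD gains, and a scalar Kalman gain computed from two hand-picked variances, whereas the paper predicts the gains with a recurrent neural network and simulates a full skeleton. All function names and parameter values are hypothetical.

```python
import numpy as np

def pd_torque(q, qd, q_target, kp=60.0, kd=8.0):
    """PD control law: torque pulling joint angle q toward q_target.
    The gains kp and kd are illustrative values, not taken from the paper."""
    return kp * (q_target - q) - kd * qd

def simulate_step(q, qd, tau, inertia=1.0, dt=0.01):
    """One semi-implicit Euler step of a 1-DoF joint (assumed unit inertia)."""
    qdd = tau / inertia
    qd_next = qd + dt * qdd
    q_next = q + dt * qd_next
    return q_next, qd_next

def kalman_gain(var_sim, var_kin):
    """Scalar Kalman gain: weight given to the kinematics observation.
    Stand-in for the learned, RNN-predicted gain described in the abstract."""
    return var_sim / (var_sim + var_kin)

def fuse(q_sim, q_kin, gain):
    """Kalman-style update: correct the simulated state with the observation."""
    return q_sim + gain * (q_kin - q_sim)

def track(q_kin_seq, var_sim=1e-3, var_kin=4e-3, dt=0.01):
    """Online loop: PD torque -> simulation -> fusion, one frame at a time."""
    q, qd = float(q_kin_seq[0]), 0.0
    fused = [q]
    for q_kin in q_kin_seq[1:]:
        tau = pd_torque(q, qd, q_kin)            # control toward the observation
        q_sim, qd = simulate_step(q, qd, tau, dt=dt)
        q = fuse(q_sim, q_kin, kalman_gain(var_sim, var_kin))
        fused.append(q)
    return np.array(fused)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(0.0, 2.0, 0.01)
    # Synthetic "kinematics estimate": a sine-wave joint angle with frame-wise jitter.
    noisy = np.sin(2 * np.pi * t) + 0.05 * rng.standard_normal(t.size)
    smooth = track(noisy)
    # Mean absolute second difference as a crude jitter measure.
    print("input jitter :", np.abs(np.diff(noisy, 2)).mean())
    print("fused jitter :", np.abs(np.diff(smooth, 2)).mean())
```

With a fixed gain this reduces to a simple smoother; the point of a learned, state-dependent gain as described in the abstract is that the balance between simulation and observation can change from frame to frame.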

Contact and Human Dynamics from Monocular Video

D&D: Learning Human Dynamics from Dynamic Camera

Physics-based Human Motion Estimation and Synthesis from Videos

Efficient Human Motion Reconstruction from Monocular Videos with Physical Consistency Loss

Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos

3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data

Deep Physics-aware Inference of Cloth Deformation for Monocular Human Performance Capture

Contact-aware Human Motion Forecasting

PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos

Physics-Guided Human Motion Capture with Pose Probability Modeling

Leveraging depth cameras and wearable pressure sensors for full-body kinematics and dynamics capture

Efficient Multi-person Hierarchical 3D Pose Estimation for Autonomous Driving

Kinematics-based 3D Human-Object Interaction Reconstruction from Single View

Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video

4D Human Body Capture from Egocentric Video via 3D Scene Grounding

Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation

Contact-Aware Retargeting of Skinned Motion

Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture

MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency

Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs