Abstract: Existing inertial motion capture techniques use the human root coordinate frame to estimate local poses and treat it as an inertial frame by default. We argue that when the root has linear acceleration or rotation, the root frame should theoretically be considered non-inertial. In this paper, we model the fictitious forces that are non-negligible in a non-inertial frame with an auto-regressive estimator carefully designed to follow the physics. With the fictitious forces, the force-related IMU measurements (accelerations) can be correctly compensated in the non-inertial frame, so that Newton's laws of motion are satisfied. In this case, the relationship between the accelerations and body motions is deterministic and learnable, and we train a neural network to model it for better motion capture. Furthermore, to train the neural network with synthetic data, we develop an IMU synthesis-by-simulation strategy that better models the noise of IMU hardware and allows parameter tuning to fit different hardware. This strategy not only enables network training with synthetic data but also enables calibration error modeling to handle poor motion capture calibration, increasing the robustness of the system. Code is available at

What problem does this paper attempt to address?

The problem this paper addresses: existing human motion capture methods based on inertial measurement units (IMUs) are physically inconsistent in how they treat the human root coordinate frame, which causes them to fail on certain complex or ambiguous poses. Specifically, when the root frame undergoes linear acceleration or rotation, it is a non-inertial reference frame. Existing methods nevertheless treat it as inertial and ignore the resulting fictitious forces (such as the centrifugal and Coriolis forces). This omission makes the projection from the inertial world frame to the non-inertial root frame inaccurate, so the accelerometer readings no longer match the actual local motion.

To solve this problem, the authors propose the Physical Non-inertial Poser (PNP), which corrects IMU acceleration readings by using an autoregressive neural estimator to model the fictitious forces explicitly. With this correction, the acceleration measurements are properly compensated in the non-inertial root frame and Newton's laws of motion are satisfied, so a neural network can be trained to learn the relationship between acceleration and human motion and improve motion capture quality. In addition, to make better use of synthetic training data, the authors propose an IMU signal synthesis method that simulates sensor noise and calibration errors, which increases the robustness of the system.

### Main contributions:

1. **Physical Non-inertial Poser (PNP)**: Improves real-time human motion estimation from sparse IMUs, especially for acceleration-dominated actions (such as raising hands or lifting legs).
2. **Fictitious force modeling**: Learns the physically correct fictitious forces caused by the non-inertial human root coordinate frame through a neural autoregressive estimator.
3. **IMU measurement synthesis**: Improves model training by generating more realistic IMU signals through simulation, accounting for sensor noise and calibration errors.

### Core of the solution:

- **Fictitious force calculation**: In the non-inertial root coordinate frame, fictitious forces affect the inertial measurements and must be considered when estimating the root-relative human pose. The expression for the fictitious force is:
\[ \mathbf{f}_{\text{fic}} = -m\left(\mathbf{a}_R + [\boldsymbol{\omega}_R]_\times^2\,\mathbf{p}_{RL} + 2[\boldsymbol{\omega}_R]_\times\,\dot{\mathbf{p}}_{RL} + [\dot{\boldsymbol{\omega}}_R]_\times\,\mathbf{p}_{RL}\right) \]
where \(m\) is the mass, \([\cdot]_\times\) denotes the skew-symmetric cross-product matrix of a vector, \(\mathbf{a}_R\) and \(\boldsymbol{\omega}_R\) are the linear acceleration and angular velocity of the root joint, \(\dot{\boldsymbol{\omega}}_R\) is its angular acceleration, and \(\mathbf{p}_{RL}\) and \(\dot{\mathbf{p}}_{RL}\) are the position and velocity of the leaf node relative to the root.
- **Autoregressive neural network**: Estimates the fictitious acceleration. The inputs are the dynamic quantities of the root joint (acceleration, angular velocity, angular acceleration) and of the leaf node (position, velocity, acceleration, orientation), and the output is the fictitious acceleration \(\mathbf{a}_{\text{fic}} = \mathbf{f}_{\text{fic}}/m\).
- **IMU signal synthesis**: Generates more realistic IMU measurements by simulating 6-degree-of-freedom trajectories, adding sensor noise, and fusing the signals, while also modeling the calibration errors of the T-pose calibration process.

With these improvements, PNP handles non-inertial effects more accurately and improves motion capture accuracy and robustness in a sparse IMU configuration.
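As a rough illustration of the fictitious-force expression above, the following NumPy sketch evaluates the fictitious acceleration \(\mathbf{a}_{\text{fic}} = \mathbf{f}_{\text{fic}}/m\) in closed form. It is not the paper's learned autoregressive estimator; the function names `skew` and `fictitious_acceleration` are hypothetical, and the sketch assumes all input vectors are expressed in the same (root) coordinate frame.

```python
import numpy as np


def skew(v):
    """Return the 3x3 skew-symmetric matrix [v]_x, so that skew(v) @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def fictitious_acceleration(a_R, omega_R, omega_dot_R, p_RL, v_RL):
    """Closed-form fictitious acceleration a_fic = f_fic / m (hypothetical helper).

    a_R         : linear acceleration of the root
    omega_R     : angular velocity of the root
    omega_dot_R : angular acceleration of the root
    p_RL, v_RL  : position and velocity of the leaf relative to the root
    All arguments are length-3 arrays assumed to share one coordinate frame.
    """
    a_R, omega_R, omega_dot_R, p_RL, v_RL = (
        np.asarray(x, dtype=float) for x in (a_R, omega_R, omega_dot_R, p_RL, v_RL)
    )
    W = skew(omega_R)
    centrifugal = W @ W @ p_RL           # [omega]_x^2 p
    coriolis = 2.0 * (W @ v_RL)          # 2 [omega]_x p_dot
    euler = skew(omega_dot_R) @ p_RL     # [omega_dot]_x p
    return -(a_R + centrifugal + coriolis + euler)


# Example: root spinning at 2 rad/s about z, leaf resting 1 m away along x.
# Only the centrifugal term is non-zero; it points outward along +x
# with magnitude omega^2 * r = 4 m/s^2.
a_fic = fictitious_acceleration(
    a_R=[0.0, 0.0, 0.0],
    omega_R=[0.0, 0.0, 2.0],
    omega_dot_R=[0.0, 0.0, 0.0],
    p_RL=[1.0, 0.0, 0.0],
    v_RL=[0.0, 0.0, 0.0],
)
```

In the summary's description, a neural network estimates this quantity autoregressively from the root and leaf dynamics rather than evaluating it directly; the closed form here only makes the individual terms (linear, centrifugal, Coriolis, Euler) concrete.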

Physical Non-inertial Poser (PNP): Modeling Non-inertial Effects in Sparse-inertial Human Motion Capture

Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors

Fast Human Motion reconstruction from sparse inertial measurement units considering the human shape

Human Motion Capture Using Wireless Inertial Sensors

Real-time Physics-based Motion Capture with Sparse Sensors

Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

Real-Time Human Motion Capture Based on Wearable Inertial Sensor Networks

Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture

[Infrared radiative characteristic of Ho3+ in heavy metal tellurite glasses].

A Scalable and Wearable Self-Sensing IMU Sensor Network for Personalized Human Motion and Deformation Capture

Predicting Missing Markers In Human Motion Capture Using L1-Sparse Representation

Physics-Guided Human Motion Capture with Pose Probability Modeling

SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data

Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos

3D Deformation Capture Via a Configurable Self-Sensing IMU Sensor Network

Reconstructing 3D human pose and shape from a single image and sparse IMUs

Transformer Inertial Poser: Real-time Human Motion Reconstruction from Sparse IMUs with Simultaneous Terrain Generation

Fusion Poser: 3D Human Pose Estimation Using Sparse IMUs and Head Trackers in Real Time

DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion