Abstract: Inertial Measurement Unit-based methods have great potential in capturing motion in large-scale and complex environments with many people. Sparse Inertial Measurement Unit-based methods have more research value due to their simplicity and flexibility. However, improving the computational efficiency and reducing latency in such methods are challenging. In this paper, we propose Fast Inertial Poser, which is a full body motion estimation deep neural network based on 6 inertial measurement units considering body parameters. We design a network architecture based on recurrent neural networks according to the kinematics tree. This method introduces human body shape information by the causality of observations and eliminates the dependence on future frames. During the estimation of joint positions, the upper body and lower body are estimated using separate network modules independently. Then the joint rotation is obtained through a well-designed single-frame kinematics inverse solver. Experiments show that the method can greatly improve the inference speed and reduce the latency while ensuring the reconstruction accuracy compared with previous methods. Fast Inertial Poser runs at 65 fps with 15 ms latency on an embedded computer, demonstrating the efficiency of the model.

What problem does this paper attempt to address?

This paper aims to improve the computational efficiency and reduce the latency of human motion reconstruction from sparse inertial measurement units (IMUs) while preserving reconstruction accuracy. Specifically:

1. **Improve computational efficiency and reduce latency**: Although existing methods based on sparse IMUs are simple and flexible, they struggle with computational efficiency and latency. The paper proposes a deep neural network named Fast Inertial Poser (FIP), which aims to significantly improve inference speed and reduce latency by optimizing the network architecture and introducing human body shape information.
2. **Consider human body shape information**: Traditional methods usually do not consider human body shape parameters, which may lead to deviations in the reconstruction results. FIP reduces unnecessary calculations and improves the expressive ability of the model by introducing human body shape information (such as height, arm length, and leg length).
3. **Achieve real-time processing**: To run in real time on embedded devices (such as AR/VR headsets), the paper designs an efficient network architecture and an inverse kinematics solver, allowing FIP to run at 65 frames per second on an embedded computer with a latency of only 15 milliseconds.

### Specific problem description

- **Computational efficiency and latency problems**: Existing methods perform poorly in terms of computational efficiency and latency, especially on embedded devices.
- **Lack of human body shape information**: Traditional methods ignore human body shape parameters, resulting in possible deviations in the reconstruction results.
- **Requirement for real-time processing**: Multi-person motion capture in large-scale and complex environments requires a real-time method that runs efficiently on embedded devices.
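The real-time figures quoted above can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, the summary does not state how the paper defines latency, and the look-ahead example uses made-up numbers rather than values from any method. At 65 fps one frame period is about 15.4 ms, so a reported latency of 15 ms is roughly one frame period, which is what one would expect from a method that does not wait for future frames.

```python
def frame_period_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def lookahead_latency_ms(future_frames: int, sensor_hz: float) -> float:
    """Extra delay incurred by waiting for `future_frames` sensor frames.

    Hypothetical helper: it models only the waiting time, not compute time.
    """
    if future_frames < 0 or sensor_hz <= 0:
        raise ValueError("future_frames must be >= 0 and sensor_hz > 0")
    return 1000.0 * future_frames / sensor_hz


if __name__ == "__main__":
    # Frame period at the frame rate quoted in the summary.
    print(f"frame period at 65 fps: {frame_period_ms(65):.2f} ms")

    # A causal method waits for zero future frames.
    print(f"look-ahead delay, 0 future frames: {lookahead_latency_ms(0, 60):.2f} ms")

    # Made-up example: waiting for 5 future frames of a 60 Hz sensor stream.
    print(f"look-ahead delay, 5 future frames: {lookahead_latency_ms(5, 60):.2f} ms")
```

The point of the second helper is that any dependence on future frames adds a fixed waiting time that faster hardware cannot remove, which is why eliminating that dependence matters for latency.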
### Solutions

- **Introduce human body shape information**: By introducing human body shape parameters (such as height, arm length, and leg length), unnecessary calculations are reduced and the expressive ability of the model is improved.
- **Optimize network architecture**: A network architecture based on recurrent neural networks (RNNs) is designed and organized according to the kinematic tree of the human body.
- **Efficient inverse kinematics solver**: A differentiable inverse kinematics solver based on the SMPL model is designed to solve the joint rotation problem.
- **Independent modular design**: The joint position estimations of the upper and lower body are divided into independent network modules, further improving the computational efficiency.

### Experimental results

The experimental results show that FIP is superior to existing methods in terms of angular error, position error, mesh error, etc., and has higher running efficiency and lower latency on embedded devices. Specific indicators are as follows:

| Method | SIP (deg) | Ang (deg) | Aang (deg) | Pos (cm) | Mesh (cm) | Jitter (km/s³) | TPF (ms) | Latency (ms) | FPS |
| ------ | --------- | --------- | ---------- | -------- | --------- | -------------- | -------- | ------------ | --- |
| DIP | 17.85 | 15.47 | 16.05 | 6.65 | 9.46 | 2.77 | -- | -- | -- |
| Transpose | 16.69 | 11.30 | 8.86 | 5.80 | 7.34 | 0.61 | 10.6 | 120 | 27 |
| PIP | 15.02 | 10.54 | 8.73 | 4.80 | 5.95 | 0.27 | 13.3 | 76 | 13 |
| TIP | 15.40 | 10.78 | 8.95 | 5
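The causal, modular design listed under Solutions can be illustrated with a toy streaming estimator. This is a minimal sketch under stated assumptions, not the paper's network: the class names, the fixed weights, the output sizes, and the way IMU readings are routed to the two modules are all hypothetical. It shows only the structural idea that each body half has its own recurrent state, that body shape values are fed in alongside the sensor readings, and that the output for a frame never depends on later frames.

```python
import math
from typing import List, Sequence


class ToyRecurrentModule:
    """Stand-in for one recurrent sub-network (NOT the paper's architecture)."""

    def __init__(self, n_outputs: int, w_in: float = 0.5, w_rec: float = 0.9):
        self.n_outputs = n_outputs
        self.w_in = w_in    # hypothetical fixed input weight
        self.w_rec = w_rec  # hypothetical fixed recurrent weight
        self.h = 0.0        # hidden state carried from frame to frame

    def step(self, features: Sequence[float]) -> List[float]:
        # The new state uses only the current input and the previous state,
        # so no future frame is ever required.
        x = sum(features) / len(features)
        self.h = math.tanh(self.w_in * x + self.w_rec * self.h)
        return [self.h] * self.n_outputs  # placeholder "joint positions"


class ToyStreamingEstimator:
    """Two independent modules, one per body half, sharing shape inputs."""

    def __init__(self, shape: Sequence[float]):
        self.shape = list(shape)  # e.g. height, arm length, leg length
        self.upper = ToyRecurrentModule(n_outputs=3)
        self.lower = ToyRecurrentModule(n_outputs=2)

    def step(self, imu_frame: Sequence[float]) -> List[float]:
        # Hypothetical routing: first half of the readings goes to the
        # upper-body module, the remainder to the lower-body module.
        half = len(imu_frame) // 2
        upper_out = self.upper.step(list(imu_frame[:half]) + self.shape)
        lower_out = self.lower.step(list(imu_frame[half:]) + self.shape)
        return upper_out + lower_out


def run(frames: Sequence[Sequence[float]], shape: Sequence[float]) -> List[List[float]]:
    """Process frames one at a time, as a live sensor stream would arrive."""
    estimator = ToyStreamingEstimator(shape)
    return [estimator.step(frame) for frame in frames]


if __name__ == "__main__":
    shape = [1.7, 0.6, 0.9]  # made-up body measurements in metres
    a = run([[0.1] * 6, [0.2] * 6, [0.3] * 6], shape)
    b = run([[0.1] * 6, [0.2] * 6, [9.0] * 6], shape)
    # Changing only the last frame leaves the earlier outputs untouched.
    print(a[:2] == b[:2], a[2] != b[2])
```

Because the two modules share no state, they could also be evaluated in parallel. Whether the actual implementation does so is not stated in the summary.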

Fast Human Motion reconstruction from sparse inertial measurement units considering the human shape

Motion Imitation of a Humanoid Robot Via Pose Estimation

Reconstructing 3D human pose and shape from a single image and sparse IMUs

SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data

Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors

Physical Non-inertial Poser (PNP): Modeling Non-inertial Effects in Sparse-inertial Human Motion Capture

Transformer Inertial Poser: Real-time Human Motion Reconstruction from Sparse IMUs with Simultaneous Terrain Generation

Fusion Poser: 3D Human Pose Estimation Using Sparse IMUs and Head Trackers in Real Time

Real-time Physics-based Motion Capture with Sparse Sensors

RePose: Learning Deep Kinematic Priors for Fast Human Pose Estimation

Dynamic Human Body Reconstruction and Motion Tracking with Low-Cost Depth Cameras

Parametric Human Body Reconstruction Based on Sparse Key Points.

A Scalable and Wearable Self-Sensing IMU Sensor Network for Personalized Human Motion and Deformation Capture

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

Full-body Motion Capture for Multiple Closely Interacting Persons.

Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs

Fast IMU-based Dual Estimation of Human Motion and Kinematic Parameters via Progressive In-Network Computing

Motion Capture Research: 3D Human Pose Recovery Based on RGB Video Sequences

Full-body Human Motion Reconstruction with Sparse Joint Tracking Using Flexible Sensors

3D Human Pose Estimation with Single Image and Inertial Measurement Unit (IMU) Sequence