Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging

Rayan Armani, Changlin Qian, Jiaxi Jiang, Christian Holz
2024-04-30
Abstract: While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from $13.62$ to $10.65$ cm ($22\%$ better) and lowering jitter from $1.56$ to $0.055$ km/s$^3$ (a reduction of $97\%$).
Computer Vision and Pattern Recognition, Artificial Intelligence, Graphics, Signal Processing
What problem does this paper attempt to address?
This paper addresses how to reduce drift and jitter in inertial measurements when sparse inertial sensors are used for human motion capture and tracking. The authors propose a new 3D full-body pose estimation method, Ultra Inertial Poser (UIP). The method uses Ultra-Wideband (UWB) ranging to estimate the distances between sensors and fuses these distances with the 3D state information obtained from each Inertial Measurement Unit (IMU) to improve the stability and accuracy of pose estimation.

### Main Problems and Solutions

1. **Limitations of Inertial Sensors**:
   - Inertial sensors (IMUs) are prone to drift and jitter during long-term tracking, which limits their use in accurate human pose estimation.
   - To overcome this, the authors introduce UWB ranging, which constrains and corrects errors in the inertial measurements by estimating the relative distances between sensors.
2. **Simplification of Sensor Configuration**:
   - Existing high-precision motion-capture systems usually require a large number of sensors to cover the whole body, which is costly and inconvenient to use.
   - UIP estimates full-body pose from only 6 sparsely distributed IMU+UWB sensor nodes, reducing the complexity and cost of the system.
3. **Data Fusion and Model Training**:
   - The authors design a model based on a Graph Convolutional Network (GCN) that fuses IMU and UWB data to improve the accuracy of pose estimation.
   - To train the model, the authors use the AMASS dataset to synthesize IMU signals and distance estimates, and add a verified noise model to make the synthetic data more realistic.

### Technical Details

1. **Sensor Hardware**:
   - Six wireless prototype sensors are designed. Each sensor contains a 6-degree-of-freedom IMU (LSM6DS) and a UWB radio (DWM1000), integrated in a 35×35 mm package.
   - The sensors transmit data to the host computer via Bluetooth Low Energy (BLE), and the host is responsible for synchronizing and processing the data.
2. **Data Processing**:
   - Acceleration and angular velocity signals are obtained from the IMU, and the gravity-compensated acceleration and absolute orientation are estimated using the VQF filter.
   - The distances between UWB sensors are obtained using the Two-Way Ranging (TWR) protocol, and these distances are filtered and corrected by an Extended Kalman Filter (EKF).
3. **Pose Estimation Model**:
   - An LSTM network and a Distance-Attention Graph Convolutional Network (DA-GCN) are used to capture temporal and spatial information, respectively.
   - The two estimation results are fused by linear interpolation to form the final sensor position estimate.
   - A dynamics optimizer is used to further estimate the local orientation, local velocity, and foot contact, and finally outputs the SMPL pose parameters and global translation.

### Experimental Results

- **Performance Improvement**:
  - The experimental results show that UIP outperforms existing methods such as TIP and PIP in terms of position error and jitter. Specifically, the position error is reduced from 13.62 cm to 10.65 cm (a 22% reduction), and the jitter is reduced from 1.56 km/s³ to 0.055 km/s³ (a 97% reduction).
- **Dataset Contribution**:
  - The authors also contribute a new motion dataset, UIP-DB, which contains 200 minutes of synchronized sensor data from 10 participants performing 25 different motion types, providing a resource for subsequent research.

In conclusion, this paper mitigates the drift and jitter problems of sparse inertial sensors in human motion capture by combining UWB ranging with learned data fusion, and reports more accurate and smoother full-body pose estimates than PIP and TIP.
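
The Two-Way Ranging step mentioned under Data Processing can be illustrated with a minimal Python sketch. It assumes the simplest single-sided variant, in which one node measures the round-trip time and the other reports its reply delay; the function name, parameter names, and example timings are hypothetical, and the paper's actual protocol and clock-error handling may differ.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def twr_distance(t_round_s: float, t_reply_s: float) -> float:
    """Estimate distance from single-sided two-way ranging timestamps.

    t_round_s: time between the initiator sending its poll and
               receiving the response (measured by the initiator).
    t_reply_s: time the responder took between receiving the poll
               and sending the response (measured by the responder).
    Both names are illustrative, not taken from the paper.
    """
    if t_round_s < t_reply_s:
        raise ValueError("round-trip time must be at least the reply delay")
    # The signal crosses the gap twice, so halve the remaining time.
    time_of_flight_s = (t_round_s - t_reply_s) / 2.0
    return SPEED_OF_LIGHT_M_PER_S * time_of_flight_s


# Example: build timestamps for a known 1.5 m separation and recover it.
true_distance_m = 1.5
reply_delay_s = 200e-6  # assumed responder turnaround time
round_trip_s = reply_delay_s + 2.0 * true_distance_m / SPEED_OF_LIGHT_M_PER_S
estimated_m = twr_distance(round_trip_s, reply_delay_s)
```

Because light travels about 30 cm per nanosecond, small timing errors translate into large distance errors, which is one reason the raw ranges would need filtering such as the EKF described above.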
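
The jitter figures above are given in km/s³, which is the unit of jerk, the third time derivative of position. The following Python sketch shows one way such a number could be computed, assuming jitter is the mean jerk magnitude over all joints and frames, approximated by third-order finite differences. This definition, the function name, and the input layout are assumptions for illustration, not the paper's stated procedure.

```python
import math


def mean_jerk(positions, fps):
    """Mean jerk magnitude in m/s^3 (assumed jitter definition).

    positions: list of frames; each frame is a list of (x, y, z)
               joint positions in metres.
    fps:       frame rate in frames per second.
    """
    if len(positions) < 4:
        raise ValueError("need at least 4 frames for a third difference")
    dt_cubed = (1.0 / fps) ** 3
    magnitudes = []
    # Slide a window of four consecutive frames over the sequence.
    frames = zip(positions, positions[1:], positions[2:], positions[3:])
    for f0, f1, f2, f3 in frames:
        for p0, p1, p2, p3 in zip(f0, f1, f2, f3):
            # Third forward difference: p3 - 3*p2 + 3*p1 - p0.
            jerk = [
                (p3[k] - 3.0 * p2[k] + 3.0 * p1[k] - p0[k]) / dt_cubed
                for k in range(3)
            ]
            magnitudes.append(math.sqrt(sum(v * v for v in jerk)))
    return sum(magnitudes) / len(magnitudes)


# Example: one joint moving along x as t**3, whose exact jerk is 6 m/s^3.
fps = 60.0
trajectory = [[((i / fps) ** 3, 0.0, 0.0)] for i in range(10)]
jerk_m_per_s3 = mean_jerk(trajectory, fps)
jerk_km_per_s3 = jerk_m_per_s3 / 1000.0  # convert to the unit used above
```

A trajectory with constant velocity or constant acceleration yields a jerk of zero under this measure, so lower values indicate smoother motion.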