Human Motion Enhancement and Restoration via Unconstrained Human Structure Learning

Tianjia He, Tianyuan Yang, Shin'ichi Konomi
DOI: https://doi.org/10.3390/s24103123
IF: 3.9
2024-05-15
Sensors
Abstract: Human motion capture technology, which leverages sensors to track the movement trajectories of key skeleton points, has been progressively transitioning from industrial applications to broader civilian applications in recent years. It finds extensive use in fields such as game development, digital human modeling, and sport science. However, the affordability of these sensors often compromises the accuracy of motion data. Low-cost motion capture methods often lead to errors in the captured motion data. We introduce a novel approach for human motion reconstruction and enhancement using spatio-temporal attention-based graph convolutional networks (ST-ATGCNs), which efficiently learn the human skeleton structure and the motion logic without requiring prior human kinematic knowledge. This method enables unsupervised motion data restoration and significantly reduces the costs associated with obtaining precise motion capture data. Our experiments, conducted on two extensive motion datasets and with real motion capture sensors such as the SONY mocopi, demonstrate the method's effectiveness in enhancing the quality of low-precision motion capture data. The experiments indicate the ST-ATGCN's potential to improve both the accessibility and accuracy of motion capture technology.
Engineering, Electrical & Electronic; Chemistry, Analytical; Instruments & Instrumentation
What problem does this paper attempt to address?
This paper addresses the common problem of data accuracy in low-cost human motion capture technology. Specifically, the research team developed a method called "Unconstrained Human Structure Learning" that enhances and restores human motion data through a Spatio-Temporal Attention Graph Convolutional Network (ST-ATGCN).

### Research Background and Issues

- **Issues with Low-Cost Sensors**: Although consumer-grade human motion capture sensors have become increasingly popular due to their lower cost, these sensors often lack the accuracy of industrial-grade equipment. This leads to various errors in motion data, such as distortions and missing details.
- **Sources of Data Errors**: The paper lists several factors that contribute to data errors, including inherent random noise in sensors, external environmental interference (such as changes in lighting and electromagnetic interference), algorithmic errors (especially cumulative errors in accelerometer and gyroscope data), sensor movement during the collection process, and data loss or inaccurate estimation caused by occlusion or misalignment of optical sensors.

### Solution

The solution proposed in the paper is ST-ATGCN, a new architecture based on Graph Convolutional Networks (GCNs). It can effectively learn the human skeletal structure and significantly improve the quality of low-accuracy motion capture data without requiring prior knowledge of human kinematics. The key features of ST-ATGCN include:

1. **Spatio-Temporal Attention Mechanism**: To overcome the issues of information forgetting and low computational efficiency in traditional GCNs when processing long sequence data, the model introduces a Temporal Self-Attention module (TSA), which can capture dependencies between different time steps, achieving higher computational efficiency.
2. **Improved Information Propagation Efficiency**: To address the low information propagation efficiency of traditional GCNs when processing human skeletal graphs, the study designed a Spatial Self-Attention Graph Convolution module (SA-GC), which uses a learnable parameter matrix to replace the traditional adjacency matrix, thereby improving the efficiency of information exchange between distant nodes.
3. **Optimization of Dual-Stream Structure**: To solve the high computational cost and feature fusion difficulties faced by dual-stream structures during network training, ST-ATGCN processes the input sequence iteratively, outputting latent variables in the spatial and temporal dimensions respectively, and constructs the final latent space, thereby effectively fusing spatio-temporal information.

### Experimental Validation

To validate the effectiveness of the proposed method, the researchers conducted experiments on two large public human motion datasets, NTU-RGB-D 60 and NTU-RGB-D 120. Additionally, they constructed partially erroneous human motion datasets, NTU-RGB-ER and MCP-ER, to evaluate the motion enhancement effect. The experimental results show that ST-ATGCN can effectively reconstruct and restore erroneous motion sequences.

In summary, this study aims to improve the accuracy of consumer-grade human motion capture systems through a novel method, making such technology not only more affordable but also capable of providing high-quality motion data in various application scenarios.
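As a rough illustration of the two attention ideas described under Solution, the sketch below shows (a) a graph convolution whose joint-to-joint mixing matrix is a free learnable parameter rather than a fixed skeleton adjacency matrix, and (b) scaled dot-product self-attention across time steps. It is not the authors' implementation: the function names, the row-softmax normalization of the learnable matrix, the single attention head, the layer sizes, and the random weights are all assumptions made for illustration, and no training is shown. The 25-joint skeleton is likewise an assumed input size.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_graph_conv(x, adj_logits, weight):
    """Graph convolution with a learnable joint-mixing matrix (hypothetical).

    x:          (T, V, C_in) sequence of T frames, V joints, C_in features
    adj_logits: (V, V) free parameters standing in for the adjacency matrix
    weight:     (C_in, C_out) feature projection
    """
    # Assumption: normalize each row so it sums to 1. Any joint can then
    # draw information from any other joint in a single layer.
    adj = softmax(adj_logits, axis=-1)
    mixed = np.einsum("uv,tvc->tuc", adj, x)  # aggregate features over joints
    return mixed @ weight                     # (T, V, C_out)


def temporal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over time, applied per joint (hypothetical).

    x: (T, V, C); w_q, w_k, w_v: (C, D) projection matrices.
    Returns the attended features (T, V, D) and attention weights (V, T, T).
    """
    xt = x.transpose(1, 0, 2)                   # (V, T, C)
    q, k, v = xt @ w_q, xt @ w_k, xt @ w_v      # each (V, T, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (V, T, T)
    attn = softmax(scores, axis=-1)             # every frame attends to all frames
    out = attn @ v                              # (V, T, D)
    return out.transpose(1, 0, 2), attn


# Toy input: 8 frames, 25 joints, 3-D coordinates (sizes are assumptions).
rng = np.random.default_rng(0)
T, V, C, D = 8, 25, 3, 16
x = rng.normal(size=(T, V, C))

adj_logits = rng.normal(size=(V, V))   # would be learned during training
w_s = rng.normal(size=(C, D))
h = spatial_graph_conv(x, adj_logits, w_s)

w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))
out, attn = temporal_self_attention(h, w_q, w_k, w_v)

print(h.shape, out.shape, attn.shape)  # (8, 25, 16) (8, 25, 16) (25, 8, 8)
```

Because the mixing matrix is dense, distant joints such as a hand and the opposite foot can exchange information in one layer instead of through many hops along skeleton edges, which is the propagation benefit the summary attributes to SA-GC. The attention weights likewise connect every pair of frames directly, which is how a self-attention module avoids forgetting over long sequences.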