Abstract: Human motion capture technology, which leverages sensors to track the movement trajectories of key skeleton points, has been progressively transitioning from industrial applications to broader civilian applications in recent years. It finds extensive use in fields such as game development, digital human modeling, and sport science. However, the affordability of these sensors often compromises accuracy: low-cost motion capture methods tend to introduce errors into the captured motion data. We introduce a novel approach for human motion reconstruction and enhancement using spatio-temporal attention-based graph convolutional networks (ST-ATGCNs), which efficiently learn the human skeleton structure and the motion logic without requiring prior human kinematic knowledge. This method enables unsupervised motion data restoration and significantly reduces the costs associated with obtaining precise motion capture data. Our experiments, conducted on two extensive motion datasets and with real motion capture sensors such as the SONY mocopi, demonstrate the method's effectiveness in enhancing the quality of low-precision motion capture data. The experiments indicate the ST-ATGCN's potential to improve both the accessibility and accuracy of motion capture technology.

What problem does this paper attempt to address?

This paper is primarily dedicated to addressing the common issue of data accuracy in low-cost human motion capture technology. Specifically, the research team developed a new method called "Unconstrained Human Structure Learning" to enhance and restore human motion data through a Spatio-Temporal Attention Graph Convolutional Network (ST-ATGCN).

### Research Background and Issues

- **Issues with Low-Cost Sensors**: Although consumer-grade human motion capture sensors have become increasingly popular due to their lower cost, these sensors often lack the accuracy of industrial-grade equipment. This leads to various errors in motion data, such as distortions and missing details.
- **Sources of Data Errors**: The paper lists several factors that contribute to data errors, including inherent random noise in sensors, external environmental interference (such as changes in lighting and electromagnetic interference), algorithmic errors (especially cumulative errors in accelerometer and gyroscope data), sensor movement during the collection process, and data loss or inaccurate estimation caused by occlusion or misalignment of optical sensors.

### Solution

The proposed solution in the paper is a new architecture based on Graph Convolutional Networks (GCN), ST-ATGCN, which can effectively learn the human skeletal structure and significantly improve the quality of low-accuracy motion capture data without requiring prior knowledge of human kinematics. The key features of ST-ATGCN include:

1. **Spatio-Temporal Attention Mechanism**: To overcome the issues of information forgetting and low computational efficiency in traditional GCNs when processing long sequence data, the model introduces a Temporal Self-Attention module (TSA), which can capture dependencies between different time steps, achieving higher computational efficiency.
2. **Improved Information Propagation Efficiency**: To address the low information propagation efficiency of traditional GCNs when processing human skeletal graphs, the study designed a Spatial Self-Attention Graph Convolution module (SA-GC), which uses a learnable parameter matrix to replace the traditional adjacency matrix, thereby improving the efficiency of information exchange between distant nodes.
3. **Optimization of Dual-Stream Structure**: To solve the high computational cost and feature fusion difficulties faced by dual-stream structures during network training, ST-ATGCN processes the input sequence iteratively, outputting latent variables in the spatial and temporal dimensions respectively, and constructs the final latent space, thereby effectively fusing spatio-temporal information.

### Experimental Validation

To validate the effectiveness of the proposed method, the researchers conducted experiments on two large public human motion datasets, NTU-RGB-D 60 and NTU-RGB-D 120. Additionally, they constructed partially erroneous human motion datasets, NTU-RGB-ER and MCP-ER, to evaluate the motion enhancement effect. The experimental results show that ST-ATGCN can effectively reconstruct and restore erroneous motion sequences.

In summary, this study aims to improve the accuracy of consumer-grade human motion capture systems through a novel method, making such technology not only more affordable but also capable of providing high-quality motion data in various application scenarios.
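
As a rough illustration of the two attention ideas summarized above, the following NumPy sketch shows (a) a graph convolution in which a free parameter matrix takes the place of the fixed skeleton adjacency matrix, and (b) self-attention across time steps applied per joint. It is not the paper's implementation: the function names, the softmax row-normalization of the learned matrix, the single-head attention, the layer sizes, and the random (untrained) weights are all assumptions made for illustration. The 25-joint skeleton matches the NTU RGB+D joint count.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_graph_conv(x, a_learned, w):
    """Graph convolution with a learnable joint-to-joint matrix (hypothetical helper).

    x:         (T, J, C) joint features over T frames and J joints
    a_learned: (J, J) free parameters used instead of a fixed adjacency matrix
    w:         (C, D) feature projection
    """
    # Assumption: rows are softmax-normalized, so every joint aggregates a
    # weighted average over ALL joints, including distant ones.
    a = softmax(a_learned, axis=-1)
    return np.einsum("ij,tjc,cd->tid", a, x, w)


def temporal_self_attention(x, wq, wk, wv):
    """Single-head self-attention over time, applied to each joint separately.

    x: (T, J, C); wq, wk, wv: (C, D). Returns (T, J, D).
    """
    q = np.einsum("tjc,cd->jtd", x, wq)
    k = np.einsum("tjc,cd->jtd", x, wk)
    v = np.einsum("tjc,cd->jtd", x, wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (J, T, T)
    attn = softmax(scores, axis=-1)  # each frame attends to every frame
    return (attn @ v).transpose(1, 0, 2)


# Toy input: 8 frames, 25 joints, 3D coordinates, 16 hidden features.
rng = np.random.default_rng(0)
T, J, C, D = 8, 25, 3, 16
x = rng.normal(size=(T, J, C))
a_learned = rng.normal(size=(J, J))  # would be trained; random here
w = rng.normal(size=(C, D))
wq, wk, wv = (rng.normal(size=(D, D)) for _ in range(3))

h = spatial_graph_conv(x, a_learned, w)        # spatial mixing across joints
y = temporal_self_attention(h, wq, wk, wv)     # temporal mixing across frames
print(h.shape, y.shape)
```

Because the joint-mixing matrix is dense rather than restricted to bone connections, information can pass between, say, a hand and a foot in a single layer, which is the intuition behind the summary's remark about distant nodes. In a real model these matrices would be trained end to end, and the paper's actual module layout may differ from this sketch.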

Human Motion Enhancement and Restoration via Unconstrained Human Structure Learning

Dynamic Human Body Reconstruction and Motion Tracking with Low-Cost Depth Cameras

A Spatio-Temporal Transformer Network for Human Motion Prediction in Human-Robot Collaboration

Marker-Less 3d Human Motion Capture With Monocular Image Sequence And Height-Maps

Human Motion Capture Using Wireless Inertial Sensors

MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency

Efficient motion capture data recovery via relationship-aggregated graph network and temporal pattern reasoning

Real-Time Human Motion Capture Based on Wearable Inertial Sensor Networks

Simplified-attention Enhanced Graph Convolutional Network for 3D human pose estimation

Visualization of movements in sports training based on multimedia information processing technology

Full-body Human Motion Reconstruction with Sparse Joint Tracking Using Flexible Sensors

AMHGCN: Adaptive multi-level hypergraph convolution network for human motion prediction

Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

Motion Capture for Sporting Events Based on Graph Convolutional Neural Networks and Single Target Pose Estimation Algorithms

Fast Human Motion reconstruction from sparse inertial measurement units considering the human shape

Multi-view Human Motion Capture with an Improved Deformation Skin Model

Special considerations in the pediatric use of radionuclides for kidney studies.

Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video

3D Human Pose Estimation with Spatio-Temporal Criss-Cross Attention

An Attractor-Guided Neural Networks for Skeleton-Based Human Motion Prediction

Motion Capture Research: 3D Human Pose Recovery Based on RGB Video Sequences