Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training

Xingyu Song, Zhan Li, Shi Chen, Kazuyuki Demachi
2024-10-11
Abstract: 3D human pose estimation is a vital task in computer vision, involving the prediction of human joint positions from images or videos to reconstruct a skeleton of a human in three-dimensional space. This technology is pivotal in various fields, including animation, security, human-computer interaction, and automotive safety, where it promotes both technological progress and enhanced human well-being. The advent of deep learning significantly advances the performance of 3D pose estimation by incorporating temporal information for predicting the spatial positions of human joints. However, traditional methods often fall short as they primarily focus on the spatial coordinates of joints and overlook the orientation and rotation of the connecting bones, which are crucial for a comprehensive understanding of human pose in 3D space. To address these limitations, we introduce Quater-GCN (Q-GCN), a directed graph convolutional network tailored to enhance pose estimation by orientation. Q-GCN excels by not only capturing the spatial dependencies among node joints through their coordinates but also integrating the dynamic context of bone rotations in 2D space. This approach enables a more sophisticated representation of human poses by also regressing the orientation of each bone in 3D space, moving beyond mere coordinate prediction. Furthermore, we complement our model with a semi-supervised training strategy that leverages unlabeled data, addressing the challenge of limited orientation ground truth data. Through comprehensive evaluations, Q-GCN has demonstrated outstanding performance against current state-of-the-art methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses 3D human pose estimation, in particular the challenges of estimating pose from a monocular view. It proposes Quater-GCN (Q-GCN), a directed graph convolutional network that improves the accuracy of 3D pose estimation by incorporating the orientation of the bones in the skeleton. Traditional methods often consider only the spatial coordinates of joints and neglect the orientation and rotation of the connecting bones, which can lead to inaccurate pose estimates in complex scenarios.

The main contributions of the paper are:

1. **Proposing a novel 2D-to-3D pose lifting method**: The model integrates bone orientation information alongside joint coordinates, which the authors report improves performance.
2. **Developing a semi-supervised training strategy**: The strategy leverages unlabeled data to compensate for the scarcity of orientation ground truth.
3. **Demonstrating improvements over existing state-of-the-art methods**: The authors report that Q-GCN surpasses existing methods in 3D pose estimation accuracy.

In this way, Q-GCN captures the spatial dependencies among joints and also integrates the dynamic context of bone rotations in 2D space. By additionally regressing the orientation of each bone in 3D space, it builds a richer pose representation than coordinates alone. The paper also describes the overall architecture of Q-GCN and its semi-supervised training strategy in detail, and validates the approach experimentally.
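To make the distinction between joint coordinates and bone orientation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical five-joint toy skeleton (`BONES`) in which each directed edge runs from a parent joint to a child joint, and it derives two simple orientation features: a unit direction vector per bone, and the in-plane rotation angle of each bone in a 2D pose. The function names and the skeleton layout are illustrative assumptions; the summary does not specify how Q-GCN actually encodes orientation.

```python
import math

# Hypothetical 5-joint toy skeleton (an assumption, not the paper's layout).
# Each directed edge (parent, child) represents one bone.
BONES = [(0, 1), (1, 2), (1, 3), (3, 4)]


def bone_directions(joints, bones=BONES):
    """Return a unit direction vector for each directed bone (parent -> child).

    `joints` is a list of coordinate lists (2D or 3D), indexed by joint id.
    """
    directions = []
    for parent, child in bones:
        vec = [c - p for p, c in zip(joints[parent], joints[child])]
        length = math.sqrt(sum(v * v for v in vec))
        if length == 0.0:
            raise ValueError(f"bone ({parent}, {child}) has zero length")
        directions.append([v / length for v in vec])
    return directions


def bone_angles_2d(joints_2d, bones=BONES):
    """Return the in-plane rotation angle (radians) of each bone in a 2D pose."""
    return [
        math.atan2(
            joints_2d[child][1] - joints_2d[parent][1],
            joints_2d[child][0] - joints_2d[parent][0],
        )
        for parent, child in bones
    ]
```

Two poses whose joints lie in similar positions can still differ in these per-bone features, which is the kind of information a coordinate-only model does not see explicitly.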