Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training

Xingyu Song, Zhan Li, Shi Chen, Kazuyuki Demachi

2024-10-11

Abstract: 3D human pose estimation is a vital task in computer vision, involving the prediction of human joint positions from images or videos to reconstruct a skeleton of a human in three-dimensional space. This technology is pivotal in various fields, including animation, security, human-computer interaction, and automotive safety, where it promotes both technological progress and enhanced human well-being. The advent of deep learning has significantly advanced the performance of 3D pose estimation by incorporating temporal information for predicting the spatial positions of human joints. However, traditional methods often fall short because they focus primarily on the spatial coordinates of joints and overlook the orientation and rotation of the connecting bones, which are crucial for a comprehensive understanding of human pose in 3D space. To address these limitations, we introduce Quater-GCN (Q-GCN), a directed graph convolutional network tailored to enhance pose estimation with orientation. Q-GCN not only captures the spatial dependencies among joints through their coordinates but also integrates the dynamic context of bone rotations in 2D space. This approach enables a more sophisticated representation of human poses by also regressing the orientation of each bone in 3D space, moving beyond mere coordinate prediction. Furthermore, we complement our model with a semi-supervised training strategy that leverages unlabeled data, addressing the challenge of limited orientation ground truth data. In comprehensive evaluations, Q-GCN demonstrates outstanding performance against current state-of-the-art methods.
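
The abstract refers to the "dynamic context of bone rotations in 2D space" without saying how those rotations are parameterized. As a rough illustration only, the sketch below derives a direction vector, a length, and an in-plane angle for each bone from 2D joint coordinates. The five-joint skeleton `BONES`, the function name `bone_features_2d`, and the choice of `atan2` angles are all assumptions made for this example and are not taken from the paper.

```python
import math

# Hypothetical 5-joint toy skeleton (not the paper's): each bone is a
# directed (parent, child) pair of joint indices.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def bone_features_2d(joints_2d, bones=BONES):
    """Return one dict per bone with its 2D vector, length, and in-plane angle.

    The angle is measured in radians from the positive x-axis via atan2.
    This parameterization is an assumption made for illustration.
    """
    features = []
    for parent, child in bones:
        dx = joints_2d[child][0] - joints_2d[parent][0]
        dy = joints_2d[child][1] - joints_2d[parent][1]
        features.append({
            "vector": (dx, dy),
            "length": math.hypot(dx, dy),
            "angle": math.atan2(dy, dx),
        })
    return features


# Toy pose: a vertical spine with two horizontal limbs attached at joint 1.
joints = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [-1.0, 1.0], [1.0, 1.0]]
for (parent, child), feat in zip(BONES, bone_features_2d(joints)):
    print(f"bone {parent}->{child}: angle={feat['angle']:.3f} rad, length={feat['length']:.1f}")
```

Tracking how such per-bone angles change from frame to frame is one plausible way to expose rotation dynamics to a network alongside the raw joint coordinates.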

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses 3D human pose estimation, in particular the challenges that arise under monocular views. It proposes Quater-GCN (Q-GCN), a directed graph convolutional network designed to improve 3D pose estimation by incorporating the orientation of the skeleton's bones. Traditional methods often focus only on the spatial coordinates of joints and neglect the orientation and rotation of the connecting bones, which can lead to inaccurate pose estimates in complex scenarios. The main contributions of the paper are:

1. **Proposing a novel 2D-to-3D pose lifting method**: Integrating bone orientation information improves model performance.
2. **Developing a semi-supervised training strategy**: Unlabeled data is used to compensate for the scarcity of orientation ground truth.
3. **Demonstrating improvements over existing state-of-the-art methods**: The paper reports higher 3D pose estimation accuracy than existing methods.

With these components, Q-GCN captures the spatial dependencies among joints and also integrates the dynamic context of bone rotations in 2D space, yielding a richer pose representation that includes the 3D orientation of each bone. The paper also describes the overall architecture of Q-GCN and its semi-supervised training strategy in detail and validates them experimentally.
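
To make the phrase "directed graph convolutional network" concrete, the following is a minimal sketch of one generic propagation step over a directed skeleton graph. It is not the authors' layer: the four-joint chain `EDGES`, the row-normalized adjacency with self-loops, the identity weight matrix, and the helper names are all assumptions chosen to keep the example small.

```python
# Hypothetical 4-joint chain (not the paper's skeleton): each directed edge
# points from a parent joint to its child joint.
EDGES = [(0, 1), (1, 2), (2, 3)]
NUM_JOINTS = 4


def directed_adjacency(edges, n):
    """Row-normalized adjacency with self-loops.

    Row i holds the weights of the joints that feed into joint i, so
    information flows from parent to child only.
    """
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for parent, child in edges:
        adj[child][parent] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(features, adj, weight):
    """One step: aggregate along directed edges, project, then apply ReLU."""
    aggregated = matmul(adj, features)
    projected = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


features = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]  # 2D joint coordinates
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, purely for readability
adj = directed_adjacency(EDGES, NUM_JOINTS)
out = graph_conv(features, adj, weight)
print(out)
```

Because the adjacency is not symmetric, each joint only mixes its own features with those of its parent. In a trained network the weight matrix would be learned, and several such layers would be stacked.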
