Abstract: 3D human pose estimation is a vital task in computer vision that predicts human joint positions from images or videos in order to reconstruct a human skeleton in three-dimensional space. The technology is pivotal in fields such as animation, security, human-computer interaction, and automotive safety, where it promotes both technological progress and enhanced human well-being. Deep learning has significantly advanced 3D pose estimation, notably by incorporating temporal information when predicting the spatial positions of human joints. However, traditional methods often fall short because they focus primarily on the spatial coordinates of joints and overlook the orientation and rotation of the connecting bones, which are crucial for a comprehensive understanding of human pose in 3D space. To address these limitations, we introduce Quater-GCN (Q-GCN), a directed graph convolutional network tailored to enhance pose estimation with orientation. Q-GCN captures the spatial dependencies among joint nodes through their coordinates and also integrates the dynamic context of bone rotations in 2D space. This enables a more sophisticated representation of human pose, because the model regresses the orientation of each bone in 3D space in addition to predicting joint coordinates. Furthermore, we complement the model with a semi-supervised training strategy that leverages unlabeled data, addressing the scarcity of orientation ground truth. In comprehensive evaluations, Q-GCN demonstrates outstanding performance against current state-of-the-art methods.
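
To make the distinction between joint coordinates and bone orientation concrete, the following is a minimal sketch of how a bone direction can be turned into a unit quaternion. It is an illustration only, not the Q-GCN formulation: the abstract does not say how orientations are parameterized or supervised, so the reference axis, the directed parent-to-child bone convention, and the function names `bone_quaternion` and `rotate` are all assumptions. A single direction also leaves the twist about the bone axis undetermined, so a full bone orientation needs more information than this sketch uses.

```python
import math

def normalize(v):
    """Scale a vector (any length) to unit norm."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bone_quaternion(parent, child, ref_axis=(0.0, 1.0, 0.0)):
    """Unit quaternion (w, x, y, z) for the shortest-arc rotation that takes
    ref_axis onto the directed bone parent -> child.
    ref_axis is a hypothetical rest-pose direction chosen for illustration."""
    d = normalize(tuple(c - p for p, c in zip(parent, child)))
    r = normalize(ref_axis)
    w = 1.0 + dot(r, d)
    if w < 1e-8:
        # Antiparallel case: rotate 180 degrees about any axis perpendicular to r.
        helper = (1.0, 0.0, 0.0) if abs(r[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = normalize(cross(r, helper))
        return (0.0, axis[0], axis[1], axis[2])
    x, y, z = cross(r, d)
    return normalize((w, x, y, z))

def rotate(q, v):
    """Rotate a 3D vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = (x, y, z)
    t = tuple(2.0 * c for c in cross(u, v))
    uxt = cross(u, t)
    return tuple(vi + w * ti + ci for vi, ti, ci in zip(v, t, uxt))

# Toy directed skeleton: each bone is an ordered (parent, child) pair of joints.
joints = {"hip": (0.0, 0.0, 0.0), "knee": (0.0, -0.5, 0.1), "ankle": (0.0, -1.0, 0.0)}
bones = [("hip", "knee"), ("knee", "ankle")]
orientations = {b: bone_quaternion(joints[b[0]], joints[b[1]]) for b in bones}
```

Applying `rotate(orientations[bone], (0.0, 1.0, 0.0))` recovers the unit direction of that bone, which is one way to check the construction. The edge direction matters here: reversing a bone flips its direction vector and therefore changes its quaternion.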

GCVNet: Geometry Constrained Voting Network to Estimate 3D Pose for Fine-Grained Object Categories

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation

GPV-Pose: Category-level Object Pose Estimation Via Geometry-guided Point-wise Voting

PVA-GCN: point-voxel absorbing graph convolutional network for 3D human pose estimation from monocular video

3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

Category-level Pose Estimation and Iterative Refinement for Monocular RGB-D Image

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence

NVR-Net: Normal Vector Guided Regression Network for Disentangled 6D Pose Estimation

GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video

HPGCN: Hierarchical Poselet-Guided Graph Convolutional Network for 3D Pose Estimation

Locally Connected Network for Monocular 3D Human Pose Estimation

Category-Level Object Pose Estimation with Statistic Attention

Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training

Prior Geometry Guided Direct Regression Network for Monocular 6D Object Pose Estimation

3DPVNet: Patch-level 3D Hough Voting Network for 6D Pose Estimation

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

A novel GCN-based point cloud classification model robust to pose variances

P$^2$GNet: Pose-Guided Point Cloud Generating Networks for 6-DoF Object Pose Estimation