Abstract: Estimating 3d human pose from monocular images is a challenging problem due to the variety and complexity of human poses and the inherent ambiguity in recovering depth from a single view. Recent deep learning based methods show promising results by using supervised learning on 3d pose annotated datasets. However, the lack of large-scale 3d annotated training data captured under in-the-wild settings makes 3d pose estimation difficult for in-the-wild poses. Few approaches have utilized training images from both 3d and 2d pose datasets in a weakly-supervised manner for learning 3d poses in unconstrained settings. In this paper, we propose a method which can effectively predict 3d human pose from 2d pose using a deep neural network trained in a weakly-supervised manner on a combination of ground-truth 3d pose and ground-truth 2d pose. Our method uses re-projection error minimization as a constraint to predict the 3d locations of body joints, and this is crucial for training on data where the 3d ground-truth is not present. Since minimizing re-projection error alone may not guarantee an accurate 3d pose, we also use additional geometric constraints on skeleton pose to regularize the pose in 3d. We demonstrate the superior generalization ability of our method by cross-dataset validation on the challenging 3d benchmark dataset MPI-INF-3DHP, which contains in-the-wild 3d poses.

What problem does this paper attempt to address?

The main problem this paper attempts to solve is how to estimate accurate 3D human poses from monocular images in the absence of large-scale 3D pose-annotated data. Specifically, the authors propose a weakly-supervised method: a deep neural network is trained on a combination of 2D pose datasets and limited 3D pose datasets so that it can predict 3D poses from 2D poses. The method pays special attention to generalization to in-the-wild poses, that is, to predicting 3D poses accurately in complex and variable real-world scenarios as well. The key points in the paper include:

- **Problem Background**: Estimating 3D human poses from monocular images is a challenging problem because it involves the inherent ambiguity of recovering depth information from a single view. Although existing deep-learning-based methods perform well when a large amount of 3D-annotated data is available, they often perform poorly on in-the-wild poses.
- **Solution**: The authors propose a weakly-supervised learning method that is trained on a combination of 2D pose datasets and 3D pose datasets. The network structure includes two main modules: the 2D-to-3D pose regression module and the 3D-to-2D pose reprojection module. The 2D-to-3D pose regression module predicts 3D poses from the given 2D poses, while the 3D-to-2D pose reprojection module minimizes the reprojection error so that the predicted 3D poses project back onto the input 2D poses.
- **Innovation**: The method can train the network on data without 3D ground truth, and it introduces geometric constraints (such as a bone-length symmetry loss) to further limit the solution space and keep the predicted 3D poses physically reasonable.
- **Experimental Verification**: The authors conducted experiments on multiple benchmark datasets, including Human3.6M, MPII, and MPI-INF-3DHP. The results show that this method is superior to existing methods in terms of generalization ability and prediction accuracy.

Through these designs, the paper addresses the key problem of how to improve the generalization ability of 3D pose-estimation models for in-the-wild poses in the absence of large-scale 3D-annotated data.
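
The two training signals described above, reprojection error and bone-length symmetry, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper uses a 3D-to-2D reprojection module inside the network, whereas the sketch substitutes a fixed orthographic projection, and the five-joint skeleton, the function names, and the `sym_weight` value are all hypothetical.

```python
import math

# Hypothetical 5-joint skeleton (illustration only): joint 0 is the root,
# joints 1/2 are left/right arm ends, joints 3/4 are left/right leg ends.
# Each bone is a (parent, child) pair of joint indices.
LEFT_ARM, RIGHT_ARM = (0, 1), (0, 2)
LEFT_LEG, RIGHT_LEG = (0, 3), (0, 4)
SYMMETRIC_BONES = [(LEFT_ARM, RIGHT_ARM), (LEFT_LEG, RIGHT_LEG)]


def project_orthographic(pose_3d, scale=1.0):
    """Assumed camera model: drop the depth coordinate and apply a scale."""
    return [(scale * x, scale * y) for x, y, _z in pose_3d]


def reprojection_loss(pose_3d, pose_2d, scale=1.0):
    """Mean squared distance between re-projected joints and input 2D joints."""
    projected = project_orthographic(pose_3d, scale)
    total = sum((px - x) ** 2 + (py - y) ** 2
                for (px, py), (x, y) in zip(projected, pose_2d))
    return total / len(pose_2d)


def bone_length(pose_3d, bone):
    """Euclidean length of one bone in 3D."""
    parent, child = bone
    return math.dist(pose_3d[parent], pose_3d[child])


def symmetry_loss(pose_3d):
    """Mean squared length difference between paired left/right bones."""
    total = sum((bone_length(pose_3d, left) - bone_length(pose_3d, right)) ** 2
                for left, right in SYMMETRIC_BONES)
    return total / len(SYMMETRIC_BONES)


def weak_supervision_loss(pose_3d, pose_2d, sym_weight=0.1):
    """Loss for a sample with 2D ground truth only (sym_weight is assumed)."""
    return reprojection_loss(pose_3d, pose_2d) + sym_weight * symmetry_loss(pose_3d)


# One 2D pose and two candidate 3D poses that differ only in one joint's depth.
pose_2d = [(0.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (-0.5, -2.0), (0.5, -2.0)]
symmetric_pose = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.5), (1.0, 0.0, 0.5),
                  (-0.5, -2.0, 0.0), (0.5, -2.0, 0.0)]
skewed_pose = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.5), (1.0, 0.0, 3.0),
               (-0.5, -2.0, 0.0), (0.5, -2.0, 0.0)]

for name, pose in [("symmetric", symmetric_pose), ("skewed", skewed_pose)]:
    print(name, reprojection_loss(pose, pose_2d), symmetry_loss(pose))
```

Both candidate poses reproject onto the same 2D input with zero error, yet only the first has equal left and right arm lengths. This is the depth ambiguity that the symmetry term is meant to reduce when no 3D ground truth is available.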

Lifting 2d Human Pose to 3d: A Weakly Supervised Approach

Weakly Supervised 2D Human Pose Transfer

Global Adaptation Meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation.

Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild

Unsupervised Domain Adaptation for 3D Human Pose Estimation

Geometry-Driven Self-Supervised Method for 3D Human Pose Estimation

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach

Deductive Learning for Weakly-Supervised 3D Human Pose Estimation Via Uncalibrated Cameras.

Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation

Weakly Supervised Adversarial Learning for 3D Human Pose Estimation from Point Clouds

Weakly-supervised Transfer for 3D Human Pose Estimation in the Wild

Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints

Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows

Heuristic Weakly Supervised 3D Human Pose Estimation

Weakly-supervised 3D Human Pose Estimation with Cross-view U-shaped Graph Convolutional Network

Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge

Self-supervised 3D Human Pose Estimation from a Single Image

Robust Estimation of 3D Human Poses from a Single Image

3D Human Pose Machines with Self-supervised Learning

Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations