Abstract: Training an accurate 3D human pose estimator often requires a large amount of 3D ground-truth data, which is inefficient and costly to collect. Previous methods have either resorted to weak supervision to reduce the demand for ground-truth training data, or used synthetically generated but photo-realistic samples to enlarge the training data pool. However, the former typically require additional supervision, such as unpaired 3D ground-truth data or the camera parameters in multi-view settings. The latter require accurately textured models, illumination configurations, and backgrounds, all of which need careful engineering. To address these problems, we propose a domain adaptation framework with unsupervised knowledge transfer, which leverages the knowledge in the multi-modality data of easy-to-get synthetic depth datasets to better train a pose estimator on real-world datasets. Specifically, the framework first trains two pose estimators on synthetically generated depth images and human body segmentation masks with full supervision, while jointly learning a human body segmentation module from the predicted 2D poses. Subsequently, the learned pose estimator and the segmentation module are applied to the real-world dataset to learn, without supervision, a new RGB-image-based 2D/3D human pose estimator. Here, the knowledge encoded in the supervised learning modules is used to regularize a pose estimator without ground-truth annotations. Comprehensive experiments demonstrate significant improvements over weakly supervised methods when no ground-truth annotations are available. Further experiments with ground-truth annotations show that the proposed framework can outperform state-of-the-art fully supervised methods. In addition, we conducted ablation studies to examine the impact of each loss term, as well as of different amounts of supervision signal.
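
The general idea of regularizing a student estimator with the outputs of frozen, separately trained modules can be illustrated with a small sketch. The abstract does not give the actual loss terms, so everything below is an assumption made for illustration: the function names, the choice of a squared-error pose term, the pixel-disagreement mask term, and the `mask_weight` parameter are all hypothetical and are not taken from the paper.

```python
from typing import List

Pose = List[List[float]]   # one [x, y] pair per joint (hypothetical layout)
Mask = List[List[int]]     # binary body mask, rows of 0/1 pixels


def pose_consistency(student: Pose, teacher: Pose) -> float:
    """Mean squared distance between corresponding 2D joints."""
    if len(student) != len(teacher):
        raise ValueError("poses must have the same number of joints")
    total = 0.0
    for (sx, sy), (tx, ty) in zip(student, teacher):
        total += (sx - tx) ** 2 + (sy - ty) ** 2
    return total / len(student)


def mask_disagreement(student: Mask, teacher: Mask) -> float:
    """Fraction of pixels on which two binary masks differ."""
    pixels = 0
    differing = 0
    for s_row, t_row in zip(student, teacher):
        for s, t in zip(s_row, t_row):
            pixels += 1
            differing += int(s != t)
    return differing / pixels


def transfer_loss(student_pose: Pose, teacher_pose: Pose,
                  student_mask: Mask, teacher_mask: Mask,
                  mask_weight: float = 0.5) -> float:
    """Assumed regularizer: no ground-truth annotation appears anywhere.

    The teacher outputs stand in for the predictions of the modules
    trained on synthetic data; only the student would be updated.
    """
    return (pose_consistency(student_pose, teacher_pose)
            + mask_weight * mask_disagreement(student_mask, teacher_mask))


# Toy example with two joints and a 2x2 mask.
loss = transfer_loss(
    student_pose=[[0.0, 0.0], [1.0, 1.0]],
    teacher_pose=[[0.0, 0.0], [1.0, 3.0]],
    student_mask=[[1, 0], [1, 1]],
    teacher_mask=[[1, 1], [1, 1]],
)
print(loss)  # 2.125
```

In a real training loop the student's predictions would come from a differentiable network and the two terms would be computed on tensors; the plain-Python version here only shows that the supervision signal comes from other modules' predictions rather than from labels.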

Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation

Global Adaptation Meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation.

Source-free Domain Adaptive Human Pose Estimation

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution.

Context-Guided Adaptive Network for Efficient Human Pose Estimation.

Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations.

Unsupervised Domain Adaptation for 3D Human Pose Estimation

APP: Adaptive Pose Pooling for 3D Human Pose Estimation from Videos

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

Alleviating Human-level Shift : A Robust Domain Adaptation Method for Multi-person Pose Estimation

Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery

Evaluating 3D Human Pose Estimation in Occluded Multi-Sensor Scenarios: Dataset and Annotation Approach

Hierarchical Keypoints Feature Alignment for Domain Adaptive Pose Estimation

Towards Locality Similarity Preserving to 3D Human Pose Estimation.

Overcoming Data Deficiency for Multi-Person Pose Estimation

Enhanced 3D Pose Estimation in Multi-Person, Multi-View Scenarios through Unsupervised Domain Adaptation with Dropout Discriminator

View Consistency Aware Holistic Triangulation for 3D Human Pose Estimation

Adaptive Multi-Path Aggregation for Human DensePose Estimation in the Wild

SD-Pose: facilitating space-decoupled human pose estimation via adaptive pose perception guidance

EANet: Towards Lightweight Human Pose Estimation With Effective Aggregation Network

Multi Hybrid Extractor Network for 3D Human Pose Estimation