Abstract: Training an accurate 3D human pose estimator often requires a large amount of 3D ground-truth data, which is inefficient and costly to collect. Previous methods have either resorted to weak supervision to reduce the demand for ground-truth training data, or used synthetically generated but photo-realistic samples to enlarge the training data pool. However, the former typically require additional supervision, such as unpaired 3D ground-truth data or camera parameters in multi-view settings, while the latter require accurately textured models, illumination configurations, and backgrounds, all of which need careful engineering. To address these problems, we propose a domain adaptation framework with unsupervised knowledge transfer, which leverages the knowledge in multi-modality data from easy-to-obtain synthetic depth datasets to better train a pose estimator on real-world datasets. Specifically, the framework first trains two pose estimators on synthetically generated depth images and human body segmentation masks with full supervision, while jointly learning a human body segmentation module from the predicted 2D poses. Subsequently, the learned pose estimator and the segmentation module are applied to the real-world dataset to learn, without supervision, a new RGB-image-based 2D/3D human pose estimator. Here, the knowledge encoded in the supervised modules is used to regularize a pose estimator trained without ground-truth annotations. Comprehensive experiments demonstrate significant improvements over weakly supervised methods when no ground-truth annotations are available. Further experiments with ground-truth annotations show that the proposed framework can outperform state-of-the-art fully supervised methods. In addition, we conduct ablation studies to examine the impact of each loss term and of different amounts of supervision signal.

Learning to Estimate Object Poses Without Real Image Annotations

Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation

Unseen Object Pose Estimation via Registration

Unsupervised Domain Adaptation for 3D Human Pose Estimation

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

Robust RGB-based 6-DoF Pose Estimation without Real Pose Annotations

Zero-Shot 3d Pose Estimation of Unseen Object by Two-Step Rgb-D Fusion

Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild

Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image

Domain Transfer for 3D Pose Estimation from Color Images Without Manual Annotations

Deep Learning-Based 6-DoF Object Pose Estimation Considering Synthetic Dataset

Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

Indoor GeoNet: Weakly Supervised Hybrid Learning for Depth and Pose Estimation

PVNet: Pixel-Wise Voting Network for 6dof Object Pose Estimation

CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning

Learning to Estimate 3D Human Pose and Shape from a Single Color Image

An RGB-D Based Approach for Human Pose Estimation

NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-supervised 6D Object Pose Estimation