Abstract: Training an accurate 3D human pose estimator often requires a large amount of 3D ground-truth data, which is inefficient and costly to collect. Previous methods have either resorted to weak supervision to reduce the demand for ground-truth training data, or used synthetically generated but photo-realistic samples to enlarge the training data pool. Nevertheless, the former mainly require additional supervision, such as unpaired 3D ground-truth data or camera parameters in multi-view settings. The latter, on the other hand, require accurately textured models, illumination configurations, and backgrounds, all of which need careful engineering. To address these problems, we propose a domain adaptation framework with unsupervised knowledge transfer, which leverages the knowledge in the multi-modality data of easy-to-obtain synthetic depth datasets to better train a pose estimator on real-world datasets. Specifically, the framework first trains two pose estimators on synthetically generated depth images and human body segmentation masks with full supervision, while jointly learning a human body segmentation module from the predicted 2D poses. Subsequently, the learned pose estimator and the segmentation module are applied to the real-world dataset to learn, without supervision, a new RGB-image-based 2D/3D human pose estimator. Here, the knowledge encoded in the supervised learning modules is used to regularize a pose estimator trained without ground-truth annotations. Comprehensive experiments demonstrate significant improvements over weakly supervised methods when no ground-truth annotations are available. Further experiments with ground-truth annotations show that the proposed framework can outperform state-of-the-art fully supervised methods. In addition, we conduct ablation studies to examine the impact of each loss term, as well as of different amounts of supervision signal.
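
The regularization idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual loss terms, so everything here is an assumption made for illustration: the function names, the choice of a mean per-joint distance and a binary cross-entropy, and the weights `w_pose` and `w_mask` are hypothetical. The sketch assumes that a frozen estimator trained on synthetic data supplies a reference pose, and that a segmentation module supplies a reference mask, for each unlabeled real-world sample.

```python
import numpy as np

def pose_consistency_loss(student_pose, teacher_pose):
    """Mean per-joint Euclidean distance between two (J, D) pose arrays."""
    student_pose = np.asarray(student_pose, dtype=float)
    teacher_pose = np.asarray(teacher_pose, dtype=float)
    return float(np.linalg.norm(student_pose - teacher_pose, axis=-1).mean())

def mask_consistency_loss(pred_mask, ref_mask, eps=1e-7):
    """Binary cross-entropy between mask probabilities and a reference mask."""
    p = np.clip(np.asarray(pred_mask, dtype=float), eps, 1.0 - eps)
    r = np.asarray(ref_mask, dtype=float)
    return float(-(r * np.log(p) + (1.0 - r) * np.log(1.0 - p)).mean())

def unsupervised_loss(student_pose, teacher_pose, pred_mask, ref_mask,
                      w_pose=1.0, w_mask=0.1):
    """Weighted sum of the two consistency terms (weights are hypothetical)."""
    return (w_pose * pose_consistency_loss(student_pose, teacher_pose)
            + w_mask * mask_consistency_loss(pred_mask, ref_mask))

# Toy usage: one joint displaced by a 3-4-5 triangle, and a 2x2 mask.
student = [[0.0, 0.0, 0.0]]
teacher = [[3.0, 4.0, 0.0]]
pred = [[0.9, 0.1], [0.8, 0.2]]
ref = [[1.0, 0.0], [1.0, 0.0]]
total = unsupervised_loss(student, teacher, pred, ref)
```

No ground-truth annotation enters either term: the RGB-based estimator is pulled toward the outputs of modules that were trained with full supervision on synthetic data, which is the sense in which their knowledge acts as a regularizer.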

Pose Estimation of Multiple Domains Based on the Fusion of Multiple Deep Learning Models and Baidu API

Global Adaptation Meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

Domain adaptive pose estimation via multi-level alignment

Zero-Shot 3D Pose Estimation of Unseen Object by Two-Step RGB-D Fusion

PA-Pose: Partial Point Cloud Fusion Based on Reliable Alignment for 6D Pose Tracking

FEIF: Feature Excitation and Interactive Fusion for 6D Object Pose Estimation

Unsupervised Domain Adaptation for 3D Human Pose Estimation

Modelling Human Body Pose for Action Recognition Using Deep Neural Networks

A Pose Estimation Algorithm for Multimodal Data Fusion

Mask-based Object Pose Estimation with Domain Transfer

Deep Learning-Based 6-DoF Object Pose Estimation Considering Synthetic Dataset

An Improved Estimation Algorithm of Space Targets Pose Based on Multi-Modal Feature Fusion

Pose-robust personalized facial expression recognition through unsupervised multi-source domain adaptation

LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation

Reconstructing 3D human pose and shape from a single image and sparse IMUs

Improving synthetic 3D model-aided indoor image localization via domain adaptation

FusePose: IMU-Vision Sensor Fusion in Kinematic Space for Parametric Human Pose Estimation

DUA: A Domain-Unified Approach for Cross-Dataset 3D Human Pose Estimation

Robust Classification and 6D Pose Estimation by Sensor Dual Fusion of Image and Point Cloud Data

PoseFace: Pose-Invariant Features and Pose-Adaptive Loss for Face Recognition

Domain Adaptation on Point Clouds for 6D Pose Estimation in Bin-Picking Scenarios