Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras

Yu-Jhe Li, Yan Xu, Rawal Khirodkar, Jinhyung Park, Kris Kitani

2024-01-28

Abstract: We tackle the task of multi-view, multi-person 3D human pose estimation from a limited number of uncalibrated depth cameras. Recently, many approaches have been proposed for 3D human pose estimation from multi-view RGB cameras. However, these works (1) assume the number of RGB camera views is large enough for 3D reconstruction, (2) assume the cameras are calibrated, and (3) rely on ground-truth 3D poses to train their regression models. In this work, we propose to leverage sparse, uncalibrated depth cameras providing RGBD video streams for 3D human pose estimation. We present a simple pipeline for Multi-View Depth Human Pose Estimation (MVD-HPE) that jointly predicts the camera poses and 3D human poses without training a deep 3D human pose regression model. This framework utilizes 3D Re-ID appearance features from RGBD images to formulate more accurate correspondences (for deriving camera positions) than RGB-only features provide. We further propose (1) depth-guided camera-pose estimation, which leverages 3D rigid transformations as guidance, and (2) depth-constrained 3D human pose estimation, which utilizes depth-projected 3D points as an alternative objective for optimization. To evaluate our proposed pipeline, we collect three sets of RGBD videos recorded from multiple sparse-view depth cameras and manually annotate the ground-truth 3D poses. Experiments show that our proposed method outperforms the current 3D human pose regression-free pipelines in both camera pose estimation and 3D human pose estimation.

Computer Vision and Pattern Recognition

### What problem does this paper attempt to solve?

This paper addresses multi-view, multi-person 3D human pose estimation from a small number of uncalibrated depth cameras. Existing multi-view RGB methods have the following limitations:

1. They assume enough RGB camera views for 3D reconstruction.
2. They assume the cameras are already calibrated.
3. They require ground-truth 3D poses to train a regression model.

To address these limitations, the authors propose Multi-View Depth Human Pose Estimation (MVD-HPE), a pipeline that does not require training a deep 3D human pose regression model. Using 3D Re-ID appearance features from RGBD images, MVD-HPE establishes cross-view correspondences more accurately than RGB-only features allow, and it jointly predicts camera poses and 3D human poses.

### Main Contributions

1. **A simple regression-free method**: MVD-HPE estimates 3D human poses from a small number of uncalibrated depth cameras.
2. **A depth-guided minimization objective**: Uses 3D rigid transformations as guidance for more accurate camera pose estimation.
3. **A depth-constrained triangulation algorithm**: Reconstructs human poses using constraints from depth-projected 3D points.
4. **Experimental validation**: On the collected RGBD datasets, MVD-HPE outperforms current regression-free pipelines in both camera pose estimation and 3D human pose estimation.
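The role of depth in the summary above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`back_project`, `apply_rigid`, `alignment_error`), the pinhole intrinsics, and the sample joints are all hypothetical. It lifts 2D joints with depth into 3D camera-frame points, then scores a candidate rigid transform between two cameras by how well it maps one camera's points onto the other's, which is the kind of residual a depth-guided objective could minimize.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); lens distortion is ignored.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def apply_rigid(R: Sequence[Sequence[float]], t: Sequence[float],
                p: Sequence[float]) -> Point3:
    """Return R @ p + t for a 3x3 rotation R and a translation t."""
    return tuple(
        sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
    )


def alignment_error(R: Sequence[Sequence[float]], t: Sequence[float],
                    src: List[Point3], dst: List[Point3]) -> float:
    """Mean Euclidean distance between transformed src points and dst points."""
    if len(src) != len(dst) or not src:
        raise ValueError("src and dst must be non-empty and the same length")
    total = sum(math.dist(apply_rigid(R, t, p), q) for p, q in zip(src, dst))
    return total / len(src)


if __name__ == "__main__":
    # Hypothetical intrinsics and two detected joints (pixel coords + depth).
    fx = fy = 500.0
    cx, cy = 320.0, 240.0
    joints_a = [back_project(320.0, 240.0, 2.0, fx, fy, cx, cy),
                back_project(420.0, 240.0, 2.0, fx, fy, cx, cy)]

    # Pretend camera B sees the same joints shifted 1 m along x.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    joints_b = [apply_rigid(identity, (1.0, 0.0, 0.0), p) for p in joints_a]

    # The correct candidate transform scores lower than a wrong one.
    print(alignment_error(identity, (1.0, 0.0, 0.0), joints_a, joints_b))
    print(alignment_error(identity, (0.0, 0.0, 0.0), joints_a, joints_b))
```

In this toy setup the correct translation gives zero error, while the identity transform leaves an error of about 1 m. A real pipeline would search over rotations as well and would have to cope with noisy depth and mismatched people across views.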

