Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving

Peter Bauer, Arij Bouazizi, Ulrich Kressel, Fabian B. Flohr

2023-07-27

Abstract: Accurate 3D human pose estimation (3D HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios. Promising 3D HPE results have been achieved in several domains such as human-computer interaction, robotics, sports and medical analytics, often based on data collected in well-controlled laboratory environments. Nevertheless, the transfer of 3D HPE methods to AVs has received limited research attention, due to the challenges posed by obtaining accurate 3D pose annotations and the limited suitability of data from other domains. We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data. The weakly supervised setting enables training on the target datasets without any 2D/3D keypoint labels by using an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR-to-image projections. Our approach outperforms state-of-the-art results by up to $\sim$13% on the Waymo Open Dataset in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses accurate 3D human pose estimation (3D HPE) for autonomous vehicles (AVs) in complex urban environments. Specifically, its goals are:

1. **Fusing Multimodal Data**: Proposing an effective method to fuse camera data (2D keypoints extracted from RGB images) with LiDAR point clouds to estimate the 3D poses of pedestrians and cyclists.
2. **Weak Supervision Learning Method**: Developing a weakly supervised learning framework that trains models on the target dataset without any 2D or 3D keypoint labels. It relies instead on an off-the-shelf 2D joint extractor and on pseudo labels generated from accurate LiDAR-to-image projections.
3. **Performance Improvement**: Demonstrating through experiments on the Waymo Open Dataset that the proposed method outperforms existing methods by up to approximately 13% in the weakly supervised setting and achieves state-of-the-art performance in the supervised setting.

The paper points out that obtaining accurate 3D pose annotations in uncontrolled outdoor scenes is time-consuming and costly. By removing the need for such annotations on the target dataset, the proposed method makes 3D HPE more practical for AVs, which in turn supports safer and more reliable autonomous driving.
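The idea of generating pseudo labels from LiDAR-to-image projections can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple pinhole camera model, LiDAR points already transformed into the camera frame (z pointing forward), and a nearest-neighbor association within a pixel radius. The function names `project_to_image` and `pseudo_labels`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the threshold `max_pixel_dist` are all hypothetical and chosen for illustration only.

```python
import math
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_to_image(point: Point3D, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Pixel]:
    """Project a 3D point given in the camera frame using a pinhole model.

    Assumption: z is the forward (depth) axis; points with z <= 0 lie
    behind the camera and are reported as not visible (None).
    """
    x, y, z = point
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def pseudo_labels(keypoints_2d: List[Pixel], lidar_points: List[Point3D],
                  fx: float, fy: float, cx: float, cy: float,
                  max_pixel_dist: float = 5.0) -> List[Optional[Point3D]]:
    """Assign each 2D keypoint the 3D LiDAR point whose projection is closest.

    Returns None for a keypoint when no projected LiDAR point falls within
    max_pixel_dist pixels (an illustrative threshold, not a value from the paper).
    """
    # Project every visible LiDAR point once and remember its 3D source.
    projected = []
    for p in lidar_points:
        uv = project_to_image(p, fx, fy, cx, cy)
        if uv is not None:
            projected.append((uv, p))

    labels: List[Optional[Point3D]] = []
    for (u, v) in keypoints_2d:
        best, best_dist = None, max_pixel_dist
        for (pu, pv), p in projected:
            d = math.hypot(pu - u, pv - v)
            if d <= best_dist:
                best, best_dist = p, d
        labels.append(best)
    return labels
```

In this sketch, a keypoint without a nearby LiDAR return simply receives no pseudo label, reflecting the fact that LiDAR point clouds are sparse and may not cover every joint.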

Efficient Multi-person Hierarchical 3D Pose Estimation for Autonomous Driving

Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

Unsupervised Universal Hierarchical Multi-Person 3D Pose Estimation for Natural Scenes

Weakly Supervised 3D Multi-Person Pose Estimation for Large-Scale Scenes Based on Monocular Camera and Single LiDAR

Unsupervised Domain Adaptation for 3D Human Pose Estimation

Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild

Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation

Geometry-Driven Self-Supervised Method for 3D Human Pose Estimation

3D Vehicle Detection Using Cheap LiDAR and Camera Sensors

3D Human Pose Estimation Based on Wearable IMUs and Multiple Camera Views

Heuristic Weakly Supervised 3D Human Pose Estimation

Multi-person 3D pose estimation from unlabelled data

Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras

Role of NO in pyloric, antral, and duodenal motility and its interaction with other inhibitory mediators

3D Human Pose Estimation from Deep Multi-View 2D Pose

Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

Epileptogenesis beyond the Hippocampus

Self-supervised 3D Human Pose Estimation from a Single Image

Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows