Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving

Peter Bauer, Arij Bouazizi, Ulrich Kressel, Fabian B. Flohr
2023-07-27
Abstract: Accurate 3D human pose estimation (3D HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios. Promising results of 3D HPE have been gained in several domains such as human-computer interaction, robotics, sports and medical analytics, often based on data collected in well-controlled laboratory environments. Nevertheless, the transfer of 3D HPE methods to AVs has received limited research attention, due to the challenges posed by obtaining accurate 3D pose annotations and the limited suitability of data from other domains. We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data. The weakly supervised setting enables training on the target datasets without any 2D/3D keypoint labels by using an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR to image projections. Our approach outperforms state-of-the-art results by up to $\sim$13% on the Waymo Open Dataset in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses accurate 3D human pose estimation (3D HPE) for autonomous vehicles (AVs) in complex urban environments. Specifically, its goals are:

1. **Fusing multimodal data**: Proposing a high-level sensor fusion of RGB image data (in the form of 2D keypoints) and LiDAR point clouds to estimate the 3D pose of pedestrians or cyclists.
2. **Weakly supervised learning**: Developing a weakly supervised framework that trains on the target dataset without any 2D or 3D keypoint labels. Supervision instead comes from an off-the-shelf 2D joint extractor and from pseudo labels generated by projecting LiDAR points into the image.
3. **Performance improvement**: Showing on the Waymo Open Dataset that the method outperforms prior state-of-the-art results by up to approximately 13% in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.

The paper points out that accurate 3D pose annotations are hard to obtain in uncontrolled outdoor scenes, and that data from other domains, often collected in well-controlled laboratory environments, transfers poorly to driving. The weakly supervised approach avoids the need for such annotations, which supports the broader aim of letting AVs make informed decisions and respond proactively in critical road scenarios.
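The pseudo-labeling idea can be illustrated with a minimal sketch. The summary does not say how the paper turns LiDAR-to-image projections into labels, so the nearest-projected-point rule below is an assumption made for illustration only. The function names `project_points` and `pseudo_label_keypoints`, the pinhole intrinsics matrix `K`, and the `max_pixel_dist` threshold are all hypothetical and are not taken from the paper.

```python
import numpy as np


def project_points(points_cam, K):
    """Project N x 3 camera-frame points to pixel coordinates (pinhole model).

    Returns (uv, valid), where valid marks points in front of the camera.
    """
    z = points_cam[:, 2]
    valid = z > 1e-6
    uvw = points_cam @ K.T                 # each row is K @ p
    safe_z = np.where(valid, z, 1.0)       # avoid dividing by zero or negative depth
    uv = uvw[:, :2] / safe_z[:, None]
    return uv, valid


def pseudo_label_keypoints(keypoints_2d, points_cam, K, max_pixel_dist=10.0):
    """Assign each 2D keypoint the 3D LiDAR point whose projection is nearest.

    Assumption for illustration: a keypoint gets a pseudo label only when the
    nearest projected point lies within max_pixel_dist pixels.
    Returns (labels, mask): J x 3 labels (NaN where unlabeled) and a J-vector
    of booleans marking which keypoints received a label.
    """
    uv, valid = project_points(points_cam, K)
    uv, pts = uv[valid], points_cam[valid]

    num_joints = len(keypoints_2d)
    labels = np.full((num_joints, 3), np.nan)
    mask = np.zeros(num_joints, dtype=bool)
    if len(pts) == 0:
        return labels, mask

    # Pairwise pixel distances between keypoints (J) and projected points (N).
    dists = np.linalg.norm(keypoints_2d[:, None, :] - uv[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    nearest_dist = dists[np.arange(num_joints), nearest]

    mask = nearest_dist <= max_pixel_dist
    labels[mask] = pts[nearest[mask]]
    return labels, mask


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 50.0],
                  [0.0, 0.0, 1.0]])
    lidar = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 4.0], [0.0, 0.0, -1.0]])
    joints_2d = np.array([[51.0, 50.0], [76.0, 49.0], [0.0, 0.0]])
    labels, mask = pseudo_label_keypoints(joints_2d, lidar, K)
    print(labels)
    print(mask)
```

Keypoints with no nearby LiDAR return are left unlabeled through the mask, so a training loss could skip them. A real pipeline would also need the LiDAR-to-camera extrinsic transform and some handling of occlusion, both of which this sketch omits.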