FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions

Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang
2024-04-03
Abstract: Estimating the 3D structure of the human body from natural scenes is a fundamental aspect of visual perception. 3D human pose estimation is a vital step in advancing fields like AIGC and human-robot interaction, serving as a crucial technique for understanding and interacting with human actions in real-world settings. However, current datasets, often collected under single laboratory conditions using complex motion capture equipment and unvarying backgrounds, are insufficient. The absence of datasets covering variable conditions is stalling the progress of this crucial task. To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under real-world conditions. FreeMan was captured by synchronizing 8 smartphones across diverse scenarios. It comprises 11M frames from 8000 sequences, viewed from different perspectives. These sequences cover 40 subjects across 10 different scenarios, each with varying lighting conditions. We have also established a semi-automated pipeline containing error detection to reduce the workload of manual checking and ensure precise annotation. We provide comprehensive evaluation baselines for a range of tasks, underlining the significant challenges posed by FreeMan. Further evaluations on standard indoor/outdoor human sensing datasets reveal that FreeMan offers robust representation transferability in real and complex scenes. Code and data are available at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily addresses the application challenges of 3D Human Pose Estimation (3D HPE) in real-world scenarios by proposing a new large-scale multi-view dataset called FreeMan. The research aims to solve the following key issues:

1. **Improving the generalization ability of models under real-world conditions**: Existing datasets are usually collected under laboratory conditions using complex motion capture equipment and have uniform backgrounds, leading to poor performance of trained models in real-world environments.
2. **Increasing scene diversity**: Most existing 3D HPE datasets are collected in controlled environments, resulting in limited variations in lighting conditions and backgrounds, which is a limitation for models that need to handle complex scenes.
3. **Expanding the range of actions and human scales**: The range of human actions in existing datasets is limited, and due to the use of fixed cameras, the size of humans in different videos is relatively fixed, lacking diversity.
4. **Enhancing the scalability of datasets**: Current datasets rely heavily on expensive manual processing for annotations, limiting the expansion of dataset scale. Especially with variable camera positions, how to effectively align and annotate data from different cameras remains an unresolved issue.

To address the above problems, the researchers proposed the FreeMan dataset, a large-scale multi-view 3D HPE dataset collected under real-world conditions. This dataset includes 11 million frames of images synchronously captured from 8 smartphone cameras, covering performances by 40 participants in 10 different types of scenes. The features of the FreeMan dataset include:

- Diverse backgrounds and lighting conditions, enhancing the model's generalization ability in real-world scenarios.
- Significant variations in the distance between humans and cameras, leading to changes in human sizes, increasing the dataset's diversity.
- A semi-automated annotation pipeline combined with an error detection mechanism, reducing manual workload and improving the scalability and annotation accuracy of the dataset.
- The dataset is suitable for various tasks, including monocular 3D pose estimation, 2D to 3D lifting, multi-view 3D pose estimation, and human neural rendering.

Experimental results show that models trained using the FreeMan dataset significantly outperform those trained with other existing datasets on the 3DPW test set, demonstrating the effectiveness of the FreeMan dataset in improving the real-world generalization ability of models.
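
The summary above does not say how FreeMan's error detection mechanism works. As an illustration only, one common way to sanity-check multi-view pose annotations is to reproject each 3D joint into every camera and flag views where it lands far from the 2D keypoint. The sketch below assumes calibrated 3x4 projection matrices; the names `project` and `flag_views` and the 10-pixel threshold are hypothetical and are not taken from the paper or its code.

```python
from math import hypot


def project(P, X):
    """Project a 3D point X = (x, y, z) to pixel coordinates using a
    3x4 camera projection matrix P (given as nested lists)."""
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * xh[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)


def flag_views(cameras, joint_3d, keypoints_2d, threshold_px=10.0):
    """Return the indices of views whose reprojection error (in pixels)
    exceeds threshold_px. The threshold is an assumed, illustrative value."""
    flagged = []
    for i, (P, kp) in enumerate(zip(cameras, keypoints_2d)):
        u, v = project(P, joint_3d)
        if hypot(u - kp[0], v - kp[1]) > threshold_px:
            flagged.append(i)
    return flagged


# Two toy cameras sharing intrinsics (focal length 1000 px, principal
# point at (500, 500)); the second is shifted along the x axis.
cam_a = [[1000.0, 0.0, 500.0, 0.0],
         [0.0, 1000.0, 500.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
cam_b = [[1000.0, 0.0, 500.0, -500.0],
         [0.0, 1000.0, 500.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]

joint = (0.5, 1.0, 2.0)                        # one annotated 3D joint
detections = [(751.0, 1000.0), (530.0, 1000.0)]  # 2D keypoints per view
suspect_views = flag_views([cam_a, cam_b], joint, detections)
```

In a pipeline of this kind, flagged views would be sent for manual review rather than corrected automatically, which is one way a semi-automated process can reduce checking workload.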