Hi5: 2D Hand Pose Estimation with Zero Human Annotation

Masum Hasan, Cengiz Ozel, Nina Long, Alexander Martin, Samuel Potter, Tariq Adnan, Sangwu Lee, Amir Zadeh, Ehsan Hoque
2024-06-06
Abstract: We propose a new large synthetic hand pose estimation dataset, Hi5, and a novel inexpensive method for collecting high-quality synthetic data that requires no human annotation or validation. Leveraging recent advancements in computer graphics, high-fidelity 3D hand models with diverse genders and skin colors, and dynamic environments and camera movements, our data synthesis pipeline allows precise control over data diversity and representation, ensuring robust and fair model training. We generate a dataset with 583,000 images with accurate pose annotation using a single consumer PC that closely represents real-world variability. Pose estimation models trained with Hi5 perform competitively on real-hand benchmarks while surpassing models trained with real data when tested on occlusions and perturbations. Our experiments show promising results for synthetic data as a viable solution for data representation problems in real datasets. Overall, this paper provides a promising new approach to synthetic data creation and annotation that can reduce costs and increase the diversity and quality of data for hand pose estimation.
Computer Vision and Pattern Recognition, Graphics, Machine Learning
What problem does this paper attempt to address?
This paper attempts to solve several key problems in the field of hand pose estimation:

1. **Lack of diversity in datasets**: Existing hand pose estimation datasets are either collected in specific laboratory environments or gathered from the Internet under uncontrolled conditions. These datasets lack real-world diversity and cannot fully represent different skin colors, genders, and environmental conditions.
2. **High cost and error-proneness of manual annotation**: Manually annotating hand pose datasets is labor-intensive, time-consuming, and error-prone. Ensuring the diversity and representativeness of datasets is particularly challenging, and shortfalls here may bias the trained models.
3. **Limited dataset scale**: Other computer vision tasks such as human pose estimation have benefited from large-scale datasets (e.g., COCO), but the largest hand pose estimation datasets are much smaller, which limits model performance.

To solve these problems, the paper proposes a new method for generating a diverse and representative synthetic hand pose estimation dataset, Hi5, which is produced entirely on consumer-level hardware and requires no manual annotation. Specifically, the method uses high-fidelity 3D hand models, realistic animations, and dynamic environmental and lighting conditions to create a comprehensive dataset that reflects real-world variation.

### Main contributions

1. **Novel data synthesis pipeline**: Provides precise control over data diversity and representation, supporting robust and fair model training.
2. **Hi5 dataset**: A synthetic hand pose estimation dataset containing 583,000 images with accurate pose annotation and high diversity and representativeness.
3. **Empirical verification**: Experiments show that models trained on Hi5 perform competitively on real-hand benchmarks and surpass models trained on real data when tested on occlusions and perturbations, indicating the potential of synthetic data to address the limitations of traditional data collection.

In conclusion, this paper proposes a method that uses synthetic data to overcome the limited scale and lack of diversity of hand pose estimation datasets, providing a new approach for future research.
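To illustrate why a synthetic pipeline can produce pose labels without human annotation, here is a minimal sketch, assuming a simple pinhole camera model: when the 3D joint positions of a rendered hand and the camera intrinsics are known, the 2D keypoints follow directly from projection. The function name `project_joints`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the joint coordinates are hypothetical values chosen for illustration; the summary does not describe how Hi5 actually computes its annotations.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]  # joint position in camera coordinates (meters)
Point2D = Tuple[float, float]         # keypoint position in image coordinates (pixels)


def project_joints(
    joints_cam: List[Point3D], fx: float, fy: float, cx: float, cy: float
) -> List[Point2D]:
    """Project 3D joints to 2D keypoints with a pinhole camera model.

    Assumption: joints are already expressed in the camera frame, with the
    z axis pointing away from the camera.
    """
    keypoints = []
    for x, y, z in joints_cam:
        if z <= 0:
            raise ValueError("joint must be in front of the camera (z > 0)")
        # Perspective divide, then scale by focal length and shift to the
        # principal point.
        keypoints.append((fx * x / z + cx, fy * y / z + cy))
    return keypoints


# Hypothetical example: two joints 0.5 m in front of a 640x480 camera.
joints = [(0.0, 0.0, 0.5), (0.05, -0.02, 0.5)]
keypoints = project_joints(joints, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(keypoints)
```

Because the labels come from the same scene description that produced the image, they carry no manual labeling error, which is the property the summary attributes to synthetic data.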