Abstract: Deep learning has been impressively successful in the last decade in predicting human head poses from monocular images. However, for in-the-wild inputs the research community relies predominantly on a single training set, 300W-LP, of semisynthetic nature without many alternatives. This paper focuses on gradual extension and improvement of the data to explore the performance achievable with augmentation and synthesis strategies further. Modeling-wise, a novel multitask head/loss design which includes uncertainty estimation is proposed. Overall, the thus obtained models are small, efficient, suitable for full 6 DoF pose estimation, and exhibit very competitive accuracy.
### What problem does this paper attempt to solve?
This paper addresses head pose estimation (HPE) from monocular images, especially in complex and variable "in-the-wild" scenarios. Specifically, it focuses on the following aspects:
1. **Limitations of the dataset**:
- The research community currently relies mainly on a single semi-synthetic dataset, 300W-LP, for training, which limits diversity and leaves few alternatives.
- The paper attempts to explore the impact of data augmentation and synthesis strategies on model performance by expanding and improving existing datasets.
2. **Model design and optimization**:
- A new multi-task head/loss design, including uncertainty estimation, is proposed to improve the accuracy and robustness of the model.
- The resulting models are small, efficient, and suitable for full 6-degrees-of-freedom (6 DoF) pose estimation.
3. **Data augmentation and synthesis**:
- Multiple data augmentation methods, such as geometric transformations and brightness adjustment, are explored to increase the diversity of the data.
- Extended datasets (such as WFLW, LaPa) and fully synthetic datasets (such as Face Synthetics) are utilized to further enhance the generalization ability of the model.
4. **Uncertainty estimation**:
- A tangent space representation method of rotational uncertainty is introduced to better handle the uncertainty in pose estimation.
- Uncertainty parameters are learned through the negative log-likelihood (NLL) loss function, thereby improving the accuracy of the model.
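The augmentations named in item 3 (geometric transformation and brightness adjustment) can be illustrated with a minimal NumPy sketch. It is not the paper's pipeline: the function names, the brightness range of 0.7 to 1.3, the 50% flip probability, and the Euler-angle convention under which a horizontal flip negates yaw and roll are all assumptions made for illustration. The point it shows is that a geometric augmentation must update the pose label along with the image, while a photometric one leaves the label unchanged.

```python
import numpy as np


def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` and clip to the uint8 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def horizontal_flip(image, yaw, pitch, roll):
    """Mirror the image left-right and update the pose label.

    Assumption: Euler angles are defined so that mirroring negates
    yaw and roll and keeps pitch. Other conventions need other rules.
    """
    flipped = image[:, ::-1].copy()
    return flipped, -yaw, pitch, -roll


def augment(image, yaw, pitch, roll, rng):
    """Apply a random brightness change and, with probability 0.5, a flip."""
    factor = rng.uniform(0.7, 1.3)  # hypothetical range
    image = adjust_brightness(image, factor)
    if rng.random() < 0.5:
        image, yaw, pitch, roll = horizontal_flip(image, yaw, pitch, roll)
    return image, (yaw, pitch, roll)
```

A call such as `augment(img, 10.0, 5.0, -3.0, np.random.default_rng(0))` returns an image of the same shape together with the possibly mirrored pose label.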
### Main contributions of the paper
- **Improved HPE model**: An HPE model whose accuracy is competitive with the current state of the art is proposed.
- **Extended training data**: Extended training datasets are introduced, and ablation experiments are carried out to verify their effectiveness.
- **Novel multi-task design**: A new multi-task head/loss design combined with uncertainty estimation is proposed.
- **New method for rotational uncertainty estimation**: Rotational uncertainty is represented in the tangent space, which addresses the numerical challenges of working directly in SO(3).
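To make the idea of tangent-space rotational uncertainty with an NLL loss more concrete, here is a hedged sketch of one common formulation. The summary does not give the paper's exact loss, so everything specific is an assumption: the residual is taken as the rotation vector of `R_pred.T @ R_true`, the uncertainty is a zero-mean Gaussian over that 3-vector, and its covariance is parameterized by a lower-triangular factor `L` with positive diagonal. All function names are hypothetical.

```python
import numpy as np


def rotvec_to_matrix(v):
    """Exponential map: rotation vector (axis times angle) to a 3x3 matrix."""
    v = np.asarray(v, dtype=float)
    theta = np.linalg.norm(v)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near identity
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))


def matrix_to_rotvec(R):
    """Logarithm map: 3x3 rotation matrix to rotation vector (angle < pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * w
    return theta / (2.0 * np.sin(theta)) * w


def rotation_nll(R_pred, R_true, L):
    """Gaussian NLL of the tangent-space residual, covariance = L @ L.T.

    Assumption: L is lower-triangular with a positive diagonal.
    """
    r = matrix_to_rotvec(R_pred.T @ R_true)      # residual as a 3-vector
    z = np.linalg.solve(L, r)                    # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log det(L @ L.T)
    return 0.5 * z @ z + 0.5 * log_det + 1.5 * np.log(2.0 * np.pi)
```

Because the residual lives in a flat 3-dimensional space, the covariance can be an ordinary 3x3 matrix and needs no constraint tied to the rotation manifold. Minimizing this quantity trades the whitened error term against the log-determinant, so the model cannot lower the loss simply by predicting a very large uncertainty.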
Through these improvements, the paper hopes to provide more powerful tools and methods for future research, especially in the field of monocular face pose estimation in complex environments.