Abstract: Deep learning has been impressively successful in the last decade in predicting human head poses from monocular images. However, for in-the-wild inputs the research community relies predominantly on a single training set, 300W-LP, of semisynthetic nature without many alternatives. This paper focuses on gradual extension and improvement of the data to explore the performance achievable with augmentation and synthesis strategies further. Modeling-wise, a novel multitask head/loss design which includes uncertainty estimation is proposed. Overall, the thus obtained models are small, efficient, suitable for full 6 DoF pose estimation, and exhibit very competitive accuracy.

What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

This paper addresses head pose estimation (HPE) from monocular images, particularly for complex and variable in-the-wild inputs. Specifically, it focuses on the following aspects:

1. **Limitations of the dataset**:
   - The research community relies mainly on a single semi-synthetic dataset, 300W-LP, for training, with few alternatives and limited diversity.
   - The paper explores how data augmentation and synthesis strategies affect model performance by extending and improving existing datasets.
2. **Model design and optimization**:
   - A new multi-task head/loss design that includes uncertainty estimation is proposed to improve the accuracy and robustness of the model.
   - The resulting models are small, efficient, and capable of full six-degrees-of-freedom (6 DoF) pose estimation.
3. **Data augmentation and synthesis**:
   - Multiple data augmentation methods, such as geometric transformations and brightness adjustment, are explored to increase data diversity.
   - Extended datasets (such as WFLW and LaPa) and fully synthetic datasets (such as Face Synthetics) are used to further improve generalization.
4. **Uncertainty estimation**:
   - A tangent space representation of rotational uncertainty is introduced to better handle uncertainty in pose estimation.
   - Uncertainty parameters are learned through a negative log-likelihood (NLL) loss function, which in turn improves the accuracy of the model.

### Main contributions of the paper

- **Improved HPE model**: An HPE model with accuracy competitive with the current state of the art is proposed.
- **Extended training data**: Extended training datasets are introduced, and ablation experiments are carried out to verify their effectiveness.
- **Novel multi-task design**: A new multi-task head/loss design combined with uncertainty estimation is proposed.
- **New method for rotational uncertainty estimation**: A tangent space method is adopted for rotational uncertainty estimation, which addresses the numerical challenges of working in the SO(3) space.

Through these improvements, the paper hopes to provide more powerful tools and methods for future research, especially in the field of monocular face pose estimation in complex environments.
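To make the idea of a tangent space NLL loss more concrete, the following is a minimal generic sketch and not the paper's actual formulation. It assumes rotations are given as unit quaternions `(w, x, y, z)`, that the rotation error is measured as a rotation vector in the tangent space at the prediction, and that the uncertainty is a diagonal Gaussian parameterized by per-axis log-variances. All function names (`quat_mul`, `quat_conj`, `quat_log`, `rotation_nll`) are hypothetical and chosen for illustration only.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conj(q):
    """Conjugate, which equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_log(q):
    """Map a unit quaternion to its rotation vector (axis * angle) in R^3."""
    w, x, y, z = q
    if w < 0:  # q and -q encode the same rotation; take the shorter arc
        w, x, y, z = -w, -x, -y, -z
    n = math.sqrt(x * x + y * y + z * z)
    if n < 1e-12:  # (near-)identity rotation: zero residual
        return (0.0, 0.0, 0.0)
    angle = 2.0 * math.atan2(n, w)
    return (angle * x / n, angle * y / n, angle * z / n)

def rotation_nll(q_pred, q_true, log_var):
    """Gaussian NLL of the rotation error in the tangent space at q_pred.

    log_var holds three per-axis log-variances (an assumed diagonal
    parameterization, not necessarily the one used in the paper).
    """
    residual = quat_log(quat_mul(quat_conj(q_pred), q_true))
    nll = 0.0
    for r, s in zip(residual, log_var):
        nll += 0.5 * (r * r * math.exp(-s) + s + math.log(2.0 * math.pi))
    return nll
```

Working with a three-dimensional residual in the tangent space avoids defining a probability density directly on SO(3). The log-variance terms let a network down-weight errors on samples it flags as uncertain while still being penalized for claiming large uncertainty everywhere.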

On the power of data augmentation for head pose estimation

Global Adaptation Meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation.

Post-Data Augmentation to Improve Deep Pose Estimation of Extreme and Wild Motions

A Semi-Supervised Data Augmentation Approach using 3D Graphical Engines

Unsupervised Domain Adaptation for 3D Human Pose Estimation

Semi-Supervised Unconstrained Head Pose Estimation in the Wild

Adversarial Semantic Data Augmentation for Human Pose Estimation

Overcoming Data Deficiency for Multi-Person Pose Estimation

MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild

Facial Landmarks Based Region-Level Data Augmentation for Gaze Estimation

Generalizing Monocular 3d Human Pose Estimation In The Wild

PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture

Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data

SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation

Full-range Head Pose Geometric Data Augmentations

Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation

An Effective Deep Network for Head Pose Estimation without Keypoints

Toward Robust and Unconstrained Full Range of Rotation Head Pose Estimation

Generalized Pose Space Embeddings for Training In-the-Wild using Anaylis-by-Synthesis

Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking

Domain Adaptive 3D Pose Augmentation for In-the-wild Human Mesh Recovery