Abstract: This study addresses the nuanced challenge of estimating head translations within the context of six-degrees-of-freedom (6DoF) head pose estimation, placing emphasis on this aspect over the more commonly studied head rotations. Identifying a gap in existing methodologies, we recognized the underutilized potential synergy between facial geometry and head translation. To bridge this gap, we propose a novel approach called the head Translation, Rotation, and face Geometry network (TRG), which stands out for its explicit bidirectional interaction structure. This structure has been carefully designed to leverage the complementary relationship between face geometry and head translation, marking a significant advancement in the field of head pose estimation. Our contributions also include the development of a strategy for estimating bounding box correction parameters and a technique for aligning landmarks to the image. Both of these innovations demonstrate superior performance in 6DoF head pose estimation tasks. Extensive experiments conducted on the ARKitFace and BIWI datasets confirm that the proposed method outperforms current state-of-the-art techniques. Code is released at https://github.com/asw91666/TRG-Release.

What problem does this paper attempt to address?

This paper addresses head translation estimation in six-degrees-of-freedom (6DoF) head pose estimation. The authors point out that most existing research focuses on estimating head rotation and pays less attention to head translation. Estimating head translation from a single image is also difficult because real-scale facial geometry and head translation are interdependent and mutually ambiguous: a small face close to the camera and a large face farther away can look alike in the image.

To address this, the authors propose the head **Translation, Rotation, and face Geometry network (TRG)**. By introducing an explicit bidirectional interaction structure, TRG exploits the complementary relationship between facial geometry and head translation, thereby improving the accuracy of 6DoF head pose estimation. The main contributions are:

1. **Explicit bidirectional interaction structure**: TRG is the first to introduce an explicit bidirectional interaction structure between head translation and facial geometry. This structure lets TRG reduce the ambiguity of head depth and face size at the same time.
2. **Bounding box correction parameter estimation strategy**: TRG estimates bounding box correction parameters, a strategy that shows stable generalization performance on out-of-distribution data.
3. **Landmark-to-image alignment strategy**: TRG adopts a landmark-to-image alignment strategy, which improves the accuracy of both head translation and head rotation estimation.
4. **Depth-aware landmark prediction architecture**: The depth-aware landmark prediction architecture of TRG achieves high precision on images that are strongly affected by perspective distortion, such as selfies.
5. **Experimental results**: Extensive experiments on the ARKitFace and BIWI datasets show that TRG outperforms current state-of-the-art methods in the 6DoF head pose estimation task.

### Formula Representation

Some of the formulas in the paper are as follows:

- The head translation \(T_t\), given component-wise:

\[ T_{x_t}=0.2s_t\left(\frac{\tau_{x,\text{bbox}}}{b}+\tilde{\tau}_{x,\text{face}_t}\right) \]

\[ T_{y_t}=0.2s_t\left(\frac{\tau_{y,\text{bbox}}}{b}+\tilde{\tau}_{y,\text{face}_t}\right) \]

\[ T_{z_t}=0.2s_t\left(\frac{f}{b}\right) \]

- The image coordinates \(V^{\text{img}}_t\) of the dense landmarks:

\[ V^{\text{img}}_t = \Pi(V_t, R_t, T_t, K) \]

These formulas show how TRG uses bounding box information and correction parameters to estimate head translation, and how it maps dense landmarks to image space through perspective projection.
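The formulas above can be made concrete with a minimal Python sketch. It is not the authors' implementation (their code is in the linked repository). The summary does not define the symbols, so the readings used here are assumptions: `s` is the scale correction \(s_t\), `tau_bbox` is the bounding box center offset in pixels, `tau_face` is the predicted offset correction \(\tilde{\tau}_{\text{face}_t}\), `b` is the bounding box size in pixels, and `f` is the focal length in pixels. \(\Pi\) is assumed to be the standard pinhole projection with intrinsics \(K\). The function names `head_translation` and `project` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def head_translation(
    s: float,
    tau_bbox: Tuple[float, float],
    tau_face: Tuple[float, float],
    b: float,
    f: float,
) -> Vec3:
    """Evaluate the three translation formulas for one iteration t.

    Symbol meanings are assumptions (see the lead-in), not taken from the paper.
    """
    scale = 0.2 * s
    tx = scale * (tau_bbox[0] / b + tau_face[0])
    ty = scale * (tau_bbox[1] / b + tau_face[1])
    tz = scale * (f / b)
    return (tx, ty, tz)


def project(
    vertices: Sequence[Sequence[float]],
    R: Sequence[Sequence[float]],
    T: Sequence[float],
    K: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Assumed pinhole projection: x_cam = R v + T, then K x_cam, then divide by depth."""
    pixels = []
    for v in vertices:
        # Rigid transform from head space into camera space.
        cam = [sum(R[i][j] * v[j] for j in range(3)) + T[i] for i in range(3)]
        if cam[2] <= 0:
            raise ValueError("point is behind the camera")
        # Apply the intrinsics and perform the perspective divide.
        hom = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
        pixels.append((hom[0] / hom[2], hom[1] / hom[2]))
    return pixels


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    T = head_translation(s=1.0, tau_bbox=(10.0, -20.0), tau_face=(0.05, 0.1), b=200.0, f=1000.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K = [[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]]
    print(T)
    print(project([(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)], identity, T, K))
```

Under these assumptions, the depth \(T_{z_t}\) depends only on the ratio \(f/b\) and the scale correction: a smaller bounding box at a fixed focal length gives a larger depth, and the network's correction adjusts the estimate from there.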

6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
