Abstract: Gaze estimation, which has a wide range of applications, is a challenging task under unconstrained conditions. Because information from both full-face and eye images helps improve gaze estimation, many multiregion gaze estimation models have been proposed in recent studies. However, most of them simply apply the same regression method to both eye and face images. They overlook that the eye region may contribute more fine-grained features than the full-face region, and that differences between an individual's left and right eyes caused by head pose, illumination, and partial occlusion may lead to inconsistent estimates. To address these issues, we propose an appearance-based end-to-end learning network architecture with an attention mechanism, named efficient gaze network (EG-Net), which employs a two-branch network for gaze estimation. Specifically, a base CNN is used for full-face images, while an efficient eye network (EE-Net), which is scaled up from the base CNN, is used for left- and right-eye images. EE-Net uniformly scales up the depth, width, and resolution of the base CNN with a set of constant coefficients for eye feature extraction, and adaptively weights the left- and right-eye images via an attention network according to their "image quality". Finally, features from the full-face image, the two individual eye images, and the head pose vectors are fused to regress the eye gaze vectors. We evaluate our approach on three public datasets, on which the proposed EG-Net model outperforms prior approaches. In particular, our EG-Net-v4 model outperforms state-of-the-art approaches on the MPIIFaceGaze dataset, with prediction errors of 2.41 cm and 2.76 degrees in 2D and 3D gaze estimation, respectively. It also reduces the prediction error to 1.58 cm on GazeCapture and 4.55 degrees on EyeDIAP, improvements of 23.4% and 14.2% over prior art on the two datasets, respectively.
The code related to this project is open-source and available at https://github.com/wuxinmei/EE_Net.git.
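
The two mechanisms described in the abstract, scaling a base CNN with constant coefficients and weighting the two eyes by an attention score before fusion, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the function names, the coefficient values (borrowed from the commonly cited EfficientNet compound-scaling defaults), the use of a two-way softmax as the attention weighting, and concatenation as the fusion step.

```python
import math


def compound_scale(base_depth, base_width, base_resolution, phi,
                   alpha=1.2, beta=1.1, gamma=1.15):
    """Scale depth, width and resolution together with constant coefficients.

    alpha, beta and gamma are illustrative values (assumption), and phi is a
    hypothetical scaling exponent; phi = 0 returns the base configuration.
    """
    depth = round(base_depth * alpha ** phi)
    width = round(base_width * beta ** phi)
    resolution = round(base_resolution * gamma ** phi)
    return depth, width, resolution


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(face_feat, left_feat, right_feat, head_pose,
                  left_score, right_score):
    """Weight each eye's features by an attention score, then concatenate.

    left_score and right_score stand in for the output of an attention
    network that rates per-eye "image quality" (assumption: scalar scores).
    """
    w_left, w_right = softmax([left_score, right_score])
    weighted_left = [w_left * x for x in left_feat]
    weighted_right = [w_right * x for x in right_feat]
    # Concatenation is one simple fusion choice; a regressor would follow.
    return face_feat + weighted_left + weighted_right + head_pose


if __name__ == "__main__":
    print(compound_scale(10, 32, 64, phi=2))
    print(fuse_features([1.0], [2.0], [4.0], [0.1, 0.2], 0.0, 0.0))
```

With equal scores both eyes receive weight 0.5; a higher score for one eye (for example, when the other is partially occluded) shifts the weight toward it, which is the behaviour the abstract attributes to the attention network.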

Gaze Estimation by Attention-Induced Hierarchical Variational Auto-Encoder

PerimetryNet: A Multiscale Fine Grained Deep Network for Three-Dimensional Eye Gaze Estimation Using Visual Field Analysis

HybridGazeNet: Geometric model guided Convolutional Neural Networks for gaze estimation

EG-Net: Appearance-based eye gaze estimation using an efficient gaze network with attention mechanism

EM-Net: Gaze Estimation with Expectation Maximization Algorithm

Gaze Target Estimation inspired by Interactive Attention

Gaze Estimation Based on the Improved Xception Network

3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views

Integrating Human Gaze into Attention for Egocentric Activity Recognition

Adaptive gaze estimation based on channel attention mechanism

Gaze Estimation via Modulation-based Adaptive Network with Auxiliary Self-Learning

Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement

DVGaze: Dual-View Gaze Estimation

Vicsgaze: a gaze estimation method using self-supervised contrastive learning

An Individual-Difference-Aware Model for Cross-Person Gaze Estimation

Monocular 3D gaze estimation using feature discretization and attention mechanism

LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation

Gaze-Vector Estimation in the Dark with Temporally Encoded Event-driven Neural Networks

Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

Domain-Consistent and Uncertainty-Aware Network for Generalizable Gaze Estimation

Appearance Debiased Gaze Estimation via Stochastic Subject-Wise Adversarial Learning