Abstract: Gaze estimation, which has a wide range of applications, is a challenging task under unconstrained conditions. Because information from both full-face and eye images helps improve gaze estimation, many multiregion gaze estimation models have been proposed in recent studies. However, most of them simply apply the same regression method to both eye and face images. This overlooks two points: the eye region may contribute more fine-grained features than the full-face region, and differences between an individual's left and right eyes caused by head pose, illumination, and partial eye occlusion may lead to inconsistent estimates. To address these issues, we propose an appearance-based, end-to-end learning network architecture with an attention mechanism, named efficient gaze network (EG-Net), which employs a two-branch network for gaze estimation. Specifically, a base CNN is used for full-face images, while an efficient eye network (EE-Net), which is scaled up from the base CNN, is used for left- and right-eye images. EE-Net uniformly scales up the depth, width, and resolution of the base CNN with a set of constant coefficients for eye feature extraction, and adaptively weights the left- and right-eye images via an attention network according to their "image quality". Finally, features from the full-face image, the two individual eye images, and head pose vectors are fused to regress the eye gaze vectors. We evaluate our approach on three public datasets, on which the proposed EG-Net model achieves substantially better performance than prior approaches. In particular, our EG-Net-v4 model outperforms state-of-the-art approaches on the MPIIFaceGaze dataset, with prediction errors of 2.41 cm and 2.76 degrees in 2D and 3D gaze estimation, respectively. It also reduces the error to 1.58 cm on GazeCapture and 4.55 degrees on EyeDIAP, improvements of 23.4% and 14.2% over prior art on the two datasets, respectively.
The code for this project is open source and available at https://github.com/wuxinmei/EE_Net.git.
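The two mechanisms named in the abstract, uniform scaling of depth, width, and resolution by constant coefficients and attention-based weighting of the two eye images, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the default coefficients (borrowed from the EfficientNet paper, not stated in the abstract), the rounding rule, and the use of a softmax over two scalar quality scores are all assumptions made for illustration.

```python
import math


def compound_scale(base_depth, base_width, base_resolution, phi,
                   alpha=1.2, beta=1.1, gamma=1.15):
    """Scale depth, width and input resolution together with one exponent phi.

    alpha, beta, gamma are constant per-dimension coefficients. The defaults
    are the EfficientNet values and are only an assumption here; the abstract
    does not state which coefficients EE-Net uses.
    """
    depth = math.ceil(base_depth * alpha ** phi)            # number of layers
    width = math.ceil(base_width * beta ** phi)             # number of channels
    resolution = math.ceil(base_resolution * gamma ** phi)  # input side length
    return depth, width, resolution


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_eye_features(left_feat, right_feat, left_score, right_score):
    """Blend left- and right-eye feature vectors by attention weights.

    left_score and right_score stand in for the per-eye "image quality"
    scores an attention network would produce (hypothetical inputs).
    """
    w_left, w_right = softmax([left_score, right_score])
    return [w_left * l + w_right * r for l, r in zip(left_feat, right_feat)]


if __name__ == "__main__":
    # Hypothetical base network: 4 layers, 32 channels, 64x64 eye crops.
    print(compound_scale(4, 32, 64, phi=1))
    # An occluded right eye would receive a lower score, hence less weight.
    print(fuse_eye_features([1.0, 2.0], [3.0, 4.0], left_score=2.0, right_score=0.0))
```

With equal scores the two eyes contribute equally. As one score drops, the fused feature moves toward the other eye, which is the behaviour the abstract describes for eyes degraded by head pose, illumination, or occlusion.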

EG-Net: Appearance-based eye gaze estimation using an efficient gaze network with attention mechanism
