Abstract: Gaze estimation has a wide range of applications but remains challenging under unconstrained conditions. Because information from both full-face and eye images helps improve gaze estimation, many multiregion gaze estimation models have been proposed in recent studies. However, most of them apply the same regression method to both eye and face images. This overlooks two points: the eye region may contribute more fine-grained features than the full-face region, and differences between an individual's left and right eyes caused by head pose, illumination, and partial occlusion may lead to inconsistent estimates. To address these issues, we propose an appearance-based, end-to-end learning network architecture with an attention mechanism, named efficient gaze network (EG-Net), which employs a two-branch network for gaze estimation. Specifically, a base CNN is used for full-face images, while an efficient eye network (EE-Net), scaled up from the base CNN, is used for left- and right-eye images. EE-Net uniformly scales up the depth, width, and resolution of the base CNN with a set of constant coefficients for eye feature extraction, and adaptively weights the left- and right-eye images via an attention network according to their "image quality". Finally, features from the full-face image, the two individual eye images, and the head pose vectors are fused to regress the eye gaze vectors. Evaluated on three public datasets, the proposed EG-Net model outperforms prior methods. In particular, our EG-Net-v4 model outperforms state-of-the-art approaches on the MPIIFaceGaze dataset, with prediction errors of 2.41 cm and 2.76 degrees in 2D and 3D gaze estimation, respectively. It also reduces the error to 1.58 cm on GazeCapture and 4.55 degrees on EyeDIAP, improvements of 23.4% and 14.2% over prior art on the two datasets, respectively.
The code for this project is open source and available at https://github.com/wuxinmei/EE_Net.git.
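
The pipeline described in the abstract (scaling of depth, width, and resolution by constant coefficients, attention weighting of the two eye branches, then feature fusion) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the default coefficients 1.2, 1.1, and 1.15 are the values published for EfficientNet and are used here only as placeholders, and both the softmax over two scalar quality scores and the fusion by concatenation are assumptions about details the abstract does not specify.

```python
import math


def compound_scale(base_depth, base_width, base_resolution, phi,
                   alpha=1.2, beta=1.1, gamma=1.15):
    """Scale depth, width and resolution together using constant coefficients.

    alpha, beta, gamma are placeholder values (assumption), not taken from EG-Net.
    phi is the single scaling exponent shared by all three dimensions.
    """
    depth = math.ceil(base_depth * alpha ** phi)
    width = math.ceil(base_width * beta ** phi)
    resolution = math.ceil(base_resolution * gamma ** phi)
    return depth, width, resolution


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_eye_features(left_feat, right_feat, left_score, right_score):
    """Weight each eye's features by a normalized quality score (assumed form)."""
    w_left, w_right = softmax([left_score, right_score])
    return [w_left * x for x in left_feat], [w_right * x for x in right_feat]


def fuse(face_feat, left_feat, right_feat, head_pose):
    """Concatenate all feature vectors before the final regression (assumed fusion)."""
    return list(face_feat) + list(left_feat) + list(right_feat) + list(head_pose)


if __name__ == "__main__":
    print(compound_scale(10, 32, 64, phi=2))
    left, right = weight_eye_features([2.0, 4.0], [2.0, 4.0], 0.0, 0.0)
    print(fuse([0.1, 0.2], left, right, [0.0, 0.3]))
```

In this sketch, an eye image with a higher quality score receives a larger share of the weight, so an occluded or poorly lit eye contributes less to the fused feature vector.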

Appearance-based Gaze Estimation with Multi-Modal Convolutional Neural Networks

PerimetryNet: A Multiscale Fine Grained Deep Network for Three-Dimensional Eye Gaze Estimation Using Visual Field Analysis

FDN: Feature Decoupling Network for Head Pose Estimation

HybridGazeNet: Geometric model guided Convolutional Neural Networks for gaze estimation

Fine-grained gaze estimation based on the combination of regression and classification losses

Multi-Person Gaze-Following with Numerical Coordinate Regression

Gaze Estimation Based on the Improved Xception Network

Gaze Estimation via Modulation-based Adaptive Network with Auxiliary Self-Learning

Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

EG-Net: Appearance-based eye gaze estimation using an efficient gaze network with attention mechanism

Multi-task Gaze Estimation Via Unidirectional Convolution

Facial Landmarks Based Region-Level Data Augmentation for Gaze Estimation

Convolutional Neural Network-Based Technique for Gaze Estimation on Mobile Devices

Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation

Feature Decomposition-Based Gaze Estimation with Auxiliary Head Pose Regression

Appearance-based gaze estimation enhanced with synthetic images using deep neural networks

A Complementary Dual-branch Network for Appearance-based Gaze Estimation from Low-resolution Facial Image

Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation

Gaze Estimation Using Neural Network And Logistic Regression

Adaptive gaze estimation based on channel attention mechanism

Robust Gaze Point Estimation for Metaverse With Common Mode Features Suppression Network