Abstract: As an indicator of attention, gaze is an important cue for human behavior and social interaction analysis. Recent deep learning methods for gaze estimation rely on plain regression of the gaze from images, without accounting for potential mismatches in eye image cropping and normalization. This may impair the estimation of the implicit relation between visual cues and the gaze direction when dealing with low-resolution images or when training with a limited amount of data. In this paper, we propose a deep multitask framework for gaze estimation, with the following contributions. (i) We propose a multitask framework that relies on both synthetic and real data for end-to-end training. During training, each dataset provides the label of only one task, but the two tasks are combined in a constrained way. (ii) We introduce a Constrained Landmark-Gaze Model (CLGM) that models the joint variation of eye landmark locations (including the iris center) and gaze directions. By explicitly relating visual information (landmarks) to the more abstract gaze values, we demonstrate that the estimator is more accurate and easier to learn. (iii) We decompose our deep network into two parts: a network that jointly infers the parameters of the CLGM and the scale and translation parameters of the eye region, and a CLGM-based decoder that deterministically infers landmark positions and gaze from these parameters and the head pose. In this way, our framework decouples gaze estimation from irrelevant geometric variations in the eye image (scale, translation), resulting in a more robust model. Thorough experiments on public datasets demonstrate that our method achieves competitive results, improving over state-of-the-art results on challenging free head pose gaze estimation tasks and on eye landmark (iris) localization.
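The decoupling described in contribution (iii) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the CLGM is a linear model (a mean vector plus weighted basis components) over a stacked vector of 2D landmark coordinates followed by two gaze angles; the abstract does not specify the model form, and head pose handling is omitted here. The function name `clgm_decode` and all parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def clgm_decode(
    coeffs: Sequence[float],
    mean: Sequence[float],
    basis: Sequence[Sequence[float]],
    scale: float,
    translation: Tuple[float, float],
) -> Tuple[List[Tuple[float, float]], Tuple[float, float]]:
    """Hypothetical decoder: model coefficients -> (image landmarks, gaze).

    ASSUMPTION: the joint vector is [x1, y1, ..., xn, yn, yaw, pitch] and is
    generated linearly as mean + sum_k coeffs[k] * basis[k].
    """
    # Reconstruct the joint landmark-gaze vector from the coefficients.
    joint = list(mean)
    for c, component in zip(coeffs, basis):
        for i, b in enumerate(component):
            joint[i] += c * b

    # The last two entries are the gaze angles; the rest are landmarks.
    *flat, yaw, pitch = joint

    # Scale and translation are applied to the landmarks only, so the gaze
    # output does not depend on how the eye image was cropped.
    tx, ty = translation
    landmarks = [
        (scale * flat[i] + tx, scale * flat[i + 1] + ty)
        for i in range(0, len(flat), 2)
    ]
    return landmarks, (yaw, pitch)


if __name__ == "__main__":
    mean = [0.0, 0.0, 1.0, 1.0, 0.1, -0.2]      # two landmarks + gaze
    basis = [[1.0, 0.0, 1.0, 0.0, 0.5, 0.0]]    # one joint component
    print(clgm_decode([0.2], mean, basis, scale=2.0, translation=(10.0, 20.0)))
```

In this sketch, changing `scale` or `translation` moves the decoded landmarks but leaves the gaze angles untouched, which is the property the abstract attributes to the decomposition.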

Vicsgaze: a gaze estimation method using self-supervised contrastive learning

GVGNet: Gaze-Directed Visual Grounding for Learning Under-Specified Object Referring Intention

PerimetryNet: A Multiscale Fine Grained Deep Network for Three-Dimensional Eye Gaze Estimation Using Visual Field Analysis

3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views

FreeGaze: Resource-efficient Gaze Estimation via Frequency Domain Contrastive Learning

Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

DVGaze: Dual-View Gaze Estimation

Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement

Facial Landmarks Based Region-Level Data Augmentation for Gaze Estimation

HybridGazeNet: Geometric model guided Convolutional Neural Networks for gaze estimation

Semi-supervised Contrastive Regression for Estimation of Eye Gaze

Gaze Estimation via Modulation-based Adaptive Network with Auxiliary Self-Learning

Gaze Estimation Based on the Improved Xception Network

Gaze Estimation with Eye Region Segmentation and Self-Supervised Multistream Learning

Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

Highly efficient gaze estimation method using online convolutional re-parameterization

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild

Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation

Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions

End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context

Multi-task Gaze Estimation Via Unidirectional Convolution