Abstract: As an indicator of attention, gaze is an important cue for human behavior and social interaction analysis. Recent deep learning methods for gaze estimation rely on plain regression of the gaze from images, without accounting for potential mismatches in eye image cropping and normalization. This may impair the estimation of the implicit relation between visual cues and the gaze direction when dealing with low-resolution images or when training with a limited amount of data. In this paper, we propose a deep multitask framework for gaze estimation, with the following contributions. (i) We propose a multitask framework which relies on both synthetic and real data for end-to-end training. During training, each dataset provides the label of only one task, but the two tasks are combined in a constrained way. (ii) We introduce a Constrained Landmark-Gaze Model (CLGM) modeling the joint variation of eye landmark locations (including the iris center) and gaze directions. By explicitly relating visual information (landmarks) to the more abstract gaze values, we demonstrate that the estimator is more accurate and easier to learn. (iii) We decompose our deep network into two parts: a network that jointly infers the parameters of the CLGM and the scale and translation parameters of the eye region, and a CLGM-based decoder that deterministically infers landmark positions and gaze from these parameters and the head pose. In this way, our framework decouples gaze estimation from irrelevant geometric variations in the eye image (scale, translation), resulting in a more robust model. Thorough experiments on public datasets demonstrate that our method achieves competitive results, improving over state-of-the-art results in challenging free head pose gaze estimation tasks and in eye landmark localization (iris location).
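As a rough illustration of contribution (iii), the decoding step can be sketched as a deterministic function that maps model coefficients plus scale and translation to landmarks and gaze. The sketch below is only an assumption-laden toy, not the authors' implementation: it assumes a linear joint model (mean plus weighted basis vectors), the class name LinearLandmarkGazeModel and all numeric values are hypothetical, and the head pose input mentioned in the abstract is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LinearLandmarkGazeModel:
    """Hypothetical linear joint model (an assumption, not the paper's exact CLGM).

    A joint vector is mean + sum_k coeffs[k] * basis[k]. Its first
    2 * n_landmarks entries are (x, y) landmark coordinates in a canonical
    frame; its last two entries are gaze angles (yaw, pitch).
    """
    mean: List[float]
    basis: List[List[float]]
    n_landmarks: int

    def decode(
        self,
        coeffs: Sequence[float],
        scale: float,
        translation: Tuple[float, float],
    ) -> Tuple[List[Tuple[float, float]], Tuple[float, float]]:
        # Reconstruct the joint landmark-gaze vector from the coefficients.
        vec = list(self.mean)
        for c, b in zip(coeffs, self.basis):
            for i, v in enumerate(b):
                vec[i] += c * v
        n = self.n_landmarks
        # Scale and translation act on the landmarks only, so the gaze
        # output does not depend on how the eye image was cropped.
        landmarks = [
            (scale * vec[2 * i] + translation[0],
             scale * vec[2 * i + 1] + translation[1])
            for i in range(n)
        ]
        gaze = (vec[2 * n], vec[2 * n + 1])
        return landmarks, gaze


# Toy model: two landmarks (eye corner, iris center) and one basis vector
# that moves the iris center horizontally while changing the gaze yaw.
model = LinearLandmarkGazeModel(
    mean=[0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    basis=[[0.0, 0.0, 0.5, 0.0, 0.2, 0.0]],
    n_landmarks=2,
)

landmarks_a, gaze_a = model.decode([1.0], scale=2.0, translation=(10.0, 5.0))
landmarks_b, gaze_b = model.decode([1.0], scale=4.0, translation=(0.0, 0.0))
```

In this toy setup, the two calls share the same coefficients but differ in scale and translation, so the landmark positions change while the decoded gaze stays the same. That is the decoupling property the abstract describes, shown here only under the linear-model assumption stated above.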

MUGGLE: MUlti-Stream Group Gaze Learning and Estimation

PerimetryNet: A Multiscale Fine Grained Deep Network for Three-Dimensional Eye Gaze Estimation Using Visual Field Analysis

End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context

Multi-Person Gaze-Following with Numerical Coordinate Regression

Multi-task Gaze Estimation Via Unidirectional Convolution

LNSMM: Eye Gaze Estimation With Local Network Share Multiview Multitask

360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales

Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

DGaze: CNN-Based Gaze Prediction in Dynamic Scenes

Robust Gaze Point Estimation for Metaverse With Common Mode Features Suppression Network

An Attention Detection System Based on Gaze Estimation Using Self-supervised Learning

Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation

Towards Pixel-Level Prediction for Gaze Following: Benchmark and Approach

Gaze Estimation with Eye Region Segmentation and Self-Supervised Multistream Learning

Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

Believe It or Not, We Know What You Are Looking at!

Gaze Estimation Based on the Improved Xception Network

GaTector: A Unified Framework for Gaze Object Prediction

Gaze Target Estimation inspired by Interactive Attention

Gaze Gestures and Their Applications in Human-Computer Interaction with a Head-Mounted Display

Gaze Estimation Using Neural Network And Logistic Regression