Abstract: Human eye gaze plays a significant role in many virtual and augmented reality (VR/AR) applications, such as gaze-contingent rendering, gaze-based interaction, and eye-based activity recognition. However, prior work on gaze analysis and prediction has explored only eye-head coordination and has been limited to human-object interactions. We first report a comprehensive analysis of eye-body coordination in various human-object and human-human interaction activities, based on four public datasets collected in real-world (MoGaze), VR (ADT), and AR (GIMO and EgoBody) environments. We show that in human-object interactions, e.g., pick and place, eye gaze exhibits strong correlations with full-body motion, while in human-human interactions, e.g., chat and teach, a person's gaze direction is correlated with the body orientation towards the interaction partner. Informed by these analyses, we then present Pose2Gaze, a novel eye-body coordination model that uses a convolutional neural network and a spatio-temporal graph convolutional neural network to extract features from head direction and full-body poses, respectively, and then uses a convolutional neural network to predict eye gaze. We compare our method with state-of-the-art methods that predict eye gaze only from head movements and show that Pose2Gaze outperforms these baselines, with an average improvement in mean angular error of 24.0% on MoGaze, 10.1% on ADT, 21.3% on GIMO, and 28.6% on EgoBody. We also show that our method significantly outperforms prior methods in the sample downstream task of eye-based activity recognition. These results underline the significant information content available in eye-body coordination during daily activities and open up a new direction for gaze prediction.
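The abstract reports results in mean angular error but does not define the metric. A minimal sketch of how such a metric is commonly computed is given below, assuming each gaze sample is a non-zero 3D direction vector; the function names `angular_error_deg` and `mean_angular_error_deg` are hypothetical and are not taken from the Pose2Gaze code.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two non-zero 3D direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0.0 or norm_t == 0.0:
        raise ValueError("gaze directions must be non-zero vectors")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predicted and ground-truth directions."""
    if len(preds) != len(targets) or len(preds) == 0:
        raise ValueError("preds and targets must be non-empty and of equal length")
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
    print(mean_angular_error_deg(predicted, ground_truth))
```

Under this definition a lower value is better, so an improvement of, say, 24.0% would mean the error is that much smaller relative to the baseline's error.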

Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses
