Abstract: The movement of our eyes during conversations plays a crucial role in communication. Through a mixture of deliberate and subconscious control of our gaze, we nonverbally manage turn-taking in conversations and convey information about our state of mind and even about neurological disorders. For animated avatars or robots, it is therefore essential to exhibit realistic eye movement in conversations in order to withstand an observer's scrutiny and avoid falling into the Uncanny Valley. Otherwise, observers will reject them as unnatural and possibly scary, and may disapprove of the avatar as a whole. Although there are many promising application areas for avatars and the automatic animation of the mouth and facial expressions has received great attention, the animation of the eyes is often left to simplistic, rule-based models or ignored altogether. In this work, we aim to address this limitation by leveraging Generative Adversarial Networks (GANs), a powerful machine-learning approach, to synthesize eye movement. By focusing on a restricted scenario of face-to-monitor interaction, we can concentrate on the eyes and ignore additional factors such as gestures, body movement, and the spatial positioning of conversation partners. Using a recently published dataset of eye movements during conversation, we train two GANs and compare their performance against three statistical models with hand-crafted rules. We subject all five models to statistical analysis, comparing them to the ground-truth data. Excluding the best-scoring model, which generates absurd movements, the GANs achieve the best scores among the remaining four models, all of which synthesize reasonable eye movement. Additionally, we conduct a user study with 73 participants in which each model is compared pairwise against the others, yielding a total of 1314 pairwise comparisons.
The user study shows that the GANs achieve acceptance rates of 55.3% and 43.7%, outperforming the baseline model's acceptance rate of 34.0%. Although the best model, which relies on a set of hand-crafted rules, reaches 67.0% and beats our GANs, we argue that such a rule-based approach will not be feasible once information such as emotions or speech is added to the input.
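The abstract does not say which statistics are used to compare the five models with the ground-truth data. As a minimal sketch, assuming the comparison is made on the distribution of a scalar gaze feature (for example, fixation durations in milliseconds), the two-sample Kolmogorov-Smirnov (KS) statistic measures how far each model's output distribution lies from the recorded one. The function names and sample values below are hypothetical and are not taken from the paper or its dataset.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of two samples (0 for identical samples,
    1 for samples whose value ranges do not overlap)."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The empirical CDFs only change at observed values, so evaluating the
    # gap at every pooled value is enough to find the maximum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_by_ks(ground_truth, generated):
    """Sort models by KS distance to the ground truth, closest first."""
    scores = {name: ks_statistic(ground_truth, s) for name, s in generated.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical fixation durations (ms); not data from the paper.
ground_truth = [200, 250, 300, 350, 400]
generated = {
    "gan": [210, 260, 290, 360, 390],
    "rule_based": [100, 100, 100, 100, 900],
}
print(rank_by_ks(ground_truth, generated))
```

A one-dimensional distribution statistic like this ignores the temporal ordering of gaze events, so a model can score well on it while still producing implausible sequences; a perceptual user study is a natural complement.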
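The abstract reports one acceptance rate per model but does not define it. One plausible reading, assumed here, is the share of a model's pairwise comparisons in which participants preferred it. The sketch below computes that quantity from hypothetical (model_a, model_b, preferred) records; the model names are placeholders. It also checks the study's arithmetic: five models give C(5, 2) = 10 distinct pairings, and 1314 comparisons across 73 participants average exactly 18 comparisons per participant.

```python
from collections import defaultdict
from itertools import combinations


def acceptance_rates(comparisons):
    """Share of each model's pairwise comparisons that it won.

    comparisons: iterable of (model_a, model_b, preferred) tuples, where
    preferred is the model the participant judged more natural
    (an assumed record format, not the study's actual one).
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for model_a, model_b, preferred in comparisons:
        if preferred not in (model_a, model_b):
            raise ValueError(f"{preferred!r} is not one of the compared models")
        appearances[model_a] += 1
        appearances[model_b] += 1
        wins[preferred] += 1
    return {model: wins[model] / count for model, count in appearances.items()}


# Placeholder names for the five models (two GANs, three statistical models).
models = ["gan_1", "gan_2", "baseline", "rules_1", "rules_2"]
num_pairs = len(list(combinations(models, 2)))  # 10 distinct pairings
per_participant = 1314 / 73                      # 18.0 comparisons on average

# Hypothetical responses, not data from the study.
responses = [
    ("gan_1", "baseline", "gan_1"),
    ("gan_1", "rules_1", "rules_1"),
    ("baseline", "rules_1", "rules_1"),
    ("gan_1", "baseline", "gan_1"),
]
print(acceptance_rates(responses))
```

Under this definition each comparison contributes one win and two appearances, so the per-model rates are not expected to sum to 100%.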
