Abstract: The movement of our eyes during conversations plays a crucial role in communication. Through a mixture of deliberate and subconscious control of our gaze, we nonverbally manage turn-taking and convey information about our state of mind and even about neurological disorders. For animated avatars or robots, it is therefore essential to exhibit realistic eye movement in conversations, so that they withstand an observer's scrutiny and do not fall into the Uncanny Valley. Otherwise, observers will reject them as unnatural and possibly scary, and this disapproval extends to the entire avatar. There are many promising application areas for avatars, and great attention has been given to the automatic animation of the mouth and facial expressions. Yet eye animation is often left to simplistic, rule-based models or ignored altogether. In this work, we aim to alleviate this limitation by using Generative Adversarial Networks (GANs), a powerful machine-learning approach, to synthesize eye movement. By restricting ourselves to face-to-monitor interaction, we can concentrate on the eyes and ignore additional factors such as gestures, body movement, and the spatial positioning of conversation partners. Using a recently published dataset of eye movements during conversation, we train two GANs and compare their performance against three statistical models with hand-crafted rules. We subject all five models to statistical analysis, comparing them to the ground-truth data. Four of the models synthesize plausible eye movement. We exclude the best-scoring model because it generates absurd movements, and among the remaining four the GANs produce the best data. Additionally, we conduct a user study with 73 participants in which each model is compared pairwise against the others, yielding a total of 1314 pairwise comparisons.
The study shows that the GANs achieve acceptance rates of 55.3% and 43.7%, outperforming the baseline model's acceptance rate of 34.0%. The best model, which relies on a set of rules, reaches 67.0% and beats our GANs. We argue, however, that such a rule-based approach will not remain feasible once information such as emotions or speech is added to the input.
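The abstract reports acceptance rates derived from pairwise comparisons but does not state how they are computed. The sketch below assumes the simplest reading. Under that reading, a model's acceptance rate is the fraction of the comparisons it appeared in where participants preferred it. The record format `(model_a, model_b, preferred)`, the function name `acceptance_rates`, and the model labels are hypothetical. The toy data are invented for illustration and are not the study's data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (model_a, model_b, preferred), where `preferred`
# is the model a participant judged as more natural in that comparison.
Comparison = Tuple[str, str, str]


def acceptance_rates(comparisons: Iterable[Comparison]) -> Dict[str, float]:
    """Assumed definition: share of its comparisons in which a model was preferred."""
    shown: Counter = Counter()  # how often each model appeared in a comparison
    won: Counter = Counter()    # how often each model was the preferred one
    for a, b, preferred in comparisons:
        if preferred not in (a, b):
            raise ValueError(f"preferred model {preferred!r} not in pair ({a!r}, {b!r})")
        shown[a] += 1
        shown[b] += 1
        won[preferred] += 1
    # Counter returns 0 for models that never won, so every shown model gets a rate.
    return {model: won[model] / shown[model] for model in shown}


# Invented toy comparisons, for illustration only.
toy = [
    ("gan_a", "baseline", "gan_a"),
    ("gan_a", "rules", "rules"),
    ("gan_b", "baseline", "gan_b"),
    ("gan_b", "rules", "rules"),
    ("gan_a", "gan_b", "gan_a"),
]

if __name__ == "__main__":
    for model, rate in sorted(acceptance_rates(toy).items()):
        print(f"{model}: {rate:.1%}")
```

On the toy data, `gan_a` wins two of its three comparisons and `rules` wins both of its comparisons. Because every comparison involves two models, the rates need not sum to 100%.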

A Gaze Model Improves Autonomous Driving

Gaze Training by Modulated Dropout Improves Imitation Learning

Autonomous Driving with Human Guided Image Feature Extraction

Visual attention for behavioral cloning in autonomous driving

Human Visual Attention Prediction Boosts Learning & Performance of Autonomous Driving Agents

Efficiently Guiding Imitation Learning Agents with Human Gaze

A CGAN-based Model for Human-like Driving Decision Making

GSA-Gaze: Generative Self-adversarial Learning for Domain Generalized Driver Gaze Estimation

Guiding Attention in End-to-End Driving Models

Explaining Autonomous Driving by Learning End-to-End Visual Attention

AGIL: Learning Attention from Human for Visuomotor Tasks

A Novel Driving Behavior Learning and Visualization Method with Natural Gaze Prediction

DeepGoal: Learning to drive with driving intention from human control demonstration

Lane Changing Maneuver Prediction by Using Driver’s Spatio-Temporal Gaze Attention Inputs for Naturalistic Driving

Seeing with Humans: Gaze-Assisted Neural Image Captioning

Visual attention prediction improves performance of autonomous drone racing agents

Behavioral Cloning Models Reality Check for Autonomous Driving

Gaze Generation for Avatars Using GANs

Deep Multitask Gaze Estimation with a Constrained Landmark-Gaze Model

Explaining autonomous driving with visual attention and end-to-end trainable region proposals

CUEING: a lightweight model to Capture hUman attEntion In driviNG