Abstract: The movement of our eyes during conversations plays a crucial role in our communication. Through a mixture of deliberate and subconscious control of our gaze, we nonverbally manage turn-taking in conversations and convey information about our state of mind and even neurological disorders. For animated avatars or robots, it is hence of fundamental importance to exhibit realistic eye movement in conversations to withstand the scrutiny of an observer and not fall into the Uncanny Valley. Otherwise, they will be rejected by the observer as unnatural and possibly scary, provoking disapproval of the entire avatar. Although there exist many promising application areas for avatars and great attention has been given to the automatic animation of mouth and facial expressions, the animation of the eyes is often left to simplistic, rule-based models or ignored altogether. In this work, we aim to alleviate this limitation by leveraging Generative Adversarial Networks (GANs), a potent machine-learning approach, to synthesize eye movement. By focusing on a restricted scenario of face-to-monitor interaction, we can concentrate on the eyes, ignoring additional factors such as gestures, body movement, and spatial positioning of conversation partners. Using a recently published dataset on eye movements during conversation, we train two GANs and compare their performance against three statistical models with hand-crafted rules. We subject all five models to statistical analysis, comparing them to the ground-truth data. We find that, of the four models that synthesize reasonable eye movement, the GANs produce the best data; the best-scoring model is excluded from this ranking because it generates absurd movements. Additionally, we perform a user study in which 73 participants compare the models pairwise against each other, resulting in a total of 1314 pairwise comparisons. The study shows that the GANs achieve acceptance rates of 55.3% and 43.7%, outperforming the baseline model with an acceptance rate of 34.0%. Although the best model, which relies on a set of rules, reaches 67.0% and thus beats our GANs, we argue that such a rule-based approach will not be feasible once information like emotions or speech is added to the input.

Gaze Generation for Avatars Using GANs
