Abstract: Virtual reality is rapidly evolving into a pragmatically usable technology for mental health (MH) applications. As the underlying enabling technologies continue to evolve and allow us to design more useful and usable structural virtual environments (VEs), the next important challenge will involve populating these environments with virtual representations of humans (avatars). This will be vital for creating mental health VEs that use avatars in applications requiring human-human interaction and communication. As Alessi et al. [1] pointed out at the 8th Annual Medicine Meets Virtual Reality Conference (MMVR8), virtual humans have mainly appeared in MH applications to "serve the role of props, rather than humans." More believable avatars inhabiting VEs would open up possibilities for MH applications that address social interaction, communication, instruction, assessment, and rehabilitation issues. They could also enhance realism, which might in turn promote the experience of presence in VR. Additionally, it will soon be possible to use computer-generated avatars that provide believable dynamic facial and bodily representations of individuals communicating from a distance in real time. In shared virtual environments, this could support more natural interaction styles, similar to those people use with one another in real life. These techniques could enhance communication and interaction by leveraging our natural sensing and perceiving capabilities, and they offer the potential to model human-computer-human interaction after human-human interaction. To enhance the authenticity of virtual human representations, advances in the rendering of facial and gestural behaviors that support implicit communication will be needed.
In this regard, the current paper presents data from a study that compared human raters' judgments of emotional expression between actual video clips of facial expressions and identical expressions rendered on a three-dimensional avatar using a performance-driven facial animation (PDFA) system developed at the University of Southern California Integrated Media Systems Center. PDFA offers a means for creating high-fidelity visual representations of human faces and bodies. This effort explores the feasibility of sensing and reproducing a range of facial expressions with a PDFA system. To test the concordance of human ratings of emotional expression between video and avatar facial delivery, we first had facial model subjects observe stimuli designed to elicit naturalistic facial expressions. The emotional stimulus induction involved presenting subjects with text-based, still-image, and video stimuli that had previously been rated as inducing facial expressions for the six universals [2] of facial expression (happy, sad, fear, anger, disgust, and surprise), in addition to attentiveness, puzzlement, and frustration. Videotapes of the induced facial expressions that best represented prototypic examples of the above emotional states, along with three-dimensional avatar animations of the same facial expressions, were presented in random order to 38 human raters. The raters used open-ended, forced-choice, and seven-point Likert-type scales to identify and rate each expression. The forced-choice and seven-point ratings provided the most usable data for determining video/animation concordance, and these data are presented. To support a clear understanding of these data, a website has been set up that allows readers to view the video and facial animation clips, illustrating the assets and limitations of these types of facial expression-rendering methods (www.USCAvatars.com/MMVR).
This methodological first step in our research program has served to provide valuable human user-centered feedback to support the iterative design and development of facial avatar characteristics for expression of emotional communication.
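The abstract does not state which statistic was used to quantify video/animation concordance. As a minimal sketch only, the following shows one way agreement between forced-choice labels could be summarized, assuming each clip has one label chosen for its video version and one for its avatar version. The function names, the choice of Cohen's kappa, and the example labels are all illustrative assumptions, not the authors' method or data.

```python
from collections import Counter


def percent_agreement(video_labels, avatar_labels):
    """Fraction of clips given the same label in both presentation modes."""
    if len(video_labels) != len(avatar_labels) or not video_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(v == a for v, a in zip(video_labels, avatar_labels))
    return matches / len(video_labels)


def cohens_kappa(video_labels, avatar_labels):
    """Chance-corrected agreement between the two sets of labels."""
    n = len(video_labels)
    p_observed = percent_agreement(video_labels, avatar_labels)
    video_counts = Counter(video_labels)
    avatar_counts = Counter(avatar_labels)
    # Agreement expected by chance, from each mode's label frequencies.
    p_expected = sum(
        video_counts[label] * avatar_counts[label] for label in video_counts
    ) / (n * n)
    if p_expected == 1.0:
        # Both modes used a single identical label for every clip.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical forced-choice labels for four clips (not data from the study).
video = ["happy", "sad", "fear", "anger"]
avatar = ["happy", "sad", "fear", "surprise"]
agreement = percent_agreement(video, avatar)
kappa = cohens_kappa(video, avatar)
```

Percent agreement is easy to read but ignores chance matches; kappa corrects for them, which matters when a few labels dominate the responses.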

Performance-driven facial animation: basic research on human judgments of emotional state in facial avatars
