Abstract: Emotion shapes all aspects of our interpersonal and intellectual experiences. Its automatic analysis therefore has many applications, e.g., human-machine interfaces. In this paper, we propose an emotional tonal speech dataset, namely the Mandarin Chinese Emotional Speech Dataset - Portrayed (MES-P), with both distal and proximal labels. In contrast with state-of-the-art emotional speech datasets, which focus only on perceived emotions, the proposed MES-P dataset includes not only perceived emotions with their proximal labels but also intended emotions with distal labels. This makes it possible to study human emotional intelligence, i.e., people's ability to express emotions and their skill in understanding them. The dataset thus explicitly accounts for the differences between intended and perceived emotions in speech signals and enables studies of the emotional misunderstandings that often occur in real life. Furthermore, the proposed MES-P dataset captures a main feature of tonal languages, i.e., tonal variations, and provides recorded emotional speech samples whose tonal variations match the tonal distribution of real-life Mandarin Chinese. The dataset also features emotion intensity variations: it includes both moderate and intense versions of recordings for joy, anger, and sadness, in addition to neutral speech. The collected speech samples are rated in valence-arousal (VA) space as continuous coordinate locations, resulting in an emotional distribution pattern in 2D VA space. The consistency between the speakers' emotional intentions and the listeners' perceptions is also studied using Cohen's Kappa coefficients. Finally, we carry out extensive experiments using a baseline on MES-P for automatic emotion recognition and compare the results with human emotional intelligence.
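
As a rough illustration of the consistency measure mentioned above, the following sketch computes Cohen's Kappa between a list of intended (distal) labels and a list of perceived (proximal) labels. The function name `cohens_kappa` and the toy labels are assumptions made for this example; the abstract does not specify how the coefficients were actually computed on MES-P.

```python
from collections import Counter


def cohens_kappa(intended, perceived):
    """Cohen's Kappa between two equally long sequences of categorical labels.

    Hypothetical helper for illustration; not taken from the MES-P paper.
    """
    if len(intended) != len(perceived) or len(intended) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(intended)

    # Observed agreement: fraction of samples where both labels coincide.
    observed = sum(a == b for a, b in zip(intended, perceived)) / n

    # Chance agreement: for each label, the product of its marginal
    # frequencies in the two sequences, summed over all labels.
    intended_counts = Counter(intended)
    perceived_counts = Counter(perceived)
    labels = set(intended_counts) | set(perceived_counts)
    expected = sum(intended_counts[l] * perceived_counts[l] for l in labels) / (n * n)

    # Degenerate case: both sequences use one identical label throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Toy data (invented for this example): what the speaker intended
# versus what a listener perceived for six utterances.
intended_labels = ["joy", "joy", "anger", "anger", "sadness", "neutral"]
perceived_labels = ["joy", "neutral", "anger", "anger", "sadness", "neutral"]
kappa = cohens_kappa(intended_labels, perceived_labels)
```

A value of 1 indicates perfect agreement between intention and perception, while values near 0 indicate agreement no better than chance.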

An Emotional Text-Driven 3D Visual Pronunciation System for Mandarin Chinese

A Realistic 3D Articulatory Animation System for Emotional Visual Pronunciation

Text-driven Visual Prosody Generation for Embodied Conversational Agents

Visualization of Mandarin articulation by using a physiological articulatory model

A Study of Correlation Between Physiological Process of Articulation and Emotions on Mandarin Chinese

Prosody Analysis And Modeling For Emotional Speech Synthesis

Emotional Audio-Visual Speech Synthesis Based on PAD

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model

A Speech-Driven 3-D Lip Synthesis with Realistic Dynamics in Mandarin Chinese

Articulatory-Acoustic Analyses of Mandarin Words in Emotional Context Speech for Smart Campus

Articulatory and Acoustic Analyses of Mandarin Sentences with Different Emotions for Speaking Training of Dysphonic Disorders

Emotional Chinese talking head system

Speech Driven Facial Animation Using Chinese Mandarin Pronunciation Rules

Mapping Acoustic Characteristics of Emotional Prosody in Mandarin Disyllabic Words: A Machine-Learning Approach

Visualization of Mandarin articulation driven by ultrasound data

3D Visible Speech Animation Driven by Chinese Prosody Markup Language

The Mandarin Chinese auditory emotions stimulus database: A validated set of Chinese pseudo-sentences

Emotional Speech Synthesis Based on PSOLA

MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels

Emotional Voice Puppetry

Real-time Speech-Driven Animation of Expressive Talking Faces