Abstract: Emotion shapes all aspects of our interpersonal and intellectual experiences. Its automatic analysis therefore has many applications, e.g., human-machine interfaces. In this paper, we propose an emotional tonal speech dataset, namely the Mandarin Chinese Emotional Speech Dataset - Portrayed (MES-P), with both distal and proximal labels. In contrast with state-of-the-art emotional speech datasets, which focus only on perceived emotions, the proposed MES-P dataset includes not only perceived emotions with their proximal labels but also intended emotions with distal labels. This makes it possible to study human emotional intelligence, i.e., people's ability to express emotions and their skill in understanding them. The dataset thus explicitly accounts for differences between intended and perceived emotions in speech signals and enables studies of the emotional misunderstandings that often occur in real life. Furthermore, the MES-P dataset captures a main feature of tonal languages, i.e., tonal variations, and provides recorded emotional speech samples whose tonal variations match the tonal distribution of real-life Mandarin Chinese. The dataset also features emotion intensity variations: it includes both moderate and intense versions of recordings for joy, anger, and sadness, in addition to neutral speech. The collected speech samples are rated in valence-arousal (VA) space through continuous coordinate locations, resulting in an emotional distribution pattern in 2D VA space. The consistency between the speakers' emotional intentions and the listeners' perceptions is also studied using Cohen's Kappa coefficients. Finally, we carry out extensive experiments using a baseline on MES-P for automatic emotion recognition and compare the results with human emotional intelligence.
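To illustrate the agreement measure mentioned above, the following is a minimal sketch of Cohen's Kappa computed between intended (distal) and perceived (proximal) categorical labels. The function name `cohen_kappa` and the eight label pairs are hypothetical and chosen only for illustration; they are not taken from MES-P, and the abstract does not specify how the authors computed their coefficients.

```python
from collections import Counter


def cohen_kappa(intended, perceived):
    """Cohen's Kappa between two equal-length sequences of categorical labels."""
    if len(intended) != len(perceived) or len(intended) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(intended)

    # Observed agreement: fraction of samples where both labels match.
    observed = sum(a == b for a, b in zip(intended, perceived)) / n

    # Chance agreement: product of the marginal label frequencies, summed over labels.
    intended_counts = Counter(intended)
    perceived_counts = Counter(perceived)
    labels = set(intended_counts) | set(perceived_counts)
    expected = sum(intended_counts[l] * perceived_counts[l] for l in labels) / (n * n)

    # Degenerate case (assumption): both raters use a single identical label.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: what the speaker intended vs. what a listener perceived.
intended = ["joy", "joy", "anger", "anger", "sadness", "sadness", "neutral", "neutral"]
perceived = ["joy", "neutral", "anger", "anger", "sadness", "neutral", "neutral", "neutral"]

kappa = cohen_kappa(intended, perceived)
print(f"kappa = {kappa:.3f}")
```

In this toy case six of eight labels agree (observed agreement 0.75) and chance agreement is 0.25, so Kappa is about 0.667. A value of 1 indicates perfect agreement between intention and perception, and 0 indicates agreement no better than chance.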

Learning to Infer Public Emotions from Large-Scale Networked Voice Data

Emotion Detection in Online Social Networks: A Multilabel Learning Approach

Inferring Emotions from Large-Scale Internet Voice Data

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

Acoustics, Content and Geo-Information Based Sentiment Prediction from Large-Scale Networked Voice Data

Self-attention Transfer Networks for Speech Emotion Recognition

Emotion Inferring from Large-scale Internet Voice Data: A Multimodal Deep Learning Approach

Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation Based Deep Learning Approach

Inferring Users' Emotions For Human-Mobile Voice Dialogue Applications

Inferring Emotion from Conversational Voice Data: A Semi-Supervised Multi-Path Generative Neural Network Approach

A New Network Structure for Speech Emotion Recognition Research

Emotion recognition and affective computing on vocal social media

Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network

Mining Emotions of the Public from Social Media for Enhancing Corporate Credit Rating

Emotion recognition of social media users based on deep learning

MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels

Learning Fine-Grained Cross Modality Excitement for Speech Emotion Recognition

Manifolds Based Emotion Recognition in Speech

Inferring Emotions from Social Images Leveraging Influence Analysis

An Analysis of Emotional Tendency Under the Network Public Opinion: Deep Learning

Research on Emotional Interaction and User Cognition