Abstract: Affective computing is both a direction of reform in artificial intelligence and a hallmark of advanced intelligent machines. Emotion is the biggest difference between humans and machines; a machine that behaves with emotion will be accepted by more people. Speech is the most natural, most easily understood, and most readily accepted means of daily communication, so the recognition of emotional speech is an important field of artificial intelligence. In emotion recognition, however, certain pairs of emotions are often particularly prone to confusion. This article presents a combined cepstral distance method with two-group multi-class emotion classification for emotional speech recognition. Cepstral distance combined with speech energy is widely used for speech signal endpoint detection in speech recognition. In this work, cepstral distance is used to measure the similarity between frames of emotional signals and frames of neutral signals. The resulting features are the input to a directed acyclic graph support vector machine classifier. Finally, a two-group classification strategy is adopted to resolve confusion in multi-emotion recognition. The experiments use a Mandarin Chinese emotion database, and a large training set (1134 + 378 utterances) ensures a powerful modelling capability for predicting emotion. The experimental results show that cepstral distance increases the recognition rate for the emotion sad and balances the recognition results while eliminating overfitting. On the German Berlin emotional speech database, the recognition rate between sad and bored, two emotions that are very difficult to distinguish, reaches 95.45%.
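
As a rough illustration of the frame-level similarity measure mentioned in the abstract, the sketch below computes a cepstral distance between two speech frames with NumPy. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `real_cepstrum` and `cepstral_distance`, the Hamming window, the choice of 13 coefficients, and the specific distance form (squared difference of the zeroth coefficient plus twice the squared differences of the remaining ones) are all assumptions made for illustration.

```python
import numpy as np

def real_cepstrum(frame, n_coeffs=13, eps=1e-10):
    """Return the first n_coeffs real cepstral coefficients of one frame.

    Assumption: Hamming window and FFT-based real cepstrum; the paper's
    exact cepstral analysis is not given in the abstract.
    """
    windowed = frame * np.hamming(len(frame))
    # Log magnitude spectrum; eps guards against log(0).
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + eps)
    # Inverse transform of the log spectrum gives the real cepstrum.
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    return cepstrum[:n_coeffs]

def cepstral_distance(frame_a, frame_b, n_coeffs=13):
    """Truncated cepstral distance between two equal-length frames.

    Assumed form: sqrt((c0_a - c0_b)^2 + 2 * sum_k (ck_a - ck_b)^2).
    """
    diff = real_cepstrum(frame_a, n_coeffs) - real_cepstrum(frame_b, n_coeffs)
    return float(np.sqrt(diff[0] ** 2 + 2.0 * np.sum(diff[1:] ** 2)))

# Hypothetical usage: random noise stands in for one emotional frame
# and one neutral frame of 256 samples each.
rng = np.random.default_rng(0)
emotional_frame = rng.standard_normal(256)
neutral_frame = rng.standard_normal(256)
distance = cepstral_distance(emotional_frame, neutral_frame)
```

In a pipeline like the one the abstract describes, distances of this kind between emotional and neutral frames would be summarized into utterance-level features before classification; how the paper aggregates them is not stated here.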

Automatic Emotion Recognition of Speech Signal in Mandarin

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

Emotional Speech Clustering Based Robust Speaker Recognition System

Deep Spectrum Feature Representations for Speech Emotion Recognition

Emotion-State conversion for speaker recognition

Speech Emotion Recognition Based on Feature Selection and Extreme Learning Machine Decision Tree

EmoEars: an emotion recognition system for mandarin speech

A Combined Cepstral Distance Method for Emotional Speech Recognition

Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Speech Emotion Recognition Based on Linear Discriminant Analysis and Support Vector Machine Decision Tree

Speech Emotion Recognition Using Multiple Classifiers

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Emotion Recognition and Conversion for Mandarin Speech

Mandarin Emotion Recognition Combining Acoustic and Emotional Point Information

Affect-insensitive Speaker Recognition Systems Via Emotional Speech Clustering Using Prosodic Features

A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition

Speech emotion recognition using combination of features

Speech emotion recognition using a novel feature set

Speech Emotion Recognition using Channel Attention Mechanism

Statistical Feature Selection for Mandarin Speech Emotion Recognition