Abstract: The paper presents a speech emotion recognition system based on the analysis of speech manifolds. Working with large volumes of high-dimensional acoustic features, the researchers confront the problem of dimensionality reduction. In contrast to classical techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), the paper proposes a new approach, Enhanced Lipschitz Embedding (ELE), to discover the nonlinear degrees of freedom that underlie the emotional speech corpus. ELE uses geodesic distance to preserve the intrinsic geometry of the speech corpus at all scales. Based on geodesic distance estimation, ELE embeds the 64-dimensional acoustic features into a six-dimensional space in which speech data with the same emotional state generally cluster around one plane, a distribution that benefits emotion classification. The compressed test data are classified into six emotional states (neutral, anger, fear, happiness, sadness and surprise) by a trained linear Support Vector Machine (SVM). In view of the perceptual constancy of humans, ELE is also investigated for its ability to detect the intrinsic geometry of emotional speech corrupted by noise. The new approach is compared with feature selection by Sequential Forward Selection (SFS) and with PCA, LDA, Isomap and Locally Linear Embedding (LLE). Experimental results show that, compared with these methods, the proposed system gives a 9%-26% relative improvement in speaker-independent emotion recognition and a 5%-20% improvement in speaker-dependent recognition. The proposed system is also robust to noise, with an improvement of approximately 10% in emotion recognition accuracy when speech is corrupted by increasing noise.
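The abstract does not spell out how ELE is computed, so the following is only a minimal sketch of the general idea it names, not the authors' algorithm. It assumes that geodesic distances are approximated by shortest paths on a k-nearest-neighbor graph (as in Isomap) and that each of the six output coordinates is a point's geodesic distance to a reference set drawn from one emotion class. The function names, the choice of reference sets, the value of k and the synthetic data are all hypothetical.

```python
import heapq
import math
import random


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph weighted by Euclidean distance."""
    n = len(points)
    graph = [dict() for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_from(graph, sources):
    """Multi-source Dijkstra: graph distance from every node to its nearest source."""
    dist = [math.inf] * len(graph)
    heap = []
    for s in sources:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def lipschitz_embed(points, reference_sets, k=5):
    """Map each point to its geodesic distances to the reference sets (one coordinate per set)."""
    graph = knn_graph(points, k)
    columns = [geodesic_from(graph, refs) for refs in reference_sets]
    return [list(row) for row in zip(*columns)]


# Synthetic stand-in for 64-dimensional acoustic features of six emotion classes.
random.seed(0)
n_classes, per_class, dim = 6, 10, 64
points, labels = [], []
for c in range(n_classes):
    center = [random.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(per_class):
        points.append([x + random.gauss(0.0, 0.3) for x in center])
        labels.append(c)

# Assumption: the first three samples of each class form that class's reference set.
reference_sets = [[i for i, y in enumerate(labels) if y == c][:3] for c in range(n_classes)]
embedded = lipschitz_embed(points, reference_sets, k=5)
print(len(embedded), len(embedded[0]))  # 60 6
```

In this sketch a point unreachable from a reference set on the neighbor graph receives an infinite coordinate, so k must be large enough to keep the graph connected. Embedding unseen test utterances and training the linear SVM on the six-dimensional output are not covered here.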

Group Sparse Features for Speech Emotion Perception in Tensor Space

Deep Spectrum Feature Representations for Speech Emotion Recognition

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

Speech Emotion Recognition Based on Feature Selection and Extreme Learning Machine Decision Tree

Speech Emotion Recognition Based on Linear Discriminant Analysis and Support Vector Machine Decision Tree

Visual-Audio Emotion Recognition Based on Multi-Task and Ensemble Learning with Multiple Features

Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Speech Emotion Recognition Based on Formant Characteristics Feature Extraction and Phoneme Type Convergence.

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Spontaneous Speech Emotion Recognition Using Multiscale Deep Convolutional LSTM

A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition

Manifolds Based Emotion Recognition in Speech.

Feature selection for fast speech emotion recognition.

Feature selection enhancement and feature space visualization for speech-based emotion recognition

Multi-View Common Space Learning For Emotion Recognition In The Wild

MS-SENet: Enhancing Speech Emotion Recognition Through Multi-scale Feature Fusion With Squeeze-and-excitation Blocks

Emotion Recognition and EEG Analysis Using ADMM-Based Sparse Group Lasso

Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.

Facial Expression Recognition Via Weighted Group Sparsity.

Speech emotion recognition based on multi-dimensional feature extraction and multi-scale feature fusion