Abstract: Humans, as intricate beings driven by a multitude of emotions, possess a remarkable ability to decipher and respond to socio-affective cues. However, many individuals and machines struggle to interpret such nuanced signals, including variations in tone of voice. This paper explores the potential of intelligent technologies to bridge this gap and improve the quality of conversations. In particular, the authors propose a real-time processing method that captures and evaluates emotions in speech on a terminal device such as the Raspberry Pi computer. The authors also provide an overview of the current research landscape in speech emotion recognition and describe their methodology, which involves analyzing audio files from renowned emotional speech databases. To aid comprehension, the authors present visualizations of these audio files in situ, employing dB-scaled Mel spectrograms generated through TensorFlow and Matplotlib. The authors use a support vector machine kernel and a Convolutional Neural Network with transfer learning to classify emotions. The classification accuracies achieved are 70% and 77%, respectively, demonstrating the efficacy of the approach when executed on an edge device rather than on a server. The system can evaluate pure emotion in speech and provide corresponding visualizations of the speaker’s emotional state in less than one second on a Raspberry Pi. These findings pave the way for more effective and emotionally intelligent human-machine interactions in various domains.
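The abstract mentions dB-scaled Mel spectrograms without giving parameters. As a rough illustration of the two scale conversions behind that term, the sketch below maps frequencies onto the Mel scale and converts power values to decibels. It uses only the Python standard library rather than TensorFlow, and it assumes the widely used HTK-style Mel formula; the function names, the band count, and the frequency range are hypothetical choices, not values taken from the paper.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the Mel scale (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, fmin_hz, fmax_hz):
    """Return n_mels band centers (in Hz) spaced evenly on the Mel scale.

    The two edge points fmin_hz and fmax_hz are excluded, as they would be
    for the centers of triangular Mel filters.
    """
    lo, hi = hz_to_mel(fmin_hz), hz_to_mel(fmax_hz)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, n_mels + 1)]


def power_to_db(power, ref=1.0, amin=1e-10):
    """Convert a power value to decibels relative to ref.

    Values below amin are clamped so that log10 never receives zero.
    """
    return 10.0 * math.log10(max(power, amin) / ref)


if __name__ == "__main__":
    # Hypothetical settings: 8 bands between 0 Hz and 8 kHz.
    for center in mel_center_frequencies(8, 0.0, 8000.0):
        print(f"{center:8.1f} Hz")
    print(power_to_db(0.01), "dB")
```

In a full pipeline, each spectrogram frame's power spectrum would be pooled into Mel bands around centers like these, and each band energy would then be passed through a conversion like `power_to_db` before plotting.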

Speech Interactive Emotion Recognition System Based on Random Forest

Speaker-independent Speech Emotion Recognition Based on Random Forest Feature Selection Algorithm

Emotional Speech Clustering Based Robust Speaker Recognition System

Speech Emotion Recognition Based on Linear Discriminant Analysis and Support Vector Machine Decision Tree

Speech Emotion Recognition Based on Feature Selection and Extreme Learning Machine Decision Tree

Silent Speech Recognition based on sEMG and EEG Signals

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction

Speech emotion recognition based on convolution neural network combined with random forest

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Feature selection of mime speech recognition using surface electromyography data

Speech Emotion Recognition Based on Formant Characteristics Feature Extraction and Phoneme Type Convergence

Silent Speech Recognition Based on Surface Electromyography

Divide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC and Random Forest

Paralinguistic and spectral feature extraction for speech emotion classification using machine learning techniques

Two-stage classification of emotional speech

Deep Learning and SVM-based Emotion Recognition from Chinese Speech for Smart Affective Services

Enhancing Human-Machine Interaction: Real-Time Emotion Recognition through Speech Analysis

Biologically inspired speech emotion recognition

A Study of Speech Emotion Recognition Based on Hybrid Algorithm