Abstract: Speech emotion recognition (SER) is the task of identifying the emotional qualities of a voice independently of its semantic content. People do this efficiently as a natural part of spoken communication, but doing it automatically with programmed systems is still a work in progress. Because it offers insight into human mental processes, emotion identification from speech signals is a frequently investigated topic in the design of human–computer interface (HCI) models, where the user's emotion is often needed as feedback. This study attempts to distinguish seven emotions from speech signals: sadness, anger, disgust, pleasure, surprise, enjoyment, and neutrality. The suggested method uses a signal preprocessing step based on a randomness measure. The signals are first normalized to reduce noise. Because voice signals vary in length and are continuous, emotion identification requires both local and global information. Local features describe dynamic behavior, while global features capture statistics such as the standard error, median, and minimum and maximum values. The SER system draws on several feature types, including spectral features, voice quality features, and Teager energy operator-based features. Prosodic features are those grounded in human perception, such as rhythm and intonation; they depend on three factors: power, duration, and frequency. From the preprocessed signals, a feature vector is generated that evaluates the randomness measure for each of the emotions. Mutual information (MI) is then used to select features from the full set. The selected feature vectors are classified using the BOAT method and association rule mining. Experiments were carried out on the TESS dataset for several metrics, and the suggested method outperformed state-of-the-art methods.
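The pipeline steps named in the abstract (normalization, local and global features, the Teager energy operator, and MI-based feature selection) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: all function names are hypothetical, peak normalization and frame power are assumed stand-ins for the unspecified normalization and local features, the MI estimate uses simple equal-width histogram binning, and the paper's randomness measure and BOAT classifier are not reproduced here.

```python
import math
import statistics
from collections import Counter


def peak_normalize(signal):
    """Scale samples so the largest absolute value is 1.0 (assumed normalization)."""
    peak = max((abs(x) for x in signal), default=0.0)
    return [x / peak for x in signal] if peak > 0 else list(signal)


def teager_energy(signal):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [signal[n] ** 2 - signal[n - 1] * signal[n + 1]
            for n in range(1, len(signal) - 1)]


def frame_energies(signal, frame_len, hop):
    """Local (frame-level) feature: mean power of each frame."""
    return [sum(x * x for x in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, hop)]


def global_stats(values):
    """Global (utterance-level) statistics of a local feature track."""
    return {
        "std_error": statistics.stdev(values) / math.sqrt(len(values)),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def mutual_information(feature, labels, bins=4):
    """Histogram estimate of I(feature; label) in bits."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0  # constant feature: everything falls in one bin
    binned = [min(int((v - lo) / width), bins - 1) for v in feature]
    n = len(labels)
    p_x, p_y = Counter(binned), Counter(labels)
    p_xy = Counter(zip(binned, labels))
    return sum((c / n) * math.log2(c * n / (p_x[x] * p_y[y]))
               for (x, y), c in p_xy.items())


def select_top_k(feature_columns, labels, k):
    """Rank named feature columns by MI with the labels and keep the best k."""
    scores = {name: mutual_information(col, labels)
              for name, col in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In use, each utterance would be normalized, converted to local tracks (for example `frame_energies` or `teager_energy`), summarized with `global_stats`, and the resulting columns ranked with `select_top_k` against the emotion labels before being passed to a classifier of choice.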
