Abstract: One of the main challenges facing current approaches to speech emotion recognition is the lack of a dataset large enough to train the available deep learning models properly. Therefore, this paper proposes a new data augmentation algorithm that enriches speech emotion datasets with more samples through a careful addition of noise fractions. In addition, the hyperparameters of the currently available deep learning models are either handcrafted or adjusted during the training process. However, this approach does not guarantee finding the best settings for these parameters. Therefore, we propose an optimized deep learning model in which the hyperparameters are optimized to find their best settings and thus achieve better recognition results. This deep learning model consists of a convolutional neural network (CNN) composed of four local feature-learning blocks and a long short-term memory (LSTM) layer for learning local and long-term correlations in the log Mel-spectrogram of the input speech samples. To improve the performance of this deep network, the learning rate and the label smoothing regularization factor are optimized using the recently emerged stochastic fractal search (SFS)-guided whale optimization algorithm (WOA). The strength of this algorithm is its ability to balance exploration and exploitation of the search agents' positions to guarantee reaching the global optimal solution. To prove the effectiveness of the proposed approach, four speech emotion datasets, namely IEMOCAP, Emo-DB, RAVDESS, and SAVEE, are used in the experiments. Experimental results confirmed the superiority of the proposed approach when compared with state-of-the-art approaches. The recognition accuracies achieved on the four datasets are 98.13%, 99.76%, 99.47%, and 99.50%, respectively. Moreover, a statistical analysis of the achieved results is provided to emphasize the stability of the proposed approach.
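Two of the ingredients named in the abstract, noise-based augmentation and label smoothing, can be illustrated with a minimal sketch. The abstract does not specify how the noise fractions are chosen, so the rule below (Gaussian noise whose standard deviation is a fraction of the signal's RMS amplitude) is an assumption made for illustration, not the paper's algorithm. The function names, the fractions 0.01, 0.02, and 0.05, the seven-class setup, and the smoothing factor 0.1 are likewise hypothetical. The label smoothing formula itself is the standard one: (1 - epsilon) times the one-hot target plus epsilon divided by the number of classes.

```python
import math
import random


def add_noise_fraction(signal, fraction, rng):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    Assumption: the noise standard deviation is `fraction` times the RMS
    amplitude of the signal. This is an illustrative rule only.
    """
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    sigma = fraction * rms
    return [x + rng.gauss(0.0, sigma) for x in signal]


def smooth_labels(class_index, num_classes, epsilon):
    """Standard label smoothing: (1 - epsilon) * one_hot + epsilon / K."""
    base = epsilon / num_classes
    target = [base] * num_classes
    target[class_index] += 1.0 - epsilon
    return target


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Toy 440 Hz tone sampled at 16 kHz, standing in for a speech waveform.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]

# One augmented copy per (hypothetical) noise fraction.
fractions = (0.01, 0.02, 0.05)
augmented = [add_noise_fraction(clean, f, rng) for f in fractions]

# Soft target for class 2 out of 7 (hypothetical) emotion classes.
soft_target = smooth_labels(class_index=2, num_classes=7, epsilon=0.1)
```

In the approach described above, the smoothing factor and the learning rate would not be fixed by hand as they are here; they are the two hyperparameters that the SFS-guided WOA searches over.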

Robust Speech Emotion Recognition Using CNN+LSTM Based on Stochastic Fractal Search Optimization Algorithm
