Speech Emotion Recognition Using Deep Learning
Dr. G. Prathibha, Y Kavya, P. Vinay Jacob, L Poojita
DOI: https://doi.org/10.55041/ijsrem36262
2024-07-04
INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
Abstract: Speech is one of the primary forms of human expression and an important cue for emotion recognition, which can yield useful insight into a person's thoughts. Automatic speech emotion recognition is an active field of study in artificial intelligence and machine learning that aims to build machines able to communicate with people via speech. In this work, deep learning models, namely the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), are explored to extract features and classify the emotions calm, happy, fearful, disgust, angry, neutral, surprised and sad using the Toronto Emotional Speech Set (TESS) dataset, which consists of 2800 files. Features such as Mel-frequency cepstral coefficients (MFCC), chroma and mel spectrogram are extracted from speech using the pre-trained networks Xception, VGG16, ResNet50, MobileNetV2, DenseNet121, NASNetLarge, EfficientNetB5, EfficientNetV2M, InceptionV3, ConvNeXtTiny, EfficientNetV2B2, EfficientNetB6 and ResNet152V2. The features of two different networks are fused using early, mid and late fusion techniques to improve the results. The features are then classified with a Long Short-Term Memory (LSTM) network, reaching an accuracy of 99%. The work is also extended to the RAVDESS dataset, which consists of 1440 files covering seven emotions: calm, joyful, sad, surprised, afraid, disgust and angry.
Keywords: Convolutional Neural Network, Recurrent Neural Network, speech emotion recognition, MFCC, Chroma, Mel, LSTM.
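The difference between early and late fusion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature values and the choice of averaging as the late-fusion rule are all assumptions made for illustration, and mid fusion (combining intermediate network representations) is not shown.

```python
from typing import List, Sequence


def early_fusion(features_a: Sequence[float], features_b: Sequence[float]) -> List[float]:
    """Early fusion: concatenate two feature vectors before any classifier sees them."""
    return list(features_a) + list(features_b)


def late_fusion(probs_a: Sequence[float], probs_b: Sequence[float]) -> List[float]:
    """Late fusion: combine per-class probabilities from two separate classifiers.

    Averaging is assumed here; the abstract does not state which rule is used.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both classifiers must score the same set of classes")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


def predict(probs: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest fused probability."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best_index]


# Hypothetical toy values standing in for features from two pre-trained networks.
fused_features = early_fusion([0.5, 0.25], [0.75])

# Hypothetical class probabilities from two classifiers over two emotions.
fused_probs = late_fusion([0.5, 0.5], [0.0, 1.0])
emotion = predict(fused_probs, ["happy", "sad"])
```

In early fusion a single classifier is trained on the concatenated vector, whereas in late fusion each network keeps its own classifier and only the decisions are merged.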