Speech emotion recognition using feature fusion: a hybrid approach to deep learning
Waleed Akram Khan, Hamad ul Qudous, Asma Ahmad Farhan
DOI: https://doi.org/10.1007/s11042-024-18316-7
IF: 2.577
2024-02-20
Multimedia Tools and Applications
Abstract: Speech emotion recognition enables machines to understand and respond to human emotions, with applications ranging from customer service and personalized experiences to mental health monitoring and improved human-computer interfaces. However, recognizing emotions from speech is difficult, primarily because of the significant disparity between acoustic features and human emotions. Both vocal cues and spoken words play significant roles in determining a person's emotional state, so accurate recognition requires extracting distinct and meaningful acoustic features. In this paper, we propose a novel approach to inferring human emotional states. The approach extracts a set of features from speech signals and employs a deep stride convolutional neural network with a bi-directional LSTM. On the RAVDESS dataset, the proposed model achieved an accuracy of 95%, almost 20% higher than the state of the art, while also minimizing the loss.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
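
The abstract describes extracting a set of features from speech signals and fusing them before classification, but does not list the features here. As a rough illustration only, the sketch below fuses three hand-picked frame-level descriptors (zero-crossing rate, RMS energy, and log band energies) by concatenation. The feature choice, the function names `frame_signal` and `fused_features`, and the frame parameters are all assumptions made for this example, not the authors' pipeline, and the deep stride CNN and bi-directional LSTM stages are not shown.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def fused_features(signal, frame_len=400, hop=160, n_bands=8):
    """Concatenate several per-frame descriptors into one feature matrix.

    Hypothetical feature set chosen for illustration; the paper's actual
    features may differ.
    """
    frames = frame_signal(signal, frame_len, hop)

    # Zero-crossing rate: fraction of adjacent samples whose sign differs.
    signs = np.sign(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1, keepdims=True)

    # Root-mean-square energy of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1, keepdims=True))

    # Log energy in n_bands roughly equal-width linear frequency bands.
    window = np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    log_bands = np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-10)

    # Feature fusion by simple concatenation along the feature axis.
    return np.concatenate([zcr, rms, log_bands], axis=1)

# Usage on one second of synthetic noise standing in for 16 kHz speech.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
features = fused_features(signal)
print(features.shape)
```

Each row of `features` describes one frame, so the matrix could in principle be passed to a sequence model as a (time, feature) input. Concatenation is the simplest fusion strategy; whether the paper fuses features this way is not stated in the abstract.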