Deep Learning Techniques for Speech Emotion Recognition: A Review

Sandeep Kumar Pandey, H. S. Shekhawat, S. R. M. Prasanna
DOI: https://doi.org/10.1109/radioelek.2019.8733432
2019-04-01
Abstract: This paper introduces various deep learning techniques for capturing and classifying emotional state from speech utterances. Architectures such as the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) are used to test how well emotion can be captured from standard speech representations, namely the mel spectrogram, the magnitude spectrogram, and Mel-Frequency Cepstral Coefficients (MFCCs), on two popular datasets, EMO-DB and IEMOCAP. Experimental findings are presented, along with reasoning as to which combination of architecture and features is better suited to speech emotion recognition. The work focuses on the basic deep learning architectures widely used in the literature.
What problem does this paper attempt to address?