Modified dense convolutional networks based emotion detection from speech using its paralinguistic features

Ritika Dhiman, Gurkanwal Singh Kang, Varun Gupta
DOI: https://doi.org/10.1007/s11042-021-11210-6
IF: 2.577
2021-07-22
Multimedia Tools and Applications
Abstract: Emotion recognition through speech is a fundamental part of human interaction. Modulations in speech convey different emotions and contexts. In this paper, we propose modified dense convolutional networks (modified DenseNet201) for emotion detection from speech using its paralinguistic features, such as vocal tract features. The proposed network classifies emotions from spectrograms of the speech audio files. It outperforms alternative models such as residual networks, AlexNet, VGG16, SVM, XGBoost, and boosted random forest on emotion classification from speech. Moreover, the proposed network surpasses the existing methods proposed in the literature, obtaining state-of-the-art results in most cases. Further, the proposed network has been successfully validated on datasets in two different languages, 'EmoDB' and 'SAVEE', which qualifies it as a language-independent system for emotion detection from speech.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering