Machine learning techniques for speech emotion recognition using paralinguistic acoustic features

Tulika Jha, Ramisetty Kavya, Jabez Christopher, Vasan Arunachalam
DOI: https://doi.org/10.1007/s10772-022-09985-6
2022-07-09
International Journal of Speech Technology
Abstract: Speech emotion recognition is one of the fastest growing areas of interest in the field of affective computing. Emotion detection aids human–computer interaction and finds application in a wide gamut of sectors, ranging from healthcare to retail to education. The present work strives to provide a speech emotion recognition framework that is both reliable and efficient enough to work in real-time environments. Speech emotion recognition can be performed using linguistic as well as paralinguistic aspects of speech; this work focusses on the latter, using non-lexical or paralinguistic attributes of speech like pitch, intensity and mel-frequency cepstral coefficients to train supervised machine learning models for emotion recognition. A combination of prosodic and spectral features is used for experimental analysis and classification is performed using algorithms like Gaussian Naïve Bayes, Random Forest, k-Nearest Neighbours, Support Vector Machine and Multilayer Perceptron. The choice of these ML models was based on the swiftness with which they could be trained, making them more suitable for real-time applications. Comparative analysis of the models reveals SVM and MLP to be the best performers with 77.86% and 79.62% accuracies respectively. The performance of these classifiers is compared with benchmark results in literature, and a significant improvement over state-of-the-art models is presented. The observations and findings of this work can be applied to design real-time emotion recognition frameworks that can be used to design and develop applications and technologies for various domains.
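As an illustration of the kind of classifier the abstract lists, the following is a minimal sketch of Gaussian Naïve Bayes applied to per-utterance feature vectors. It is not the authors' implementation: the three features (mean pitch, mean intensity, first MFCC), the two emotion labels, the synthetic class statistics, and all function and class names are assumptions made only for this example, and the data is randomly generated rather than drawn from any speech corpus.

```python
import math
import random
from collections import defaultdict

# Hypothetical per-utterance features: [mean pitch (Hz), mean intensity (dB), first MFCC].
# The class means and spreads are synthetic placeholders, not values from the paper.
SYNTHETIC_CLASSES = {
    "angry": ([250.0, 75.0, -5.0], [10.0, 3.0, 2.0]),
    "sad": ([150.0, 55.0, 5.0], [10.0, 3.0, 2.0]),
}


def make_dataset(n_per_class, seed):
    """Draw labelled feature vectors from the synthetic class distributions."""
    rng = random.Random(seed)
    data = []
    for label, (means, stds) in SYNTHETIC_CLASSES.items():
        for _ in range(n_per_class):
            features = [rng.gauss(m, s) for m, s in zip(means, stds)]
            data.append((features, label))
    rng.shuffle(data)
    return data


class GaussianNaiveBayes:
    """Fits one independent Gaussian per feature and per class."""

    def fit(self, samples):
        grouped = defaultdict(list)
        for features, label in samples:
            grouped[label].append(features)
        total = len(samples)
        self.stats = {}
        for label, rows in grouped.items():
            columns = list(zip(*rows))
            means = [sum(c) / len(c) for c in columns]
            # A small floor keeps every variance strictly positive.
            variances = [
                sum((x - m) ** 2 for x in c) / len(c) + 1e-9
                for c, m in zip(columns, means)
            ]
            self.stats[label] = (math.log(len(rows) / total), means, variances)
        return self

    def predict(self, features):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            log_likelihood = sum(
                -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for x, m, v in zip(features, means, variances)
            )
            return log_prior + log_likelihood

        # Pick the class with the highest (unnormalised) log posterior.
        return max(self.stats, key=log_posterior)


def accuracy(model, samples):
    correct = sum(model.predict(features) == label for features, label in samples)
    return correct / len(samples)


train = make_dataset(n_per_class=50, seed=0)
held_out = make_dataset(n_per_class=20, seed=1)
model = GaussianNaiveBayes().fit(train)
print(f"held-out accuracy: {accuracy(model, held_out):.2f}")
```

Because the synthetic classes are far apart, the accuracy printed here says nothing about real speech data; the sketch only shows how a lightweight model of this type is fitted and queried, which is the property the abstract cites as making such models attractive for real-time use.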